Senior Site Reliability Engineer (SRE) in Ireland, Scotland at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer (SRE) in Ireland.
Join a cutting-edge engineering environment shaping the future of AI cloud infrastructure at global scale. This role offers the opportunity to work on highly distributed, high-performance systems powering next-generation AI and machine learning workloads for enterprises and developers worldwide. As a Senior Site Reliability Engineer, you will play a critical role in ensuring reliability, scalability, and operational excellence across complex cloud platforms. You’ll collaborate with top-tier engineers to solve challenging infrastructure problems involving compute, networking, storage, orchestration, and automation technologies. The environment is fast-paced, innovation-driven, and built around ownership, continuous learning, and impactful engineering. This is an excellent opportunity for experienced SRE professionals eager to contribute to mission-critical AI infrastructure while advancing their expertise in modern cloud-native systems.
Responsibilities:
- Ensure high availability, scalability, fault tolerance, and uninterrupted operation of critical cloud infrastructure and services.
- Design, implement, and improve CI/CD pipelines and automation workflows to enhance deployment efficiency and system reliability.
- Manage and optimize containerized environments and orchestration systems using Kubernetes, Docker, Helm, and related technologies.
- Build and maintain infrastructure-as-code solutions using tools such as Terraform, Ansible, or Salt.
- Monitor system health, troubleshoot production incidents, and proactively improve performance, observability, and resilience.
- Collaborate with cross-functional engineering teams to solve complex infrastructure and backend reliability challenges.
- Contribute to the design and operation of high-load distributed systems supporting AI and machine learning workloads.
- Continuously evaluate and implement modern cloud technologies and operational best practices.
Requirements:
- Strong professional experience as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer in cloud-native environments.
- Solid programming skills in languages such as Go, Python, C++, or similar.
- Strong understanding of algorithms, data structures, operating systems, and distributed computing principles.
- Deep hands-on expertise with Unix/Linux systems administration and network technologies.
- Proven experience with containerization and orchestration tools including Docker, Kubernetes, and Helm.
- Experience with configuration management and infrastructure automation tools such as Terraform, Ansible, or Salt.
- Familiarity with CI/CD processes, automation frameworks, and scalable cloud infrastructure operations.
- Strong troubleshooting, analytical, and problem-solving capabilities in high-performance production environments.
- Excellent collaboration and communication skills within distributed engineering teams.
- Bonus points for backend development experience, large-scale distributed systems expertise, or exposure to multiple cloud platforms.
Benefits:
- Competitive compensation package aligned with experience and expertise.
- Flexible remote work environment supporting work-life balance across Europe.
- Career growth opportunities with ongoing learning and technical development support.
- Opportunity to work on impactful AI and cloud infrastructure projects at global scale.
- Collaborative, innovative, and engineering-driven culture focused on ownership and autonomy.
- Exposure to highly skilled international teams across multiple regions and disciplines.
- Fast-paced environment offering meaningful technical challenges and long-term career advancement.
- Inclusive workplace committed to diversity, equal opportunity, and professional growth.