Staff Site Reliability Engineer in Switzerland at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Site Reliability Engineer in Switzerland.
Join a highly collaborative engineering environment where reliability, scalability, and automation are central to delivering world-class developer experiences at global scale. In this role, you will help design and maintain resilient infrastructure systems supporting millions of users worldwide, while driving operational excellence across distributed cloud environments. You will work closely with engineering and infrastructure teams to improve observability, optimize performance, and build automation that reduces operational complexity. This position offers the opportunity to lead incident response initiatives, shape reliability standards, and influence infrastructure strategy across the organization. It is an ideal opportunity for a senior SRE professional who thrives in fast-moving environments and enjoys solving complex distributed systems challenges while mentoring teams and promoting engineering best practices.
Responsibilities
- Design and implement comprehensive observability solutions, including monitoring, logging, tracing, dashboards, and alerting systems to improve visibility into infrastructure health and performance.
- Define, track, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in collaboration with engineering and product teams.
- Lead high-severity incident response efforts, coordinate troubleshooting activities, conduct blameless post-mortems, and implement long-term preventive solutions.
- Build and maintain infrastructure automation and Infrastructure as Code solutions using tools such as Terraform or Pulumi.
- Develop self-healing systems and automation processes that reduce operational overhead and improve system resilience.
- Optimize large-scale Kubernetes and cloud-native deployments, focusing on scalability, reliability, latency reduction, and capacity planning.
- Investigate and resolve complex distributed systems issues across multiple layers of the infrastructure stack.
- Review architectural and system designs to ensure reliability, scalability, operational efficiency, and security best practices.
- Mentor engineers across teams and help establish reliability-focused engineering culture and operational standards.
- Build internal tools, integrations, and automation workflows using languages such as Python or Go to support platform operations and infrastructure improvements.
Requirements
- 8–10 years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or related fields.
- Strong software engineering skills with hands-on experience developing production-grade applications or tooling in Python or Go.
- Deep expertise in distributed systems architecture, cloud-native environments, and service-oriented infrastructure design.
- Extensive experience with Kubernetes, container orchestration, Docker, and modern cloud infrastructure technologies.
- Proven ability to design and maintain advanced observability and monitoring ecosystems using tools such as Prometheus, Grafana, Datadog, or OpenTelemetry.
- Strong background in incident management, root cause analysis, troubleshooting, and operational excellence practices.
- Hands-on experience with Infrastructure as Code and automation tools such as Terraform, Pulumi, or similar technologies.
- Excellent written and verbal communication skills, with the ability to explain complex technical topics clearly across teams and stakeholders.
- Demonstrated leadership and mentoring experience working with engineers across multiple seniority levels.
- Comfortable working across the full infrastructure stack and solving highly complex technical challenges in fast-paced environments.
- Experience with Google Cloud Platform (GCP), high-throughput systems, startup environments, or technical content creation is considered a strong advantage.
Benefits
- Competitive salary package with equity opportunities.
- Fully remote work environment across Europe.
- Flexible time off policy and paid holidays.
- Health, dental, vision, and life insurance coverage.
- Paid parental, medical, and caregiver leave programs.
- Short-term and long-term disability coverage.
- Monthly wellness stipend to support personal well-being.
- Autonomous and flexible work culture with strong ownership opportunities.
- Quarterly team gatherings and collaborative company events.
- Professional equipment and remote workspace support.
- Opportunity to work on globally scaled infrastructure challenges using modern cloud-native technologies.