Product Reliability Engineer in Switzerland at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Product Reliability Engineer in Switzerland.
This role sits at the intersection of software engineering, site reliability, and customer-facing problem solving, focusing on ensuring that complex infrastructure software performs reliably in real-world, on-prem environments. You will work directly on high-impact production issues while also building the systems and tooling that prevent them from recurring. The environment is highly technical and fast-moving, requiring strong debugging instincts and the ability to operate across distributed systems and Kubernetes-based deployments. You will collaborate closely with engineering, product, and customer-facing teams to diagnose incidents, improve observability, and strengthen system resilience. Beyond incident response, you will play a key role in shaping test automation, deployment reliability, and upgrade stability. Your work will directly influence how reliably the product runs in diverse and often unpredictable customer infrastructures. This is a hands-on, deeply technical role with strong ownership over system reliability.
Responsibilities
- Partner with customers and internal teams to handle L2/L3 escalations, diagnosing and resolving complex issues related to deployment, upgrades, runtime behavior, and Kubernetes environments.
- Drive end-to-end root cause analysis, reproducing issues, identifying failure patterns, and coordinating fixes with engineering teams.
- Build and maintain diagnostic tooling such as health checks, support bundles, environment validation tools, and debugging utilities.
- Develop and improve test automation infrastructure, reducing flakiness, improving CI stability, and strengthening integration and end-to-end testing environments.
- Define and maintain performance baselines and regression tests to detect scalability and latency issues early in the development cycle.
- Improve installation, deployment, and upgrade reliability by identifying recurring failure modes and implementing preventative solutions.
- Write and maintain production-quality code in Python, Go, or Rust for reliability tools, automation, and product improvements.
Requirements
- 4–7 years of experience in production engineering, SRE, platform engineering, or similar roles focused on system reliability and customer escalation handling.
- Strong software engineering fundamentals, including debugging, testing, system design, and writing maintainable production-grade code.
- Hands-on experience with Kubernetes, including troubleshooting workloads, networking, storage, RBAC, and multi-environment deployments.
- Strong observability and troubleshooting skills using logs, metrics, and traces in distributed systems.
- Proficiency in at least one programming language such as Python, Go, or Rust.
- Strong analytical and communication skills, with the ability to break down complex technical issues and explain findings clearly.
- Experience working in remote, distributed teams with strong async collaboration and self-direction.
- Collaborative mindset with experience working across engineering, product, and customer-facing functions.
Benefits
- Competitive compensation package aligned with experience, including salary and potential equity (details shared during the hiring process).
- Comprehensive health, dental, and vision coverage, depending on location.
- Flexible PTO policy supporting work-life balance.
- Home office setup support for remote productivity.
- Professional development budget for learning, training, and conferences.
- Opportunity to work in a fully remote, distributed environment with global collaboration.
- Participation in impactful work on production-grade infrastructure used in complex enterprise environments.
- Equity participation in a growing open-source-driven company.