Product Reliability Engineer in Germany at Jobgether

Job Function: Engineering
Jobgether
Germany

Job Description

Product Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Product Reliability Engineer in Germany.

This role sits at the intersection of software engineering, site reliability, and customer-facing problem solving, focusing on ensuring that complex infrastructure software performs reliably in real-world, on-prem environments. You will work directly on high-impact production issues while also building the systems and tooling that prevent them from recurring. The environment is highly technical and fast-moving, requiring strong debugging instincts and the ability to operate across distributed systems and Kubernetes-based deployments. You will collaborate closely with engineering, product, and customer-facing teams to diagnose incidents, improve observability, and strengthen system resilience. Beyond incident response, you will play a key role in shaping test automation, deployment reliability, and upgrade stability. Your work will directly influence how reliably the product runs in diverse and often unpredictable customer infrastructures. This is a hands-on, deeply technical role with strong ownership over system reliability.

Accountabilities:
  • Partner with customers and internal teams to handle L2/L3 escalations, diagnosing and resolving complex issues related to deployment, upgrades, runtime behavior, and Kubernetes environments.
  • Drive end-to-end root cause analysis, reproducing issues, identifying failure patterns, and coordinating fixes with engineering teams.
  • Build and maintain diagnostic tooling such as health checks, support bundles, environment validation tools, and debugging utilities.
  • Develop and improve test automation infrastructure, reducing flakiness, improving CI stability, and strengthening integration and end-to-end testing environments.
  • Define and maintain performance baselines and regression tests to detect scalability and latency issues early in the development cycle.
  • Improve installation, deployment, and upgrade reliability by identifying recurring failure modes and implementing preventative solutions.
  • Write and maintain production-quality code in Python, Go, or Rust for reliability tools, automation, and product improvements.

Requirements:
  • 4–7 years of experience in production engineering, SRE, platform engineering, or similar roles focused on system reliability and customer escalation handling.
  • Strong software engineering fundamentals, including debugging, testing, system design, and writing maintainable production-grade code.
  • Hands-on experience with Kubernetes, including troubleshooting workloads, networking, storage, RBAC, and multi-environment deployments.
  • Strong observability and troubleshooting skills using logs, metrics, and traces in distributed systems.
  • Proficiency in at least one programming language such as Python, Go, or Rust.
  • Strong analytical and communication skills, with the ability to break down complex technical issues and explain findings clearly.
  • Experience working in remote, distributed teams with strong async collaboration and self-direction.
  • Collaborative mindset with experience working across engineering, product, and customer-facing functions.

Benefits:
  • Competitive compensation package aligned with experience, including salary and potential equity (details shared during the hiring process).
  • Comprehensive health, dental, and vision coverage depending on location.
  • Flexible PTO policy supporting work-life balance.
  • Home office setup support for remote productivity.
  • Professional development budget for learning, training, and conferences.
  • Opportunity to work in a fully remote, distributed environment with global collaboration.
  • Participation in impactful work on production-grade infrastructure used by complex enterprise environments.
  • Equity participation in a growing open-source-driven company.

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

Germany
