Staff Site Reliability Engineer in Romania at Jobgether

Job Function: Engineering
Jobgether
Romania, Romania

Job Description

Staff Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Site Reliability Engineer in Romania.

Join a highly collaborative engineering environment where reliability, scalability, and automation are central to delivering world-class developer experiences at global scale. In this role, you will help design and maintain resilient infrastructure systems supporting millions of users worldwide, while driving operational excellence across distributed cloud environments. You will work closely with engineering and infrastructure teams to improve observability, optimize performance, and build automation that reduces operational complexity. This position offers the opportunity to lead incident response initiatives, shape reliability standards, and influence infrastructure strategy across the organization. It is an ideal opportunity for a senior SRE professional who thrives in fast-moving environments and enjoys solving complex distributed systems challenges while mentoring teams and promoting engineering best practices.

Accountabilities:
  • Design and implement comprehensive observability solutions, including monitoring, logging, tracing, dashboards, and alerting systems to improve visibility into infrastructure health and performance.
  • Define, track, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in collaboration with engineering and product teams.
  • Lead high-severity incident response efforts, coordinate troubleshooting activities, conduct blameless post-mortems, and implement long-term preventive solutions.
  • Build and maintain infrastructure automation and Infrastructure as Code solutions using tools such as Terraform or Pulumi.
  • Develop self-healing systems and automation processes that reduce operational overhead and improve system resilience.
  • Optimize large-scale Kubernetes and cloud-native deployments, focusing on scalability, reliability, latency reduction, and capacity planning.
  • Investigate and resolve complex distributed systems issues across multiple layers of the infrastructure stack.
  • Review architectural and system designs to ensure reliability, scalability, operational efficiency, and security best practices.
  • Mentor engineers across teams and help establish reliability-focused engineering culture and operational standards.
  • Build internal tools, integrations, and automation workflows using languages such as Python or Go to support platform operations and infrastructure improvements.
Requirements:
  • 8–10 years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or related fields.
  • Strong software engineering skills with hands-on experience developing production-grade applications or tooling in Python or Go.
  • Deep expertise in distributed systems architecture, cloud-native environments, and service-oriented infrastructure design.
  • Extensive experience with Kubernetes, container orchestration, Docker, and modern cloud infrastructure technologies.
  • Proven ability to design and maintain advanced observability and monitoring ecosystems using tools such as Prometheus, Grafana, Datadog, or OpenTelemetry.
  • Strong background in incident management, root cause analysis, troubleshooting, and operational excellence practices.
  • Hands-on experience with Infrastructure as Code and automation tools such as Terraform, Pulumi, or similar technologies.
  • Excellent written and verbal communication skills, with the ability to explain complex technical topics clearly across teams and stakeholders.
  • Demonstrated leadership and mentoring experience working with engineers across multiple seniority levels.
  • Comfortable working across the full infrastructure stack and solving highly complex technical challenges in fast-paced environments.
  • Experience with Google Cloud Platform (GCP), high-throughput systems, startup environments, or technical content creation is considered a strong advantage.
Benefits:
  • Competitive salary package with equity opportunities.
  • Fully remote work environment across Europe.
  • Flexible time off policy and paid holidays.
  • Health, dental, vision, and life insurance coverage.
  • Paid parental, medical, and caregiver leave programs.
  • Short-term and long-term disability coverage.
  • Monthly wellness stipend to support personal well-being.
  • Autonomous and flexible work culture with strong ownership opportunities.
  • Quarterly team gatherings and collaborative company events.
  • Professional equipment and remote workspace support.
  • Opportunity to work on globally scaled infrastructure challenges using modern cloud-native technologies.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

