What is the role of a Sr. Site Reliability Engineer at Jobgether?

The Sr. Site Reliability Engineer position is a full-time or part-time role that Jobgether has posted on behalf of a partner company.

Where is this Sr. Site Reliability Engineer job located?

United States, Other / Non-US

What type of employment is offered for this Sr. Site Reliability Engineer role?

Full-time or part-time position

What industry does this Sr. Site Reliability Engineer position belong to?

This role spans multiple industries.

What is the expected salary for this Sr. Site Reliability Engineer job?

The listed salary range is $150,000–$200,000 USD, based on experience and location.

How can I apply for the Sr. Site Reliability Engineer position at Jobgether?

You can apply directly through the application link provided.

Sr. Site Reliability Engineer at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Sr. Site Reliability Engineer in the United States.

This role provides a unique opportunity to ensure the stability, scalability, and reliability of critical systems in a fast-paced, cloud-focused environment. The Sr. Site Reliability Engineer will work across engineering, product, and operations teams to embed reliability practices into daily workflows, automate processes, and proactively prevent system issues. This position requires a balance of hands-on technical expertise and strategic thinking to drive infrastructure improvements, optimize operational efficiency, and maintain high service availability. The role offers exposure to modern cloud platforms, containerized environments, and large-scale distributed systems, while giving you a chance to influence reliability standards and incident response practices. Ideal candidates are problem-solvers who enjoy mentoring others, designing resilient systems, and improving operational processes. This position allows you to make a measurable impact on system performance, customer experience, and engineering culture.

Accountabilities:

Own and enhance the availability, durability, and performance of production services across all environments
Lead complex reliability projects from problem identification to resolution, ensuring high-quality technical ownership
Define and enforce service health standards, including SLIs, SLOs, and error budgets
Lead critical incident response and post-incident reviews, translating insights into long-term architectural improvements
Design and implement scalable automation, monitoring, logging, and alerting solutions to reduce manual effort
Build and maintain infrastructure-as-code, CI/CD pipelines, and operational tools to improve efficiency
Collaborate with engineering, product, and operations teams to embed reliability practices and guide resilient system design
Develop operational playbooks, runbooks, and documentation to support continuous improvement and knowledge sharing

Requirements:

Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience
8+ years of progressive experience in site reliability, systems engineering, or operations
Expert-level Linux administration, advanced troubleshooting, and system security skills
Deep understanding of distributed systems, container orchestration (Kubernetes/Docker), and microservices architecture
Proficiency in scripting/programming languages such as Python, Go, or Bash
Experience with monitoring, logging, and alerting frameworks (Prometheus, Grafana, ELK, Catchpoint)
Strong familiarity with cloud platforms (AWS, GCP, or Azure) and HashiCorp tools (Terraform, Vault, Nomad)
Excellent problem-solving, collaboration, and communication skills, with a proactive approach to continuous improvement
Preferred: ITIL/OSS experience, SaaS or hyper-scale distributed system experience, and a history of mentoring teams on reliability best practices

Benefits:

Competitive salary in the range of $150,000–$200,000 USD, based on experience and location
Comprehensive healthcare coverage, including dental and vision for family members
401(k) plan with company matching and potential RSU grants
Flexible vacation policy and parental leave
Work-from-home support including equipment stipend
Learning and development programs to grow technical expertise and career trajectory
Culture that promotes work-life balance and collaborative problem-solving

Why Apply Through Jobgether?

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
