Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Sr Site Reliability Engineer at Jobgether?

The Sr Site Reliability Engineer position at Jobgether is a full-time or part-time opportunity in the Engineering field.

Where is this Sr Site Reliability Engineer job located?

United States

What type of employment is offered for this Sr Site Reliability Engineer role?

Full-time or part-time position

What is the expected salary for this Sr Site Reliability Engineer job?

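The listing cites a base salary typically ranging from about $109,500 to $150,550 USD, plus potential performance-based incentives. Final compensation will be discussed during the hiring process.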

Sr Site Reliability Engineer

This position is listed on behalf of a partner company, who manages all applications and next steps. Our partner is looking for a Sr Site Reliability Engineer based in the United States.

This role sits at the core of a large-scale, cloud-based SaaS platform that supports millions of users in the education sector, where reliability and performance directly impact learning outcomes worldwide. You will be responsible for ensuring the availability, scalability, and security of complex distributed systems operating in production environments. The position blends hands-on engineering with strategic influence over SRE practices, observability, and infrastructure modernization. You will work closely with engineering, security, and product teams to reduce downtime, improve system resilience, and optimize deployment pipelines. The environment is fast-paced and highly collaborative, requiring strong problem-solving skills and a proactive approach to incident prevention and response. This is an opportunity to shape SRE maturity while contributing to mission-driven technology at global scale.

Accountabilities:

Drive reliability, availability, and observability improvements across large-scale distributed systems supporting cloud-based applications.
Design, implement, and maintain infrastructure-as-code solutions using tools such as Terraform and CloudFormation.
Support and enhance CI/CD pipelines to ensure efficient, secure, and reliable software delivery.
Monitor production systems, investigate incidents, and lead resolution efforts for critical outages and performance issues.
Collaborate with engineering, security, and operations teams to identify root causes and implement long-term reliability improvements.
Conduct disaster recovery planning and exercises to validate system resilience and business continuity readiness.
Contribute to on-call rotations and provide support for off-hours incidents, deployments, and escalations.
Explore and integrate AI-driven tools to improve SRE workflows, monitoring, alerting, and incident response efficiency.
Mentor peers and contribute to building a strong engineering culture through technical guidance and feedback.

Requirements:

5+ years of experience in Site Reliability Engineering or related infrastructure/DevOps roles.
Strong experience managing production cloud environments, preferably AWS with Kubernetes (EKS) at scale.
Hands-on expertise with infrastructure-as-code, container, and configuration management tools such as Terraform, Docker, and Ansible.
Proficiency in at least one programming or scripting language (Python, Java, .NET, JavaScript, or similar).
Experience building and maintaining CI/CD pipelines in modern engineering environments.
Strong understanding of monitoring, logging, alerting, and observability best practices.
Experience with on-call rotations and incident management in production environments.
Solid understanding of agile development methodologies and cross-functional collaboration.
Strong analytical and troubleshooting skills with a focus on reliability and system performance.
Nice to have: experience with SLO/SLI/SLA frameworks, DR exercises, HPC environments, or tools such as Datadog, New Relic, Grafana, PagerDuty, or GitLab/GitHub pipelines.
Nice to have: exposure to AI-assisted engineering or automation tools in production SRE workflows.

Benefits:

Competitive compensation package including a base salary typically ranging from about $109,500 to $150,550 USD, plus potential performance-based incentives.
Comprehensive medical, dental, and vision insurance coverage.
401(k) and Roth 401(k) retirement plans with company matching.
Generous paid time off, including vacation, sick leave, and 12 paid holidays.
Paid parental leave and family support programs.
Health savings accounts (HSA) and flexible spending accounts (FSA).
Tuition reimbursement and continuous learning opportunities.
Remote-first flexibility and support for work-life balance.
Wellness programs and employee assistance resources.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
