Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Senior Site Reliability Engineer at Jobgether?

The Senior Site Reliability Engineer position, posted by Jobgether on behalf of a partner company, is a full-time or part-time role focused on the reliability of cloud infrastructure and AI workloads.

Where is this Senior Site Reliability Engineer job located?

United States (fully remote within the United States)

What type of employment is offered for this Senior Site Reliability Engineer role?

Full-time or part-time position

What industry does this Senior Site Reliability Engineer position belong to?

This role spans multiple industries.

What is the expected salary for this Senior Site Reliability Engineer job?

The listing gives an approximate salary range of $149,100 – $157,800 USD. Final compensation will be discussed during the hiring process.

Senior Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This is an exciting opportunity for a highly skilled Site Reliability Engineer to help build and scale the reliability foundation of a cutting-edge AI-driven platform. In this role, you will lead strategic reliability initiatives across complex cloud infrastructure, AI workloads, and developer enablement systems. You will work at the intersection of platform engineering, observability, automation, and AI operations, helping teams deliver resilient and scalable services with confidence. The environment is fast-paced, collaborative, and innovation-focused, offering strong technical ownership and leadership influence. Ideal candidates are passionate about cloud-native infrastructure, operational excellence, and enabling high-performing engineering teams. This role is fully remote within the United States and offers the chance to shape reliability practices for next-generation AI-powered systems.

Accountabilities:

Own and drive platform reliability initiatives, including defining and managing SLIs, SLOs, and error budgets across production services and AI-driven workloads.
Design and implement resilient infrastructure patterns for AI pipelines, including observability, failure detection, graceful degradation, and workload isolation.
Lead incident response processes, disaster recovery planning, and post-incident reviews focused on long-term operational improvements.
Partner closely with Software Engineering and AI Engineering teams to establish reliability standards, deployment best practices, and scalable CI/CD workflows.
Develop and maintain observability solutions using monitoring, tracing, logging, and telemetry tools to ensure visibility across services and AI operations.
Manage infrastructure as code, cloud cost optimization initiatives, and automation strategies to improve operational efficiency and scalability.
Build and enhance Internal Developer Platforms (IDP), service catalogs, and self-service tooling that empower engineering teams.
Mentor junior and intermediate engineers, contributing to technical growth, knowledge sharing, and engineering excellence across the organization.

Requirements:

Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
6–8 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps with demonstrated technical leadership responsibilities.
Deep expertise with AWS services, Kubernetes, Docker, Terraform, GitOps methodologies, and cloud-native infrastructure patterns.
Strong experience with observability platforms, distributed tracing, monitoring systems, and operational tooling.
Proficiency in Python and/or Bash scripting, along with experience supporting microservices architectures and CI/CD pipelines.
Familiarity with Internal Developer Platform tools such as Backstage or similar solutions is highly desirable.
Experience supporting AI/ML infrastructure, LLM integrations, or agentic systems is considered a major asset.
Excellent analytical, communication, mentoring, and problem-solving skills, with the ability to navigate complex technical environments.
Experience with FinOps, disaster recovery planning, policy-as-code, or regulated environments is a plus.

Benefits:

Competitive salary range of approximately $149,100 – $157,800 USD
Comprehensive medical, dental, and vision coverage
401(k) matching program
Flexible vacation policy
Company-sponsored training and professional development opportunities
Annual wellness and fitness reimbursement programs
Inclusive and collaborative remote work environment
Opportunities for community involvement and charitable engagement
Access to wellness resources and employee support initiatives
Occasional travel opportunities for collaboration and team engagement

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
