Senior Site Reliability Engineer in the United States at Jobgether

Jobgether
United States (Remote)

Job Description

Senior Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This is an opportunity for a highly skilled Site Reliability Engineer to help build and scale the reliability foundation of an AI-driven platform. In this role, you will lead strategic reliability initiatives across complex cloud infrastructure, AI workloads, and developer enablement systems. You will work at the intersection of platform engineering, observability, automation, and AI operations, helping teams deliver resilient and scalable services. The environment is fast-paced and collaborative, with strong technical ownership and leadership influence. Ideal candidates are passionate about cloud-native infrastructure, operational excellence, and enabling high-performing engineering teams. This role is fully remote within the United States and offers the chance to shape reliability practices for AI-powered systems.

Accountabilities:
  • Own and drive platform reliability initiatives, including defining and managing SLIs, SLOs, and error budgets across production services and AI-driven workloads.
  • Design and implement resilient infrastructure patterns for AI pipelines, including observability, failure detection, graceful degradation, and workload isolation.
  • Lead incident response processes, disaster recovery planning, and post-incident reviews focused on long-term operational improvements.
  • Partner closely with Software Engineering and AI Engineering teams to establish reliability standards, deployment best practices, and scalable CI/CD workflows.
  • Develop and maintain observability solutions using monitoring, tracing, logging, and telemetry tools to ensure visibility across services and AI operations.
  • Manage infrastructure as code, cloud cost optimization initiatives, and automation strategies to improve operational efficiency and scalability.
  • Build and enhance Internal Developer Platforms (IDP), service catalogs, and self-service tooling that empower engineering teams.
  • Mentor junior and intermediate engineers, contributing to technical growth, knowledge sharing, and engineering excellence across the organization.

Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 6–8 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps with demonstrated technical leadership responsibilities.
  • Deep expertise with AWS services, Kubernetes, Docker, Terraform, GitOps methodologies, and cloud-native infrastructure patterns.
  • Strong experience with observability platforms, distributed tracing, monitoring systems, and operational tooling.
  • Proficiency in Python and/or Bash scripting, along with experience supporting microservices architectures and CI/CD pipelines.
  • Familiarity with Internal Developer Platform tools such as Backstage or similar solutions is highly desirable.
  • Experience supporting AI/ML infrastructure, LLM integrations, or agentic systems is considered a major asset.
  • Excellent analytical, communication, mentoring, and problem-solving skills, with the ability to navigate complex technical environments.
  • Experience with FinOps, disaster recovery planning, policy-as-code, or regulated environments is a plus.

Benefits:
  • Competitive salary range of approximately $149,100 – $157,800 USD
  • Comprehensive medical, dental, and vision coverage
  • 401(k) matching program
  • Flexible vacation policy
  • Company-sponsored training and professional development opportunities
  • Annual wellness and fitness reimbursement programs
  • Inclusive and collaborative remote work environment
  • Opportunities for community involvement and charitable engagement
  • Access to wellness resources and employee support initiatives
  • Occasional travel opportunities for collaboration and team engagement

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

United States (Remote)
