What is the role of a Principal Site Reliability Engineer at Jobgether?

The Principal Site Reliability Engineer position at Jobgether is a full-time or part-time opportunity in the Engineering field.

Where is this Principal Site Reliability Engineer job located?

United States, Other / Non-US

What type of employment is offered for this Principal Site Reliability Engineer role?

Full-time or part-time position

What is the expected salary for this Principal Site Reliability Engineer job?

Compensation will be discussed during the hiring process.

How can I apply for the Principal Site Reliability Engineer position at Jobgether?

You can apply directly through the application link provided.

Principal Site Reliability Engineer at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Site Reliability Engineer in the United States.

This role offers the opportunity to lead reliability strategy for a complex, high-scale platform in a mission-driven, data-intensive environment. The Principal Site Reliability Engineer will own cross-cutting initiatives to ensure system reliability, scalability, and security while reducing operational toil. This position requires setting service-level objectives (SLOs), leading incident management, and designing automated workflows and infrastructure patterns. Success involves influencing architecture decisions, mentoring engineering teams, and establishing organization-wide operational standards. The role combines technical leadership with hands-on execution, directly impacting the delivery of secure, high-quality services for critical workflows. This is an ideal opportunity for someone passionate about system reliability, operational excellence, and cloud-based infrastructure.

Accountabilities:

Serve as a technical leader for reliability across multiple domains, setting standards and maintaining hands-on involvement where necessary
Define and maintain SLOs, error budgets, and reliability KPIs aligned to customer journeys
Lead complex incident management, drive post-incident reviews, and implement remediation strategies
Design and implement automation and self-service workflows to reduce operational toil and risk
Scale infrastructure and platform operations using GitOps (Argo CD), Crossplane, Terraform, and cloud services (AWS)
Conduct operational readiness and reliability reviews for new features and architectural changes
Mentor Staff and Senior engineers, fostering best practices in reliability, performance, and security

Requirements:

8+ years of experience in SRE, platform engineering, systems engineering, or similar roles supporting production services at scale
Demonstrated principal-level impact through cross-team initiatives and architecture influence
Expertise in Kubernetes operations, troubleshooting, and safe deployment strategies
Strong experience with GitOps workflows (Argo CD) and automation using Argo Workflows
Infrastructure provisioning and orchestration skills with Crossplane and Terraform
Deep AWS knowledge (IAM, networking, compute, storage, observability) and understanding of cloud failure modes
Proficiency in Python for building automation and reliability improvements
Strong incident management experience with measurable improvements in availability, MTTR, or operational maturity
Excellent communication skills, translating technical trade-offs for diverse stakeholders

Benefits:

Flexible, remote-friendly work environment
Opportunities for personal and professional development
Collaborative and mission-driven culture
Participation in a talented, diverse, and energized engineering community
Programs supporting employee growth, well-being, and engagement

Why Apply Through Jobgether?

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
