What is the role of a Staff Site Reliability Engineer at Jobgether?

This Staff Site Reliability Engineer position, posted by Jobgether on behalf of a partner company, is a full-time or part-time role in the Engineering field.

Where is this Staff Site Reliability Engineer job located?

United States

What type of employment is offered for this Staff Site Reliability Engineer role?

Full-time or part-time position

What is the expected salary for this Staff Site Reliability Engineer job?

The posted range is $164,000 to $226,000 USD, depending on experience and location. Final compensation will be discussed during the hiring process.

Staff Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Site Reliability Engineer in the United States.

This senior-level engineering role is focused on designing, scaling, and operating highly available, cloud-native systems that support millions of connected users and IoT devices. The position plays a critical role in ensuring platform reliability, performance, and automation across large-scale AWS-based infrastructure. You will work at the intersection of software engineering and operations, helping build resilient systems that meet aggressive availability targets. The environment is highly collaborative, involving close partnership with product, engineering, and infrastructure teams. A strong emphasis is placed on automation, observability, and continuous improvement of system reliability. This is a high-impact role where your work directly supports the stability and growth of mission-critical services at scale.

Accountabilities:

Design, build, and operate scalable cloud-native infrastructure and services supporting high-volume production systems.
Collaborate with engineering and product teams to deliver reliable, event-driven architectures and new customer-facing features.
Develop and enhance Infrastructure as Code (Terraform) and configuration management (Helm) frameworks to enable scalable self-service infrastructure.
Identify, troubleshoot, and eliminate performance bottlenecks across AWS services and Kubernetes environments.
Meet and maintain high-availability targets, including 99.99% uptime for customer-facing services.
Improve monitoring, alerting, and observability systems to enable proactive incident prevention and faster resolution.
Support continuous optimization of platform reliability, scalability, and automation practices.

Requirements:

8+ years of experience in Site Reliability Engineering, DevOps, or similar infrastructure-focused roles.
Strong experience designing and operating large-scale production systems.
Advanced knowledge of AWS services such as ALB/ELB, IAM, DynamoDB, SNS, EKS, and Fargate.
Hands-on experience with Kubernetes-based environments and cloud-native architectures.
Proficiency with Infrastructure as Code tools, especially Terraform and Helm.
Strong scripting skills in Python, Bash, or similar languages; additional experience in Go or Ruby is a plus.
Deep understanding of system reliability, scalability, observability, and incident response practices.
Strong ownership mindset with the ability to drive cross-team technical initiatives to production.
Excellent collaboration and communication skills in a fast-paced engineering environment.

Benefits:

Competitive compensation ranging from $164,000 to $226,000 USD depending on experience and location.
Comprehensive benefits including health, dental, vision, and pharmacy coverage.
Paid time off, sick leave, and parental leave policies.
Short-term and long-term disability insurance, plus life insurance coverage.
Retirement savings plan with employer contributions (eligibility-based).
Equity or restricted stock unit opportunities depending on role eligibility.
Remote-friendly work environment across the United States.
Strong focus on engineering excellence, learning, and career development in a high-scale environment.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
