Staff Site Reliability Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Site Reliability Engineer in the United States.
This senior-level engineering role is focused on designing, scaling, and operating highly available, cloud-native systems that support millions of connected users and IoT devices. The position plays a critical role in ensuring platform reliability, performance, and automation across large-scale AWS-based infrastructure. You will work at the intersection of software engineering and operations, helping build resilient systems that meet aggressive availability targets. The environment is highly collaborative, involving close partnership with product, engineering, and infrastructure teams. A strong emphasis is placed on automation, observability, and continuous improvement of system reliability. This is a high-impact role where your work directly supports the stability and growth of mission-critical services at scale.
Responsibilities
- Design, build, and operate scalable cloud-native infrastructure and services supporting high-volume production systems.
- Collaborate with engineering and product teams to deliver reliable, event-driven architectures and new customer-facing features.
- Develop and enhance Infrastructure as Code (Terraform) and configuration management (Helm) frameworks to enable scalable self-service infrastructure.
- Identify, troubleshoot, and eliminate performance bottlenecks across AWS services and Kubernetes environments.
- Meet and maintain high-availability targets, including 99.99% uptime for customer-facing services.
- Improve monitoring, alerting, and observability systems to enable proactive incident prevention and faster resolution.
- Support continuous optimization of platform reliability, scalability, and automation practices.
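
For a sense of scale, the 99.99% uptime target in the list above can be turned into a downtime budget. The sketch below is purely illustrative: the function name `downtime_budget_minutes` and the 30-day and 365-day windows are assumptions made for this example, not how the hiring company defines or measures availability.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime allowed in a period at a given availability.

    availability_pct is a percentage such as 99.99; period_days is the length
    of the measurement window in days (an assumed window, for illustration).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Hypothetical measurement windows chosen only for this example.
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        budget = downtime_budget_minutes(99.99, days)
        print(f"{label}: {budget:.2f} minutes of allowed downtime")
```

Under these assumptions, 99.99% leaves roughly 4.3 minutes of downtime in a 30-day month and about 52.6 minutes in a year, which is why the role emphasizes automation and proactive alerting over manual response.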
Requirements
- 8+ years of experience in Site Reliability Engineering, DevOps, or similar infrastructure-focused roles.
- Strong experience designing and operating large-scale production systems.
- Advanced knowledge of AWS services such as ALB/ELB, IAM, DynamoDB, SNS, EKS, and Fargate.
- Hands-on experience with Kubernetes-based environments and cloud-native architectures.
- Proficiency with Infrastructure as Code tools, especially Terraform and Helm.
- Strong scripting skills in Python, Bash, or similar languages; additional experience in Go or Ruby is a plus.
- Deep understanding of system reliability, scalability, observability, and incident response practices.
- Strong ownership mindset with the ability to drive cross-team technical initiatives to production.
- Excellent collaboration and communication skills in a fast-paced engineering environment.
Benefits
- Competitive compensation ranging from $164,000 to $226,000 USD depending on experience and location.
- Comprehensive benefits including health, dental, vision, and pharmacy coverage.
- Paid time off, sick leave, and parental leave policies.
- Short-term and long-term disability insurance, plus life insurance coverage.
- Retirement savings plan with employer contributions (eligibility-based).
- Equity or restricted stock unit opportunities depending on role eligibility.
- Remote-friendly work environment across the United States.
- Strong focus on engineering excellence, learning, and career development in a high-scale environment.