Senior Manager, Site Reliability Engineering at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Manager, Site Reliability Engineering in the United States.
This role is a high-impact leadership position responsible for ensuring the reliability, scalability, and performance of complex cloud-based SaaS services. You will lead a team of Site Reliability Engineers, guiding operational excellence, infrastructure automation, and service availability for large-scale, mission-critical systems. The position involves collaborating closely with development, security, and cloud services teams to optimize deployments, maintain robust monitoring, and support rapid innovation. You will drive strategic decisions on architecture, automation, and tooling, ensuring high uptime and a seamless user experience. This is an opportunity to combine technical expertise with leadership in a fast-paced, collaborative environment where reliability and innovation are equally valued.
Responsibilities
- Lead and mentor a team of 8–10 Site Reliability Engineers, fostering professional growth and operational excellence
- Define, measure, and report on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure adherence to high uptime targets
- Collaborate with development, security, and cloud teams to align infrastructure and deployment strategies with organizational goals
- Drive completion of large-scale projects, coordinating with multiple teams to meet deadlines efficiently
- Conduct capacity planning, performance analysis, and infrastructure scaling using automation and Infrastructure as Code (IaC)
- Resolve complex system behavior, performance, and application issues while ensuring observability across multiple datacenters
- Establish and enforce best practices for automated deployments, monitoring, and incident response
Requirements
- Minimum of six years of experience leading software-focused SRE teams of 8–10 engineers
- Proven experience in global-scale operations and strategic decision-making for infrastructure and tooling choices
- Strong proficiency with public cloud provisioning using IaC tools like CloudFormation or Terraform
- Hands-on experience with scripting and programming languages such as Python, Ruby, Bash, or Go
- Expertise with containerization and orchestration platforms including Docker and Kubernetes
- Familiarity with Git in large-scale team environments and with security design principles
- Experience managing high-volume, mission-critical production services, plus a working knowledge of IP networking concepts
- Preferred: experience with observability tools (New Relic, Grafana, Splunk, CloudWatch) and DevOps automation tools (Jenkins, Artifactory, Spacelift)
Benefits
- Competitive salary range: $166,737–$235,000 (Colorado)
- Generous paid time off and holiday schedule
- Parental leave and progressive healthcare options
- Retirement plans with company contributions
- Flexible, collaborative, and inclusive work environment
- Opportunities for professional development, training, and education reimbursement
- Global volunteering, community initiatives, and team-building events