Senior Site Reliability Engineer at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.
This role offers a unique opportunity to ensure the reliability, scalability, and performance of critical platform services in a fast-paced, technology-driven environment. The Senior Site Reliability Engineer (SRE) will combine software engineering expertise with operational excellence to automate processes, improve observability, and reduce operational risk across the platform. You will collaborate closely with development, DevOps, release engineering, and security teams to embed reliability and security best practices throughout the software lifecycle. This position emphasizes proactive problem-solving, automation, and continuous improvement while providing mentorship to peers and contributing to high-impact projects. The role is ideal for someone who thrives on solving complex technical challenges while shaping the platform’s resilience and scalability.
As a Senior Site Reliability Engineer, you will be responsible for maintaining and improving platform reliability while enabling scalable operations:
Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical services.
Lead capacity planning, performance tuning, design reviews, and disaster recovery exercises to validate platform resilience.
Automate infrastructure provisioning, patching, and operational tasks using Terraform, Ansible, and CI/CD pipelines to eliminate manual processes.
Partner with security teams to enforce compliance (SOC 2, CIS Benchmarks), implement least-privilege IAM policies, and maintain hardened, secure systems.
Serve as a Tier-2 escalation point during incidents, lead root cause analysis, and continuously improve incident response playbooks and on-call processes.
Identify repetitive operational tasks and implement automation or self-service modules to reduce toil and improve developer productivity.
Measure system performance, track reliability metrics, and collaborate with leadership to drive iterative improvements.
The ideal candidate combines hands-on technical expertise with strong problem-solving skills and a focus on automation and reliability:
Bachelor’s degree in Computer Science, Engineering, or related field.
Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering roles.
Strong experience with AWS multi-account environments, Terraform, Ansible, CI/CD tools (GitHub Actions, Bitbucket, Jenkins, AWS CodeBuild/CodePipeline), and observability platforms (New Relic, CloudWatch).
Background in containerized environments (ECS, Fargate, EKS) and resilient system architectures.
Preferred certifications: AWS DevOps Engineer or Solutions Architect, Kubernetes, or SRE/DevOps practitioner certifications.
Excellent analytical, troubleshooting, and problem-solving abilities.
Strong collaboration skills to work effectively with cross-functional teams, mentor peers, and contribute to continuous improvement.
This role provides a comprehensive benefits package designed to support health, growth, and work-life balance:
Competitive salary range: $120,000–$125,000 USD per year.
Day-one medical, dental, and vision coverage, with health savings and flexible spending account options (HSA/FSA).
401(k) with company match available from day one.
Paid sick leave, volunteer time, and parental leave options.
Employer-paid life and disability insurance.
Wellbeing on Demand program to support personal health and wellness.
Flexible work environment with remote opportunities and casual dress code.