Sr. Site Reliability Engineer at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Sr. Site Reliability Engineer in the United States.
This role sits at the core of building and maintaining highly reliable, scalable, and secure infrastructure that powers mission-critical insurance technology platforms. You will work across cloud and on-prem environments, ensuring systems are resilient, observable, and optimized for performance at scale. Operating in a collaborative, engineering-driven environment, you will partner closely with development, platform, and product teams to design and evolve robust distributed systems. The position blends hands-on infrastructure engineering with automation, reliability strategy, and operational excellence. You will play a key role in improving deployment workflows, strengthening system resilience, and enabling continuous delivery practices. With a strong focus on observability and incident response, you will help ensure services remain highly available and performant. This is an opportunity to directly impact platform stability while contributing to modern DevOps and SRE practices in a fast-evolving technical ecosystem.
In this role, you will ensure the reliability, scalability, and performance of complex systems across cloud and hybrid environments while driving automation and operational maturity. You will design, build, and maintain infrastructure and tooling that supports high availability and efficient software delivery.
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform, Terraform CDK, Packer, and Ansible to automate provisioning and configuration across environments
- Collaborate with engineering teams to design scalable, fault-tolerant, and high-performance distributed systems
- Implement and manage observability and monitoring solutions (e.g., Datadog), ensuring adherence to SLIs, SLOs, and SLAs
- Build and optimize CI/CD pipelines using GitHub Actions, GitLab, and related tools to support reliable and efficient deployments
- Manage Kubernetes environments, including Helm and ArgoCD, for container orchestration and application delivery
- Drive automation of operational processes using scripting languages such as Python, Go, Bash, and PowerShell
- Support incident response, troubleshooting, and on-call rotations to ensure system stability and rapid issue resolution
- Design disaster recovery and high-availability strategies across cloud and hybrid infrastructure
- Collaborate with vendors and cross-functional teams to integrate external tools and services into the platform ecosystem
- Document systems, workflows, and operational standards to ensure knowledge sharing and consistency across teams
You bring strong experience in site reliability engineering, DevOps, or infrastructure-focused roles, with a deep understanding of distributed systems and production-scale environments. You are comfortable working across cloud platforms, automation tooling, and containerized architectures, and you thrive in fast-paced, collaborative engineering environments.
- 5+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
- Strong expertise in incident management, troubleshooting, and production system reliability
- Hands-on experience with cloud platforms such as AWS, GCP, or Azure
- Strong proficiency in Infrastructure as Code tools, especially Terraform and related frameworks
- Experience with Kubernetes, including Helm charts and ArgoCD for deployment orchestration
- Proficiency in scripting and programming languages such as Python, Go, Bash, and PowerShell
- Familiarity with CI/CD pipelines and version control systems like GitHub and GitLab
- Experience with observability tools such as Datadog and logging/monitoring best practices
- Knowledge of both Linux and Windows system administration
- Strong communication skills with the ability to collaborate across technical and non-technical teams
- Ability to prioritize effectively, troubleshoot under pressure, and drive operational improvements
- Experience mentoring engineers and contributing to team knowledge sharing is a plus
Benefits
- Competitive salary ranging from $65,000 to $160,000 depending on experience and qualifications
- Bonus and additional compensation eligibility based on role
- Comprehensive medical, dental, and vision insurance coverage
- Paid vacation, holidays, health & wellness days, and a birthday bonus day
- Flexible remote work arrangement across North America
- 401(k) retirement savings plan and other financial benefits
- Strong learning and career development culture with mentorship opportunities
- Collaborative, inclusive engineering environment focused on innovation and reliability
- Work-life balance supported through flexible scheduling and remote-first practices