Senior Site Reliability Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.
This senior-level role is focused on owning and evolving the stability, performance, and scalability of complex hybrid and multi-cloud infrastructure environments. You will operate at the intersection of cloud engineering, platform reliability, and infrastructure automation, supporting mission-critical systems across enterprise-scale Nutanix, AWS, and GCP ecosystems. The position requires deep technical expertise in SRE practices, infrastructure-as-code, and cloud-native architectures, with a strong emphasis on automation and proactive reliability engineering. You will act as a key escalation point for high-severity incidents, ensuring rapid resolution and long-term prevention strategies. The role involves designing resilient systems, improving observability, and driving continuous optimization across distributed environments. You will collaborate closely with engineering, security, and operations teams in a fast-paced, highly technical environment. This is a high-impact position where your work directly influences platform reliability and business continuity.
Responsibilities
- Lead the design, deployment, and maintenance of hybrid and multi-cloud infrastructure across Nutanix, AWS, and GCP, ensuring high availability, scalability, and resilience.
- Drive automation initiatives using Python, PowerShell, Bash, and Terraform to improve infrastructure provisioning, monitoring, and operational efficiency.
- Own advanced Nutanix platform operations including cluster management, disaster recovery design, performance tuning, and troubleshooting at L3 level.
- Architect and maintain cloud-native solutions including networking (VPC, VPN, transit architectures), identity management, and multi-account governance.
- Implement and optimize CI/CD pipelines, infrastructure-as-code frameworks, and containerized workloads across Kubernetes, EKS, and GKE environments.
- Lead critical incident response, root cause analysis, and long-term remediation strategies for complex system failures.
- Enhance observability through centralized logging, monitoring, and SIEM integrations across cloud and on-prem environments.
- Ensure security, compliance, and operational best practices across all infrastructure layers.
Requirements
- 8–12+ years of infrastructure engineering experience, including 8+ years working with Nutanix HCI and enterprise cloud platforms (AWS and/or GCP).
- Strong expertise in scripting and automation (Python, Bash, PowerShell) and infrastructure-as-code tools (Terraform, CloudFormation).
- Deep knowledge of Kubernetes and container orchestration platforms (EKS, GKE, ECS).
- Proven experience managing hybrid cloud environments, disaster recovery architectures, and large-scale production systems.
- Strong understanding of networking concepts (TCP/IP, VLANs, routing, load balancing, VPNs) and cloud security practices.
- Experience with L3 incident management, troubleshooting complex distributed systems, and performance optimization.
- Familiarity with ITIL practices, compliance frameworks, and enterprise governance models.
- Excellent communication skills with the ability to translate complex technical issues into clear business impact.
- Ability to operate effectively under pressure and in on-call rotations.
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field (or equivalent experience).
Benefits
- Competitive salary based on location tier (up to approximately $141,000 – $227,000 USD annually)
- Equity participation and performance-based bonus opportunities
- Comprehensive health coverage including medical, dental, vision, disability, and life insurance
- 401(k) retirement plan with company matching contributions
- Flexible work arrangements and remote-friendly environment
- Employee wellness programs, legal support, and assistance services
- Cell phone subsidy, commuter benefits, and additional employee discounts
- Ongoing learning opportunities and access to advanced technical certifications
- Inclusive, collaborative, and highly skilled engineering culture