Site Reliability Engineer in United States at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Reliability Engineer based in the United States.
This role focuses on building and maintaining highly available, secure, and performant cloud platforms that power critical customer-facing services. You will be responsible for ensuring system reliability through automation, observability, and strong engineering practices across distributed cloud environments.
You will work closely with product and engineering teams to design scalable infrastructure and improve system resilience in a fast-paced cloud services environment.
The role combines hands-on cloud engineering with operational excellence, including incident response, monitoring, and continuous improvement of production systems.
You will play a key part in shaping reliability standards, deployment practices, and infrastructure automation across the organization.
This is a highly collaborative role requiring strong technical depth, problem-solving skills, and a proactive mindset toward system health and performance.
You will also contribute to mentoring peers and improving engineering practices within a small, agile, and technically driven team.
Success in this role means consistently increasing system stability, reducing operational risk, and improving deployment efficiency at scale.
Responsibilities:
- Design, implement, and maintain observability solutions to ensure high availability, performance, and reliability across cloud-based systems
- Participate in on-call rotations, incident response, and postmortem analysis to drive continuous operational improvements
- Collaborate with product and engineering teams to design and deploy scalable, resilient, and secure infrastructure solutions
- Develop and enforce cloud architecture standards, reliability practices, and automation strategies for large-scale systems
- Build and maintain infrastructure automation using Infrastructure-as-Code tools such as Terraform, ARM templates, Bicep, or CloudFormation
- Implement CI/CD and deployment automation workflows using modern DevOps toolchains and source control systems
- Integrate and automate monitoring and operational tools such as Dynatrace, Datadog, Application Insights, and similar observability platforms
- Develop scripting and automation solutions using Python, PowerShell, Bash, or REST APIs to improve operational efficiency
- Maintain technical documentation, operational runbooks, and knowledge base content to support engineering and support teams
- Collaborate on security and compliance requirements including SOC, FedRAMP, and cloud security best practices
Requirements:
- 6+ years of experience in Site Reliability Engineering, cloud infrastructure, or software engineering roles
- Strong hands-on experience with Kubernetes-based environments such as AKS, EKS, GKE, or OpenShift
- Deep knowledge of cloud platforms including Microsoft Azure, AWS, or Google Cloud Platform
- Proven experience implementing Infrastructure-as-Code using tools such as Terraform, ARM templates, Bicep, or CloudFormation
- Strong expertise in observability and monitoring tools such as Dynatrace, Datadog, New Relic, Prometheus, Grafana, or Log Analytics
- Solid scripting and automation skills using Python, PowerShell, Bash, or similar languages
- Strong understanding of CI/CD pipelines, Git-based workflows, and DevOps practices
- Experience with configuration management tools such as Ansible, Chef, Puppet, or similar
- Familiarity with distributed systems, containerized applications, and cloud-native architectures
- Ability to work independently in ambiguous environments while managing multiple priorities effectively
- Strong communication skills with the ability to collaborate across engineering, product, and operations teams
- Experience working in Agile environments using Jira or Azure DevOps Boards
- Knowledge of compliance frameworks such as SOC or FedRAMP is a strong advantage
Benefits:
- Competitive base salary ranging from USD 114,000 to 148,000 depending on experience and location
- Comprehensive health coverage including medical, dental, vision, and life insurance
- Retirement savings plan (401(k)) with employer support
- Short-term and long-term disability coverage
- Paid vacation time and paid holidays
- Professional development and training opportunities
- Remote work flexibility within the United States
- Exposure to large-scale cloud environments and modern DevOps practices
- Opportunity to work on high-impact production systems with strong engineering ownership