Senior Site Reliability Engineer - AWS in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer - AWS in the United States.
This role sits at the core of a fast-growing, AI-driven engineering environment focused on building highly reliable, scalable, and automated production systems. You will ensure the stability and performance of mission-critical platforms that support complex, high-volume applications. Acting as both a technical leader and hands-on engineer, you will design and evolve infrastructure that minimizes toil and maximizes automation. The position involves deep collaboration with software engineering teams to embed reliability across the full development lifecycle. You will play a key role in strengthening CI/CD pipelines, observability, and incident response practices. This is a highly impactful role where your work directly influences system uptime, developer productivity, and customer experience in a cloud-native AWS ecosystem.
- Lead reliability engineering efforts within cross-functional teams, providing technical direction, mentorship, and operational judgment for production systems.
- Design, build, and maintain autonomous infrastructure and tooling for build, deployment, testing, monitoring, and incident response workflows.
- Own observability across systems by implementing monitoring, logging, alerting, and dashboards to ensure full operational visibility.
- Improve CI/CD pipelines, infrastructure automation, and engineering playbooks to reduce downtime and operational toil.
- Proactively identify risks in system availability, performance, scalability, and security, and implement long-term resilient solutions.
- Participate in on-call rotations, incident response, postmortems, and continuous reliability improvement initiatives.
- Document architectures, processes, and operational best practices while contributing to engineering knowledge sharing.
Requirements:
- 8+ years of experience in software engineering, infrastructure, or operations, including at least 4 years in Site Reliability Engineering roles.
- Strong programming and scripting skills in Python, Bash, PowerShell, or similar automation-focused languages.
- Deep hands-on experience designing and operating cloud infrastructure on AWS, including services such as EC2, EKS, Lambda, S3, IAM, and CloudWatch.
- Proven ability to build autonomous systems supporting CI/CD, deployment automation, monitoring, and large-scale production operations.
- Strong expertise in incident management, capacity planning, performance tuning, and reliability engineering principles.
- Experience working in fast-paced environments with a strong track record of reducing operational toil through automation.
- Bachelor’s degree in Computer Science or equivalent experience; relevant cloud or infrastructure certifications preferred.
- Strong communication, collaboration, and leadership skills with a proactive and continuous improvement mindset.
Benefits:
- Competitive base salary ranging from $175,000 to $190,000 depending on experience and location
- Comprehensive medical, dental, and vision insurance for full-time employees
- Paid time off policy plus maternity and paternity leave
- Short-term and long-term disability coverage
- Opportunity to work in a high-growth, innovation-driven environment
- Continuous learning opportunities with experienced technical and leadership teams
- Inclusive, collaborative culture focused on engineering excellence and impact
- Additional perks and company-provided benefits for full-time employees