Senior Site Reliability Engineer - AWS in United States at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Site Reliability Engineer - AWS based in the United States.
This role offers an opportunity to shape the reliability, scalability, and performance of mission-critical cloud platforms supporting a rapidly growing technology environment. As a senior engineering leader within a cross-functional team, you will drive automation, operational excellence, and infrastructure resilience while helping build systems capable of operating at internet scale. The position combines deep technical expertise with strategic influence, allowing you to design autonomous platforms, enhance developer productivity, and strengthen system reliability across the software development lifecycle. You will work closely with engineers, product teams, and stakeholders to ensure seamless operations, rapid incident response, and continuous improvement. This is an ideal opportunity for a highly motivated reliability engineer who thrives on solving complex challenges, mentoring others, and building resilient cloud-native systems.
Responsibilities:
- Serve as the reliability engineering lead within a cross-functional team, providing technical leadership, mentorship, and guidance on best practices.
- Design, implement, and maintain highly automated systems that support software development, deployment, testing, monitoring, and operational workflows.
- Act as the primary advocate for reliability, scalability, and operational excellence throughout the entire software development lifecycle.
- Develop and maintain monitoring, logging, dashboarding, and alerting solutions that provide visibility into application and infrastructure health.
- Continuously improve CI/CD pipelines, automation frameworks, deployment processes, and operational tooling to increase efficiency and reduce manual effort.
- Identify and remediate reliability, performance, availability, and security risks across cloud infrastructure and production systems.
- Create and maintain technical documentation, operational procedures, architecture standards, and engineering best practices.
- Research, evaluate, and implement tools and technologies that improve system resilience and engineering productivity.
- Collaborate with engineering teams to troubleshoot complex production issues and ensure rapid incident resolution.
- Participate in on-call rotations and provide support during critical production incidents and emergency response situations.
- Mentor junior engineers and contribute to the development of a strong reliability engineering culture.
Requirements:
- 8+ years of experience in software engineering, infrastructure engineering, cloud operations, or related technical disciplines.
- Minimum of 7 years of dedicated experience in Site Reliability Engineering (SRE) or closely related reliability-focused roles.
- Strong expertise in Python, Bash, PowerShell, and other scripting or automation technologies commonly used within SRE environments.
- Extensive experience designing, building, and maintaining autonomous systems that automate deployment, testing, monitoring, and operational processes.
- Advanced hands-on experience with AWS services, including EC2, EKS/Kubernetes, CloudWatch, Lambda, S3, IAM, and related cloud-native technologies.
- Proven ability to implement and optimize monitoring, alerting, incident management, capacity planning, and performance optimization strategies.
- Deep understanding of CI/CD pipelines, infrastructure automation, and modern DevOps and SRE practices.
- Experience building and maintaining highly available, scalable, and resilient production systems in fast-paced environments.
- Strong problem-solving skills with the ability to independently identify reliability challenges and drive long-term improvements.
- Demonstrated success reducing operational toil through automation and process optimization.
- Excellent communication, collaboration, and stakeholder management skills.
- Bachelor’s degree in Computer Science, Information Systems, or a related field; or equivalent certifications or comparable professional experience.
- AWS certifications or other cloud-related certifications are considered a plus.
- Self-driven mindset with a passion for continuous learning, innovation, and operational excellence.
Benefits:
- Competitive base salary ranging from $175,000 to $195,000, based on experience, skills, and location.
- Comprehensive medical, dental, and vision insurance coverage for eligible employees.
- Generous paid time off program.
- Paid maternity and paternity leave.
- Short-term and long-term disability coverage.
- Opportunity to work within a dynamic, fast-growing technology organization.
- Exposure to large-scale cloud infrastructure and cutting-edge engineering challenges.
- Access to experienced leadership and mentorship opportunities.
- Strong culture focused on innovation, collaboration, and professional growth.
- Competitive compensation package designed to support pay equity.
- Company-branded merchandise and employee recognition perks.
- Opportunity to make a significant impact on highly scalable, mission-critical systems.