Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Senior Site Reliability Engineer at Jobgether?

The Senior Site Reliability Engineer position is a full-time or part-time role, posted by Jobgether on behalf of a partner company, focused on the reliability, scalability, and performance of a cloud-native platform.

Where is this Senior Site Reliability Engineer job located?

United States, Other / Non-US

What type of employment is offered for this Senior Site Reliability Engineer role?

Full-time or part-time position

What industry does this Senior Site Reliability Engineer position belong to?

This role spans multiple industries.

What is the expected salary for this Senior Site Reliability Engineer job?

Compensation will be discussed during the hiring process.

Senior Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This role is focused on ensuring the reliability, scalability, and performance of a modern, cloud-native platform that supports privacy, security, and data-driven services at enterprise scale. You will act as a senior technical owner of production stability, working closely with engineering, security, and developer experience teams to embed strong reliability practices across the software lifecycle. The environment is fast-moving and highly collaborative, requiring a balance of hands-on engineering and strategic thinking. You will help define and evolve SRE standards, turning incidents and operational learnings into long-term systemic improvements. This is a high-impact position where your work directly influences platform resilience, customer experience, and engineering efficiency. It offers the opportunity to shape observability, incident response, and infrastructure strategy in a remote-first organization.

Accountabilities:

Lead reliability design and production readiness reviews for services, ensuring strong observability, safe deployments, and rollback strategies
Build, operate, and improve observability systems including logging, metrics, tracing, dashboards, alerts, and runbooks for incident response
Own incident management processes, including on-call participation, escalation handling, post-incident reviews, and long-term remediation tracking
Design and execute disaster recovery testing, game days, and resilience exercises to validate system robustness and reduce failure points
Perform capacity planning and cloud cost optimization to ensure scalable, efficient, and high-performing infrastructure
Identify systemic reliability risks and drive cross-team initiatives to reduce incidents and improve platform stability
Collaborate with engineering and security teams to integrate reliability practices into CI/CD pipelines, tooling, and development workflows
Continuously improve on-call operations, automation, alerting quality, and operational documentation

Requirements:

5+ years of experience in Site Reliability Engineering, Production Engineering, Infrastructure Engineering, or similar production-focused roles
Strong hands-on experience with cloud infrastructure (ideally AWS), including compute, networking, storage, and security services
Proficiency in at least one programming language such as Python, JavaScript, or TypeScript, with the ability to review and understand production code
Experience with infrastructure as code and CI/CD tools such as Terraform, CloudFormation, or equivalent platforms
Deep knowledge of observability tools (e.g., Datadog or similar), including alert design, monitoring strategies, and incident signal management
Proven experience leading incident response, root cause analysis, and postmortem processes with actionable outcomes
Strong communication and collaboration skills, with the ability to influence across engineering teams without formal authority
Experience participating in or improving on-call rotations, escalation workflows, and operational readiness practices
Bachelor’s degree in a technical field or equivalent practical experience
Ability to thrive in a remote, high-autonomy environment with strong ownership and execution discipline

Benefits:

Competitive salary aligned with experience and location
Equity participation as part of total compensation package
Flexible remote-first work environment
Comprehensive health, dental, and vision insurance
401(k) retirement plan with company match
Flexible PTO and paid parental leave
Home office support and remote work stipend
Strong learning culture with growth and development opportunities

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
