Senior Site Reliability Engineer in United States at Jobgether

Jobgether
United States, United States

Job Description

Senior Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This role is focused on ensuring the reliability, scalability, and performance of a modern, cloud-native platform that supports privacy, security, and data-driven services at enterprise scale. You will act as a senior technical owner of production stability, working closely with engineering, security, and developer experience teams to embed strong reliability practices across the software lifecycle. The environment is fast-moving and highly collaborative, requiring a balance of hands-on engineering and strategic thinking. You will help define and evolve SRE standards, turning incidents and operational learnings into long-term systemic improvements. This is a high-impact position where your work directly influences platform resilience, customer experience, and engineering efficiency. It offers the opportunity to shape observability, incident response, and infrastructure strategy in a remote-first organization.

Accountabilities:
  • Lead reliability design and production readiness reviews for services, ensuring strong observability, safe deployments, and rollback strategies
  • Build, operate, and improve observability systems including logging, metrics, tracing, dashboards, alerts, and runbooks for incident response
  • Own incident management processes, including on-call participation, escalation handling, post-incident reviews, and long-term remediation tracking
  • Design and execute disaster recovery testing, game days, and resilience exercises to validate system robustness and reduce failure points
  • Perform capacity planning and cloud cost optimization to ensure scalable, efficient, and high-performing infrastructure
  • Identify systemic reliability risks and drive cross-team initiatives to reduce incidents and improve platform stability
  • Collaborate with engineering and security teams to integrate reliability practices into CI/CD pipelines, tooling, and development workflows
  • Continuously improve on-call operations, automation, alerting quality, and operational documentation

Requirements:
  • 5+ years of experience in Site Reliability Engineering, Production Engineering, Infrastructure Engineering, or similar production-focused roles
  • Strong hands-on experience with cloud infrastructure (ideally AWS), including compute, networking, storage, and security services
  • Proficiency in at least one programming language such as Python, JavaScript, or TypeScript, with the ability to review and understand production code
  • Experience with infrastructure as code and CI/CD tools such as Terraform, CloudFormation, or equivalent platforms
  • Deep knowledge of observability tools (e.g., Datadog or similar), including alert design, monitoring strategies, and incident signal management
  • Proven experience leading incident response, root cause analysis, and postmortem processes with actionable outcomes
  • Strong communication and collaboration skills, with the ability to influence across engineering teams without formal authority
  • Experience participating in or improving on-call rotations, escalation workflows, and operational readiness practices
  • Bachelor’s degree in a technical field or equivalent practical experience
  • Ability to thrive in a remote, high-autonomy environment with strong ownership and execution discipline

Benefits:
  • Competitive salary aligned with experience and location
  • Equity participation as part of the total compensation package
  • Flexible remote-first work environment
  • Comprehensive health, dental, and vision insurance
  • 401(k) retirement plan with company match
  • Flexible PTO and paid parental leave
  • Home office support and remote work stipend
  • Strong learning culture with growth and development opportunities

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
