
Senior Site Reliability Engineer at Fabric – New York

Fabric
New York, United States
Employment Type: Full-Time

About This Position

About the Role
As a Senior Site Reliability Engineer, you will own and evolve the infrastructure powering healthcare experiences for millions of patients. This role bridges the gap between traditional infrastructure excellence and the future of AI-driven operations. You will act as a primary architect for our AWS and Kubernetes (EKS) environment, ensuring the platform is resilient, scalable, and compliant while exploring how agentic workflows can modernize SRE practices.

What You'll Do
You will be a steward of Fabric’s production integrity, leading the strategy for infrastructure automation, observability, and system resilience. Your primary responsibilities will include:

  • Cluster Management: Design, deploy, and maintain production Kubernetes (EKS) infrastructure to ensure enterprise-grade availability.
  • Infrastructure as Code: Build and manage scalable infrastructure state using Terraform to eliminate manual configuration.
  • Cloud Architecture: Optimize our AWS footprint (EC2, RDS, S3) for performance, cost, and reliability.
  • Agentic Workflows: Explore and deploy agentic workflows for AI-assisted runbooks to automate complex operational decision-making and tasks.
  • CI/CD Engineering: Build and evolve robust deployment pipelines using GitHub Actions or Semaphore to ensure safe, rapid delivery.
  • Toil Reduction: Develop internal tooling to replace manual operational tasks with intelligent, autonomous systems.
  • Stack Evolution: Evolve the observability stack with a focus on Datadog, implementing sophisticated metrics, traces, logs, and SLOs.
  • Incident Leadership: Lead incident response efforts and facilitate blameless postmortems to systematically reduce mean time to recovery (MTTR).
  • Reliability Governance: Define and monitor SLIs and SLOs to ensure the platform meets rigorous healthcare performance standards.
  • Regulatory Excellence: Ensure all infrastructure meets HIPAA compliance and healthcare regulatory requirements.
  • Collaborative Mentorship: Support engineers across the organization in adopting reliability best practices and participate in cross-functional design reviews.

Why You Might Be a Good Fit
  • You are a deeply proficient engineer who excels at the intersection of cloud infrastructure, automation, and system design.
  • You possess a meticulous approach to observability and a passion for finding the "root cause" rather than just applying a patch.
  • You enjoy exploring the "next frontier" of SRE, including how AI and agentic tools can make operations more efficient.
  • You thrive in fast-paced environments where technical rigor is balanced with pragmatism and clinical-grade safety.

This Might Not Be The Right Fit If...
  • You prefer working on static infrastructure rather than evolving systems through code and automation.
  • You are uncomfortable with the "agile" pace of tech-driven platform development or integrating AI tools into your daily workflow.
  • You prefer a siloed role that does not involve active participation in incident response or collaborative postmortems.

Job Requirements
  • 5+ years of experience in SRE, DevOps, or Platform roles managing production environments at scale.
  • Expert technical depth in AWS (EKS, EC2, RDS, S3) and production-grade Kubernetes management.
  • Proficiency with modern tooling including Terraform (IaC), Datadog (Observability), and CI/CD systems.
  • Strong coding and scripting skills in Python, Bash, Ruby, or Go.
  • Preferred: experience building agentic workflows or AI-assisted tooling to drive operational efficiency.
  • A "rigor-first" mindset with a dedication to HIPAA-compliant, high-availability architecture.

The national pay range for this role is $120,000.00 – $145,000.00 per year. Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications. Certain roles may also be eligible for additional compensation, including stock options, bonuses, and a comprehensive benefits package covering medical, dental, vision, unlimited PTO, and a 401(k) plan. If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications. Expected compensation ranges for this role may change over time.

Job Location

New York, United States
