
Sr Site Reliability Engineer in India at Jobgether

Job Function: Engineering
Jobgether
India, India


Job Description

Sr Site Reliability Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr Site Reliability Engineer based in India.

This role sits at the core of a rapidly scaling observability platform that powers how engineering teams monitor, debug, and optimize complex distributed systems. You will be responsible for ensuring the reliability, scalability, and performance of a large-scale SaaS infrastructure that processes massive volumes of observability data. The environment is highly technical and deeply hands-on, requiring strong instincts for diagnosing production issues and preventing them at scale. You will work closely with platform, data, and product engineering teams to maintain and evolve a petabyte-scale system built on modern cloud-native technologies. The role involves owning uptime, performance, and operational excellence across Kubernetes-based infrastructure and high-throughput data pipelines. This is an opportunity to shape the backbone of a globally used open-source product trusted by thousands of engineering teams.

Accountabilities:

In this role, you will own the operational stability and scalability of a large distributed observability platform while continuously improving system performance, reliability, and automation. You will:

  • Design, operate, and improve large-scale Kubernetes infrastructure including upgrades, scaling, networking, and multi-tenancy
  • Ensure system reliability through strong SRE practices including SLOs, SLIs, error budgets, incident response, and on-call optimization
  • Scale and maintain high-throughput ingestion pipelines handling petabyte-scale observability data
  • Operate, tune, and optimize data systems such as ClickHouse for performance, cost efficiency, and reliability
  • Build automation and tooling using infrastructure-as-code and CI/CD to improve deployment and operational efficiency
  • Monitor, debug, and resolve complex production issues across distributed systems
  • Improve observability of the platform itself using modern monitoring, logging, and tracing practices

Requirements:

This role requires strong experience in building and operating large-scale distributed systems with a deep focus on reliability and performance. You should bring:

  • 5–8 years of experience in SRE, infrastructure, platform engineering, or backend systems roles
  • Deep hands-on expertise with Kubernetes in production-scale environments
  • Strong understanding of distributed systems, failure modes, performance tuning, and capacity planning
  • Experience working with high-scale data systems (ClickHouse, Kafka, or similar) is highly desirable
  • Proficiency in at least one programming language (Go strongly preferred) with a focus on automation and system reliability
  • Familiarity with observability concepts (metrics, logs, and traces) and tools such as OpenTelemetry
  • Strong problem-solving skills with the ability to debug complex production issues
  • Excellent communication skills with the ability to write clear documentation and runbooks
  • Experience in fast-paced, high-ownership, remote-first environments
  • Open-source contributions or strong engagement with OSS ecosystems are a plus

Benefits:
  • Competitive salary package ranging from INR 50 lakh to INR 1 crore annually
  • Fully remote, India-based role with flexible, async-friendly working culture
  • High ownership role with direct impact on a globally used open-source platform
  • Opportunity to work on petabyte-scale distributed systems and cutting-edge observability infrastructure
  • Strong engineering culture focused on shipping, reliability, and continuous improvement
  • Exposure to modern cloud-native technologies including Kubernetes, ClickHouse, and OpenTelemetry
  • Collaborative, high-caliber team environment with strong technical peers
  • Opportunity to contribute to a fast-growing open-source ecosystem used by thousands of engineering teams

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

