Senior Site Reliability Engineer in India at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in India.
This role sits at the heart of large-scale cloud infrastructure and platform reliability, ensuring high-performing, secure, and scalable systems that power modern retail intelligence applications. You will work in a DevOps-oriented environment where automation, observability, and continuous improvement are key drivers of success. The position involves close collaboration with engineering and product teams to design and maintain resilient infrastructure across cloud-native environments. You will play a critical role in deploying, scaling, and optimizing systems using modern tooling such as Kubernetes, Terraform, and CI/CD pipelines. Operating in a globally distributed setup, you will support US business hours while contributing to mission-critical platform stability. This is a high-impact role where your decisions directly influence system reliability, developer productivity, and operational efficiency.
Responsibilities
- Design, build, and maintain scalable and reliable cloud infrastructure supporting production applications and internal platforms.
- Manage deployment pipelines and CI/CD workflows using tools such as GitHub Actions, CircleCI, or Argo Workflows.
- Implement infrastructure as code practices using Terraform to ensure consistency, scalability, and automation.
- Operate and optimize containerized environments using Docker and Kubernetes (EKS/GKE or similar).
- Develop and maintain internal DevOps tools that improve deployment speed, reliability, and operational efficiency.
- Establish and enhance monitoring, logging, and alerting systems using tools like Datadog, OpenSearch, or Sentry.
- Participate in on-call rotations, incident response, post-mortems, and root cause analysis to ensure system reliability.
- Collaborate with development teams to improve system design, deployment strategies, and infrastructure architecture.
- Manage authentication, authorization, and secure gateway solutions across platforms.
- Continuously optimize cloud environments, including automation, performance tuning, and cost efficiency.
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles.
- Strong hands-on experience with AWS and container orchestration platforms such as Kubernetes (EKS/GKE).
- Solid expertise in Infrastructure as Code using Terraform.
- Proficiency in CI/CD pipeline management using tools like GitHub Actions, CircleCI, or similar.
- Strong programming/scripting skills in at least one language such as Python or Bash.
- Experience with Docker and containerized application deployments.
- Knowledge of monitoring and observability tools such as Datadog, Sentry, or OpenSearch.
- Hands-on experience with authentication and authorization systems in distributed environments.
- Familiarity with incident management, on-call operations, and production support best practices.
- Strong problem-solving skills, ownership mindset, and ability to work independently with minimal supervision.
- Experience working in globally distributed teams and supporting US Eastern (EST) working hours.
Benefits
- Competitive compensation with company equity eligibility.
- Remote-first flexibility with options to work from home or office.
- Broadband and home office reimbursement support.
- Comprehensive group medical insurance coverage for employees and family.
- Crèche benefit support for working parents.
- Opportunity to work on large-scale cloud-native systems and cutting-edge DevOps technologies.
- Exposure to global teams and high-impact production environments.
- Strong emphasis on learning, autonomy, and professional growth.