Cloud Reliability & Recovery Engineer in India at Jobgether

Job Function: Information Technology
Jobgether
India, India

Job Description

Cloud Reliability & Recovery Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Cloud Reliability & Recovery Engineer based in India.

This role sits at the core of large-scale cloud resilience engineering, focused on ensuring critical systems remain highly available, fault-tolerant, and recoverable under any disruption. You will design and operate advanced AWS-based disaster recovery and business continuity architectures across multi-region environments. The position requires deep hands-on engineering expertise in cloud infrastructure, automation, and reliability practices, with a strong emphasis on Kubernetes, Infrastructure as Code, and CI/CD-driven operations. You will work closely with security, infrastructure, and application teams to define and enforce recovery strategies aligned with strict RTO/RPO objectives. This is a highly technical role where you will build automated DR systems, validate resiliency through chaos engineering, and continuously improve platform stability. The environment is fast-paced, engineering-driven, and focused on measurable reliability outcomes at enterprise scale.

Accountabilities:

Design, implement, and maintain highly resilient cloud architectures with a strong focus on disaster recovery, business continuity, and system availability. Responsibilities include:

  • Designing multi-region and multi-AZ AWS architectures aligned with defined RTO/RPO targets
  • Building and maintaining failover and failback mechanisms using Route 53, Global Accelerator, and CloudFront
  • Developing automated disaster recovery runbooks using AWS Systems Manager, Step Functions, and related services
  • Implementing backup and recovery strategies across AWS services including EC2, RDS, S3, DynamoDB, and Aurora
  • Automating backup policies, replication workflows, and recovery validation processes
  • Performing chaos engineering and resilience testing using AWS Fault Injection Simulator
  • Managing Infrastructure as Code using Terraform and/or CloudFormation for DR environments
  • Developing CI/CD-driven automation for failover, deployment, and recovery workflows
  • Building observability dashboards, alerts, and incident response workflows using CloudWatch and third-party tools
  • Participating in on-call rotations, incident response, and post-incident reviews
  • Maintaining DR documentation, compliance artifacts, and audit-ready recovery evidence

Requirements:

The ideal candidate brings strong AWS expertise, deep cloud reliability experience, and a proven ability to design and operate large-scale disaster recovery systems.

  • 5+ years of experience in cloud infrastructure, SRE, or disaster recovery engineering roles
  • 3+ years of hands-on AWS production experience at scale
  • Proven experience designing and implementing multi-region DR architectures with defined RTO/RPO
  • Strong expertise in AWS services including EC2, RDS, S3, DynamoDB, Aurora, and related resilience tools
  • Hands-on experience with Kubernetes-based deployments and cloud-native architecture
  • Strong scripting skills in Python, Bash, or PowerShell for automation and orchestration
  • Experience with Infrastructure as Code tools such as Terraform or AWS CloudFormation
  • Solid understanding of networking concepts including VPC, DNS failover, VPN, and Direct Connect
  • Strong knowledge of CI/CD pipelines and automation frameworks
  • Excellent communication skills with the ability to produce clear technical and executive reports
  • Experience with resilience frameworks, compliance standards, and operational best practices

Benefits:

  • Competitive compensation aligned with experience and industry standards
  • Opportunity to work on mission-critical, large-scale cloud resilience systems
  • Remote-friendly work environment with global collaboration
  • Exposure to advanced AWS architectures, DR automation, and chaos engineering practices
  • Strong focus on engineering excellence, automation, and continuous improvement
  • Learning opportunities in cloud reliability, security, and enterprise-scale infrastructure
  • Collaborative environment working with highly skilled engineering and security teams

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
