Cloud Reliability & Recovery Engineer in India at Jobgether

Job Function: Information Technology
Jobgether
India, India

Job Description

Cloud Reliability & Recovery Engineer

This position is listed on behalf of a partner company, who manages all applications and next steps. Our partner is looking for a Cloud Reliability & Recovery Engineer based in India.

This is a senior, hands-on cloud engineering role focused on building and maintaining highly resilient, always-available AWS environments. You will design and operate large-scale disaster recovery (DR) and business continuity planning (BCP) frameworks that keep critical systems operational even during major disruptions. The role sits at the intersection of SRE, infrastructure engineering, and incident response, with a strong emphasis on automation, fault tolerance, and cloud-native architecture. You will work extensively with Kubernetes, Terraform, and AWS-native resilience services to engineer multi-region failover and recovery strategies. The environment is fast-paced, security-conscious, and highly collaborative, involving close partnership with infrastructure, security, and application teams. Your work will directly reduce downtime risk and strengthen global service reliability across mission-critical systems.

Accountabilities:
  • Design and implement highly available, multi-region and multi-AZ AWS architectures aligned with defined RTO/RPO objectives, ensuring system continuity under failure scenarios.
  • Build and maintain disaster recovery (DR) solutions including automated failover/failback mechanisms using services such as Route 53, Global Accelerator, CloudFront, and AWS Systems Manager.
  • Develop and execute backup, restore, and data replication strategies across AWS services (RDS, DynamoDB, S3, EFS, Aurora), ensuring integrity and recoverability.
  • Implement infrastructure as code using Terraform or CloudFormation to standardize and automate DR-ready environments.
  • Create and maintain CI/CD-driven DR testing pipelines, including chaos engineering practices to validate system resilience under real-world failure conditions.
  • Monitor system availability and resilience using CloudWatch, incident tooling, and AWS health services, participating in on-call rotations and leading incident response efforts.
  • Conduct DR drills, tabletop exercises, and post-incident reviews to continuously improve recovery readiness and compliance posture.
Requirements:
  • 5+ years of experience in cloud engineering, SRE, infrastructure, or disaster recovery roles, including at least 3 years in AWS production environments at scale.
  • Proven experience designing and operating multi-region disaster recovery architectures with measurable RTO/RPO outcomes.
  • Strong expertise in AWS services related to resilience, including networking (VPC, DNS, VPN, Direct Connect) and storage/database replication.
  • Hands-on experience with Infrastructure as Code tools such as Terraform and/or CloudFormation.
  • Proficiency in scripting and automation using Python, Bash, or PowerShell.
  • Solid understanding of Kubernetes-based deployments, including scaling, self-healing, and multi-cluster strategies.
  • Experience with CI/CD tools and practices (e.g., GitHub Actions, CodePipeline, CodeBuild).
  • Strong communication skills with the ability to document DR strategies and present technical risks and recovery plans clearly.
  • Preferred: AWS certifications (Solutions Architect – Professional, DevOps Engineer – Professional, Advanced Networking – Specialty).
Benefits:
  • Competitive compensation package aligned with senior-level cloud engineering roles.
  • Opportunity to work on large-scale, mission-critical cloud infrastructure with global impact.
  • Flexible and remote-friendly work arrangements (depending on team policy).
  • Strong focus on learning and upskilling in advanced AWS, resilience engineering, and cloud architecture.
  • Exposure to modern engineering practices including chaos engineering, SRE methodologies, and GitOps workflows.
  • Collaborative, high-autonomy environment with strong engineering ownership.
  • Health, wellness, and standard employee benefits in line with industry benchmarks.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

India, India
