Cloud Reliability & Recovery Engineer

Company: Jobgether
Job Type: Full Time


Job Description - Cloud Reliability & Recovery Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Cloud Reliability & Recovery Engineer based in India.


This is a senior, hands-on cloud engineering role focused on building and maintaining highly resilient, always-available AWS environments. You will design and operate large-scale disaster recovery (DR) and business continuity (BCP) frameworks that ensure critical systems remain operational even during major disruptions. The role sits at the intersection of SRE, infrastructure engineering, and incident response, with a strong emphasis on automation, fault tolerance, and cloud-native architecture. You will work extensively with Kubernetes, Terraform, and AWS-native resilience services to engineer multi-region failover and recovery strategies. The environment is fast-paced, security-conscious, and highly collaborative, involving close partnership with infrastructure, security, and application teams. Your work will directly reduce downtime risk and strengthen global service reliability across mission-critical systems.

Accountabilities:



  • Design and implement highly available, multi-region and multi-AZ AWS architectures aligned with defined RTO/RPO objectives, ensuring system continuity under failure scenarios.

  • Build and maintain disaster recovery (DR) solutions including automated failover/failback mechanisms using services such as Route 53, Global Accelerator, CloudFront, and AWS Systems Manager.

  • Develop and execute backup, restore, and data replication strategies across AWS services (RDS, DynamoDB, S3, EFS, Aurora), ensuring integrity and recoverability.

  • Implement infrastructure as code using Terraform or CloudFormation to standardize and automate DR-ready environments.

  • Create and maintain CI/CD-driven DR testing pipelines, including chaos engineering practices to validate system resilience under real-world failure conditions.

  • Monitor system availability and resilience using CloudWatch, incident tooling, and AWS health services, participating in on-call rotations and leading incident response efforts.

  • Conduct DR drills, tabletop exercises, and post-incident reviews to continuously improve recovery readiness and compliance posture.


Requirements:



  • 5+ years of experience in cloud engineering, SRE, infrastructure, or disaster recovery roles, including at least 3 years in AWS production environments at scale.

  • Proven experience designing and operating multi-region disaster recovery architectures with measurable RTO/RPO outcomes.

  • Strong expertise in AWS services related to resilience, including networking (VPC, DNS, VPN, Direct Connect) and storage/database replication.

  • Hands-on experience with Infrastructure as Code tools such as Terraform and/or CloudFormation.

  • Proficiency in scripting and automation using Python, Bash, or PowerShell.

  • Solid understanding of Kubernetes-based deployments, including scaling, self-healing, and multi-cluster strategies.

  • Experience with CI/CD tools and practices (e.g., GitHub Actions, CodePipeline, CodeBuild).

  • Strong communication skills with the ability to document DR strategies and present technical risks and recovery plans clearly.

  • Preferred: AWS certifications (Solutions Architect – Professional, DevOps Engineer – Professional, Advanced Networking – Specialty).


Benefits:



  • Competitive compensation package aligned with senior-level cloud engineering roles.

  • Opportunity to work on large-scale, mission-critical cloud infrastructure with global impact.

  • Flexible and remote-friendly work arrangements (depending on team policy).

  • Strong focus on learning and upskilling in advanced AWS, resilience engineering, and cloud architecture.

  • Exposure to modern engineering practices including chaos engineering, SRE methodologies, and GitOps workflows.

  • Collaborative, high-autonomy environment with strong engineering ownership.

  • Health, wellness, and standard employee benefits in line with industry benchmarks.


How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


 

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

 

 

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Original job posting: Cloud Reliability & Recovery Engineer, posted on GrabJobs.
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.