About the role

  • As a Technology Resiliency and Recovery Specialist, you will ensure IT infrastructure resilience through disaster recovery strategies and AWS services, collaborating with teams to maintain operational readiness and minimize downtime.

Responsibilities

  • Design, implement, and maintain disaster recovery (DR) plans for the organization's IT infrastructure, ensuring business continuity.
  • Assess and analyze business impact, defining recovery objectives (RTO and RPO) and aligning them with organizational goals.
  • Regularly test disaster recovery procedures through simulations and mock drills to ensure operational readiness.
  • Work with different teams to identify critical systems and services that need to be included in the disaster recovery plan.
  • Evaluate DR tools and solutions, focusing on AWS-based services, to ensure a scalable and cost-effective recovery solution.
  • Ensure that all IT systems are designed for resiliency, with high availability and fault tolerance.
  • Implement and maintain cloud-based disaster recovery strategies using AWS services such as Amazon EC2, S3, RDS, Route 53, and more.
  • Collaborate with architecture teams to ensure resiliency and continuity measures are embedded into infrastructure design.
  • Oversee and optimize backup strategies, ensuring that systems can be quickly restored with minimal data loss.
  • Automate disaster recovery processes and workflows using modern DevOps tools such as AWS CloudFormation, Tidal, Terraform, Ansible, or other automation frameworks.
  • Implement Infrastructure as Code (IaC) practices to streamline the provisioning and management of recovery environments.
  • Use Sumo Logic, Dynatrace, AWS Lambda, CloudWatch, and other monitoring and automation tools to proactively monitor and respond to system events or failures.
  • Maintain clear and up-to-date documentation of disaster recovery plans, runbooks, and processes.
  • Provide detailed post-disaster recovery reports, outlining the effectiveness of the recovery process and any lessons learned.
  • Report on resiliency metrics, recovery objectives, and automation progress to senior leadership.
  • Lead the response during actual disaster recovery events, coordinating with IT and business units to ensure a smooth recovery process.
  • Perform post-incident analysis to identify root causes, implement corrective actions, and improve recovery plans.
  • Collaborate closely with cross-functional teams including IT operations, security, engineering, and business continuity.
  • Provide training and awareness on disaster recovery procedures to staff, helping them understand why these procedures matter and what their roles are during recovery scenarios.

Requirements

  • Proven experience in designing, implementing, and managing disaster recovery plans for both on-premises and cloud-based infrastructure.
  • Experience with automation tools such as Tidal, Terraform, AWS CloudFormation, Ansible, or similar.
  • Proficiency in scripting languages (Python, Shell, etc.) to automate processes and workflows.
  • Excellent verbal and written communication skills for technical and non-technical stakeholders.
  • Ability to lead recovery efforts, coordinate between various teams, and communicate effectively during high-pressure situations.
  • AWS Certified Cloud Practitioner and AWS Certified Solutions Architect certifications.

Benefits

  • Health insurance
  • Flexible work arrangements
  • Professional development opportunities

Job title

Tech Operations Lead – Disaster Recovery

Job type

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
