Technology Resiliency and Recovery Specialist ensuring IT infrastructure resilience through disaster recovery strategies and AWS services. Collaborating with teams to maintain operational readiness and minimize downtime.
Responsibilities
Design, implement, and maintain disaster recovery (DR) plans for the organization's IT infrastructure, ensuring business continuity.
Assess and analyze business impact, defining recovery objectives (RTO and RPO) and aligning them with organizational goals.
Regularly test disaster recovery procedures through simulations and mock drills to ensure operational readiness.
Work with different teams to identify critical systems and services that need to be included in the disaster recovery plan.
Evaluate DR tools and solutions, focusing on AWS-based services, to ensure a scalable and cost-effective recovery solution.
Ensure that all IT systems are designed for resiliency, with high availability and fault tolerance.
Implement and maintain cloud-based disaster recovery strategies using AWS services such as Amazon EC2, S3, RDS, Route 53, and more.
Collaborate with architecture teams to ensure resiliency and continuity measures are embedded into infrastructure design.
Oversee and optimize backup strategies, ensuring that systems can be quickly restored with minimal data loss.
Automate disaster recovery processes and workflows using modern DevOps tools such as AWS CloudFormation, Tidal, Terraform, Ansible, or other automation frameworks.
Implement Infrastructure as Code (IaC) practices to streamline the provisioning and management of recovery environments.
Use Sumo Logic, Dynatrace, AWS Lambda, CloudWatch, and other monitoring and automation tools to proactively monitor and respond to system events or failures.
Maintain clear and up-to-date documentation of disaster recovery plans, runbooks, and processes.
Provide detailed post-disaster recovery reports, outlining the effectiveness of the recovery process and any lessons learned.
Report on resiliency metrics, recovery objectives, and automation progress to senior leadership.
Lead the response during actual disaster recovery events, coordinating with IT and business units to ensure a smooth recovery process.
Perform post-incident analysis to identify root causes, implement corrective actions, and improve recovery plans.
Collaborate closely with cross-functional teams including IT operations, security, engineering, and business continuity.
Provide disaster recovery training and awareness sessions to staff, helping them understand why the procedures matter and what their roles are during recovery scenarios.
Requirements
Proven experience in designing, implementing, and managing disaster recovery plans for both on-premises and cloud-based infrastructure.
Experience with automation tools such as Tidal, Terraform, AWS CloudFormation, Ansible, or similar.
Proficiency in scripting languages (Python, Shell, etc.) to automate processes and workflows.
Excellent verbal and written communication skills for technical and non-technical stakeholders.
Ability to lead recovery efforts, coordinate between various teams, and communicate effectively during high-pressure situations.
AWS Certified Cloud Practitioner and AWS Certified Solutions Architect certifications.
Sales & Operations Execution Manager ensuring seamless S&OE across Commercial and Supply Chain for Kellanova in Belgium. Balancing supply/demand and enabling data-driven decisions with strong collaboration.
Operations Supervisor overseeing production and logistics at Kellanova Grand Rapids facility. Leading team in safety, quality, compliance, and continuous improvement initiatives to drive operational success.
Senior Associate, Process Manager for Capital One's GPN Portfolio Team. Collaborating across stakeholders to align product roadmaps with business priorities while driving operational execution.
Assistant Operations Supervisor at Firefighter PPE Gear Wash managing daily operations and supervising employees. Overseeing production schedules, safety, quality, and training of staff in Lakewood, CO.
Infrastructure Project Engineer in an educational and military environment. Managing real estate operations and technical oversight in the Mediterranean region.
Strategy & Operations Manager enhancing international banking operations with project oversight and strategic concept development. Collaborating with global teams and presenting data-driven insights.
Operations Manager for a catering company in Berlin, responsible for food service operations and team leadership. Ensuring quality and guest service in the staff canteen.
Senior Supervisor overseeing day-to-day operations at Copper Mountain Mine. Ensuring compliance with safety protocols and managing open-pit operations with a focus on production.
Commercial Operations Specialist responsible for supporting sales and marketing activities at Mitsubishi Power. Developing proposals, coordinating inputs, and improving commercial processes to ensure competitive bids.
Manager of Incident Management leading incident response for GM’s autonomous driving software. Overseeing incident lifecycle, team management, and cross-functional coordination within AV operations.