Lead Director of Engineering for SRE at CVS Health, improving the reliability of retail and pharmacy technology. Guides a global technical team while executing strategic objectives and leading major incident management.
Responsibilities
Lead a global team of technical professionals, providing guidance, mentorship, and support to ensure their success and professional growth
Align SRE strategies with enterprise goals, delivering resilient technology that enables world-class customer and patient experiences
Execute on a multi-year roadmap for observability, automation, and reliability improvements across distributed store environments
Define and implement standardization and process improvements across the SRE organization
Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs to measure and enhance system reliability for critical store applications
Build and optimize dashboards, visualizations, and alerting systems to enable real-time insights and rapid incident response for edge nodes and remote facilities
Provide weekly and monthly reporting on KPIs and other operational metrics to cross-functional teams and senior leadership
Develop and implement strategic communication plans to support organizational goals, ensuring alignment with business objectives and stakeholder needs
Lower operational costs and increase efficiency through automation and platform engineering that reduce toil and enable self-healing capabilities across thousands of locations
Lead major incident management, ensuring rapid detection, root-cause analysis, and resolution in collaboration with business and technology partners
Champion modern cloud, edge, and AI-driven monitoring solutions for store technology
Partner with architects, product engineering, and infrastructure teams to embed reliability practices throughout the software lifecycle
Represent CVS Health as a thought leader in SRE and operational resilience, both internally and externally
Mentor the SRE and technical teams on building, scaling, and operating highly available systems
Foster a culture of ownership, honesty, accountability, and continuous improvement within the organization
Contribute to long-term planning, technology adoption strategies, and innovation initiatives to drive digital modernization efforts
Requirements
10+ years of experience with cloud platform technologies such as AWS, Microsoft Azure, or Google Cloud
8+ years of experience in a technical leadership or people management role, with a proven ability to lead and grow technical teams, particularly within SRE or large-scale reliability organizations
8+ years of experience leading complex technical initiatives using Agile/continuous improvement methodologies
8+ years of experience managing distributed technology environments (retail, healthcare, or other multi-site operations strongly preferred)
5+ years of experience with container orchestration (Kubernetes) and monitoring tools (Dynatrace, AppDynamics, Prometheus, Splunk, Grafana, etc.)
Strong understanding of cloud infrastructure components (compute, storage, networking, security)
Strong knowledge of Point of Sale (POS), pharmacy systems, handheld devices, store servers, and network infrastructure
Exceptional communication, decision-making, and problem-solving skills, with demonstrated ability to influence senior executives and cross-functional teams (technical and non-technical)
Experience leading performance reviews, career development planning, and team capacity management
Mastery of incident management, observability, automation, and operational excellence practices
Adept at resource planning, program delivery, and change leadership at enterprise scale
Adept at collaboration, teamwork, and fostering an inclusive engineering culture
Benefits
Affordable medical plan options
401(k) plan (including matching company contributions)
Employee stock purchase plan
No-cost programs, including wellness screenings, tobacco cessation and weight management programs, confidential counseling, and financial coaching
Paid time off
Flexible work schedules
Family leave
Dependent care resources
Colleague assistance programs
Tuition assistance
Retiree medical access and many other benefits depending on eligibility
Job title
Lead Director Engineering, SRE – Retail & Pharmacy