Hybrid Director, Reliability Engineering

Posted last month

About the role

  • Support infrastructure operations teams focused on cloud and on-premise infrastructure.
  • Support sustainable engineering practices, including systematic intake, driving infrastructure management practices, and modern configuration management.
  • Support cloud platform practices for a multi-cloud ecosystem including AWS, GCP, and Azure.
  • Establish cloud platform practices and help build technologies and practices that lower the barrier of entry for engineers using cloud infrastructure.
  • Support engineering enablement services and practices.
  • Help build a strategic roadmap for tooling and automation that will allow engineers to quickly, securely, and effectively build applications in cloud ecosystems.
  • Own critical ITIL practices including change management, problem management, and incident management.
  • Drive adoption of documented processes, using strong cross-organizational relationships to ensure success and support maturity assessments within Digital.
  • Build connectivity between process and practices, helping drive robust metrics and simplified strategies for turning documentation into engineering practices.
  • Own incident management responses and ensure communications and escalations to senior leadership are simple and effective.
  • Drive change management activities, including CAB, release management, and audit execution.
  • Build and blend modern observability practices with traditional NOC/SOC teams to create a lean and robust monitoring ecosystem between SaaS, cloud, and on-premise services.
  • Integrate incident management practices with automated observability tools and methodologies to drive visibility into system health and ensure service owners know about issues before their users do.
  • Establish clear metrics such as Key Performance Indicators (KPIs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) to measure and continuously improve operational performance.
  • Foster a proactive culture of monitoring and early detection to identify and address system anomalies before they impact users or infrastructure reliability.
  • Build and lead high-performing global teams across infrastructure, ITSM, and observability.
  • Create strategic roadmaps and participate as delivery lead in large program-level initiatives.
  • Collaborate closely with Security, Engineering, Compliance, and Legal organizations to ensure alignment and transparency.
  • Mentor, develop, and support technical teams, driving a culture of ownership, innovation, and continuous improvement.
  • Define KPIs and metrics to measure operational performance and developer productivity.
  • Drive vendor strategy and manage partner relationships for infrastructure platforms and developer tools.

Requirements

  • 12+ years of experience in infrastructure, platform engineering, ITIL, or developer tooling, with 5+ years in senior leadership roles.
  • Proven track record overseeing large-scale cloud environments and physical data centers in complex enterprise environments.
  • Expertise in Agile methodologies and driving team cultures through iterative improvement and technical excellence.
  • Expertise in infrastructure-as-code (Terraform, Ansible), cloud-native operations, and hybrid networking.
  • Deep understanding of developer platforms, including source control (GitHub, GitLab, Perforce, ADO), artifact repositories, CI/CD frameworks, and observability stacks.
  • Strong grasp of DevOps principles, platform engineering, and infrastructure automation practices.
  • Experience with NOC/SOC operations or observability practices and driving operational resilience through system health metrics.
  • Experience with compliance, risk management, and operational excellence frameworks (e.g., ITIL, SOC 2, ISO).
  • Strategic thinker with excellent leadership, communication, and stakeholder management skills.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Benefits

  • Incentive compensation
  • Bonus
  • Restricted stock units
  • Benefits

Job title

Director, Reliability Engineering

Job type

Experience level

Lead

Salary

$228,800 - $343,200 per year

Degree requirement

Bachelor's Degree

Location requirements
