About the role

  • As Lead Site Reliability Engineer, you will enhance observability and reliability for Lloyds Banking Group's Public Cloud Platform, collaborating with teams to embed SRE practices and drive automation and continuous improvement.

Responsibilities

  • Lead, coach and develop a high‑performing SRE team, fostering autonomy, inclusion and continuous improvement.
  • Partner with Product Owners and Engineering Leads to embed reliability into roadmaps, backlogs and delivery decisions.
  • Apply SRE principles (SLIs, SLOs, error budgets) to ensure our services remain highly reliable, performant and scalable.
  • Drive improvements in observability across metrics, logs, traces and events, ensuring services are observable by design.
  • Use Dynatrace as the primary observability platform for meaningful dashboards and customer‑centric alerting.
  • Own Infrastructure‑as‑Code and CI/CD‑based environments, implementing enhancements and responding to operational change.
  • Lead the coordination of incident response and root cause analysis, supporting teams through major incidents and post‑incident reviews and helping to prevent recurrence.
  • Collaborate with multi‑disciplinary engineering teams to remove technical impediments, reduce toil and improve service operability.
  • Contribute hands‑on engineering where needed, validating technical decisions and guiding best practice.
  • Bring curiosity, experimentation and first‑principles thinking to evolve our engineering culture.

Requirements

  • Proven experience applying SRE practices within Azure, GCP, or both.
  • Strong understanding of SLIs, SLOs, error budgets, and how to use these to guide product and engineering decisions.
  • Experience ensuring reliability of production services, including availability, performance and recoverability.
  • Hands‑on or leadership experience in incident and problem management, focused on reducing MTTR and avoiding repeat issues.
  • Background in software engineering or cloud engineering, with a good understanding of modern SDLC practices.
  • Practical experience with DevOps, CI/CD and automation to improve service reliability.
  • Experience improving observability on complex, distributed systems.
  • Ability to use data to influence prioritisation and balance reliability with feature delivery.
  • Collaboration and communication skills, with the ability to work effectively with product, engineering and platform teams.
  • Experience mentoring engineers and promoting an inclusive, supportive team culture.

Benefits

  • A competitive salary and performance‑related bonus
  • 28 days holiday plus bank holidays
  • Generous pension contribution
  • Private medical insurance
  • Flexible benefits to suit your lifestyle
  • Hybrid working model and family‑friendly policies
  • Access to wellbeing support, training and career development

Job title

Lead Cloud Site Reliability Engineer

Experience level

Senior

Salary

£92,701 - £109,060 per year

Degree requirement

Bachelor's Degree
