Lead Site Reliability Engineer enhancing observability and reliability for Lloyds Banking Group's Public Cloud Platform. Collaborating with teams to embed SRE practices and drive automation and continuous improvement.
Responsibilities
Lead, coach and develop a high‑performing SRE team, fostering autonomy, inclusion and continuous improvement.
Partner with Product Owners and Engineering Leads to embed reliability into roadmaps, backlogs and delivery decisions.
Apply SRE principles (SLIs, SLOs, error budgets) to ensure our services remain highly reliable, performant and scalable.
Drive improvements in observability—across metrics, logs, traces and events—ensuring services are observable by design.
Use Dynatrace as the primary observability platform for key dashboards and customer‑centric alerting.
Own Infrastructure‑as‑Code and CI/CD‑based environments, implementing enhancements and responding to operational change.
Lead the coordination of incident response and root cause analysis, supporting teams through major incidents and post‑incident reviews and helping prevent recurrence.
Collaborate with multi‑disciplinary engineering teams to remove technical impediments, reduce toil and improve service operability.
Contribute hands‑on engineering where needed, validating technical decisions and guiding best practice.
Bring curiosity, experimentation and first‑principles thinking to evolve our engineering culture.
Requirements
Proven experience applying SRE practices within Azure, GCP, or both.
Strong understanding of SLIs, SLOs, error budgets, and how to use these to guide product and engineering decisions.
Experience ensuring reliability of production services, including availability, performance and recoverability.
Hands‑on or leadership experience in incident and problem management, focused on reducing MTTR and avoiding repeat issues.
Background in software engineering or cloud engineering, with a good understanding of modern SDLC practices.
Practical experience with DevOps, CI/CD and automation to improve service reliability.
Experience improving observability on complex, distributed systems.
Ability to use data to influence prioritisation and balance reliability with feature delivery.
Collaboration and communication skills, working effectively with product, engineering and platform teams.
Experience mentoring engineers and promoting an inclusive, supportive team culture.
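To illustrate how SLOs and error budgets can guide the balance between reliability and feature delivery, here is a minimal sketch. It assumes a request-based availability SLI (good requests divided by total requests) and a hypothetical error-budget policy: keep shipping features while budget remains, and prioritise reliability work once it is spent. The function names, the zero threshold and the example numbers are illustrative assumptions, not Lloyds Banking Group tooling.

```python
def sli(total_requests: int, failed_requests: int) -> float:
    """Request-based availability SLI: fraction of good requests in the window."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return 1.0 - failed_requests / total_requests


def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        # A 100% SLO leaves no error budget at all.
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        return 0.0  # no traffic yet, so no budget has accrued
    return 1.0 - failed_requests / budget


def release_decision(slo_target: float, total_requests: int, failed_requests: int,
                     freeze_below: float = 0.0) -> str:
    """Hypothetical policy: ship features while budget remains, else focus on reliability."""
    remaining = budget_remaining(slo_target, total_requests, failed_requests)
    return "ship features" if remaining > freeze_below else "prioritise reliability"


# Example: a 99.9% SLO over 1,000,000 requests allows roughly 1,000 failures.
print(release_decision(0.999, 1_000_000, 250))
print(release_decision(0.999, 1_000_000, 1_500))
```

In practice the threshold and the response to an exhausted budget (for example, pausing risky releases) would be agreed with Product Owners as part of an error-budget policy rather than hard-coded.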
Benefits
A competitive salary and performance‑related bonus
28 days holiday plus bank holidays
Generous pension contribution
Private medical insurance
Flexible benefits to suit your lifestyle
Hybrid working model and family‑friendly policies
Access to wellbeing support, training and career development
DevOps Engineer designing and operating AWS infrastructure within industrial IoT environments. Working on systems that ensure security, resilience, and end-to-end observability.
Sr. Site Reliability Engineer (SRE) III providing technical solutions for the federal government. Collaborating in a high-performing team focused on reliability and application scalability.
Senior Linux System Engineer developing and maintaining Linux server infrastructure for Th. Geyer GmbH. Collaborating on ERP systems and CI/CD processes while ensuring system performance and security.
Platform Engineer leading the development of cloud application platforms for Allstate. Responsible for cloud infrastructure for ML experimentation and production deployments.
Cloud Platform Engineer (ML DevOps) developing and managing CI/CD pipelines for ML workflows at a leading insurance company. Collaborating with data scientists and ensuring infrastructure security and compliance.
DevOps Engineer developing and managing container platforms for client solutions at Booz Allen Hamilton. Utilizing cloud technologies to enhance capabilities and secure deployments.
Senior DevOps/Platform Engineer automating cloud infrastructure and optimizing delivery pipelines at S&P Global Mobility. Collaborating with teams to enhance product reliability and security.
DevOps Engineer responsible for maintaining and enhancing the AWS/EKS platform for energy transition products. Ensuring platform stability, security compliance, and streamlined deployment processes.
Suspension Design and Release Engineer for Ford, impacting vehicle ride, handling, and NVH. Collaborating with cross-functional teams to deliver quality systems and components.
DevOps Engineer at TeamViewer driving DevOps excellence by building CI/CD pipelines and managing Kubernetes. Collaborating within a diverse team to optimize digital processes with cloud infrastructure.