Lead Site Reliability Engineer overseeing SRE practices across Azure and GCP platforms. Driving reliability improvements and leading a team at Lloyds Banking Group.
Responsibilities
Lead a team of up to about 15 SREs and create a culture of continuous improvement, learning, and engineering excellence.
Work closely with application teams during their migrations to the cloud.
Work closely with Product Owners and Engineering Leads to balance new feature delivery with reliability, performance and system health.
Use data, observability tooling and SRE principles to detect issues early, improve system performance, and reduce operational toil.
Lead and mature incident and problem management practices, ensuring strong root‑cause analysis, learning, improved MTTF, and reduced MTTR.
Champion error budgets, SLOs, and reliability‑first thinking across your aligned Cloud Labs.
Influence platform direction and engineering standards, helping shape how we build resilient cloud services at scale.
Requirements
Strong cloud engineering background — ideally across GCP and Azure — with experience designing or operating large‑scale, resilient cloud platforms.
Deep understanding of observability tooling (metrics, logs, traces) and how to drive reliability improvements using data.
Hands‑on experience of modern SRE practices: SLOs and SLIs, error budgets, reducing toil through automation, production readiness, and post‑mortem best practice.
Experience leading engineering teams and fostering an inclusive, high‑performing culture.
Ability to navigate complex stakeholder groups and communicate technical topics in a clear, accessible way.
Benefits
A generous pension contribution of up to 15%
An annual performance-related bonus
Share schemes including free shares
Benefits you can adapt to your lifestyle, such as discounted shopping
30 days’ holiday, with bank holidays on top
A range of wellbeing initiatives and generous parental leave policies
Reliability Engineering Specialist utilizing reliability tools and models to improve asset performance at Enbridge. Collaborating across teams to guide investment decisions for safe operations.
DevOps Engineer responsible for structuring and supporting cloud DevOps architecture in Brazil. Working strategically on automation and CI/CD practices with development teams in Pernambuco.
DevSecOps Software Engineer developing secure CI/CD pipelines for Boeing's military software systems. Collaborating with cross-functional teams and implementing automation and security best practices.
DevOps Manager responsible for managing a team for multi-cloud solutions supporting the USAF Cloud One project. Focus on scalable cloud-native solutions and CI/CD practices.
DevOps Engineer responsible for managing Microsoft Intune operations at Bundesdruckerei GmbH. Focused on ensuring secure digital solutions for identity and data protection in Berlin.
Senior Site Reliability Engineer driving observability and reliability for business-critical systems at Incedo. Collaborating with engineering teams to enhance system resilience and performance.
DevSecOps Specialist securing the software development lifecycle at Vanguard. Collaborating with teams to improve application security tooling and processes, and provide development guidance.
Site Reliability Engineer automating infrastructure deployment for Scaleway's sovereign cloud products. Collaborating with product teams to enhance observability and reliability of the platform.
Reliability Engineer responsible for equipment reliability and safety using data-driven analysis for Wood in Aberdeen. Focus on proactive maintenance and operational efficiency.