Site Reliability Engineer at Lloyds Banking Group on the Financial Wellbeing Platform, establishing SRE functions and improving service reliability.
Responsibilities
Create documentation detailing how the SRE function is established within the platform, supported by procedures and guidelines that incorporate existing documentation
Provide a framework in which to operate the cloud systems under
Lead the transition to cloud infrastructure and improve observability across systems
Identify and eliminate toil through automation
Manage incidents and post-mortems to improve service reliability
Mentor engineers and support team development
Collaborate with Product Owners to balance operational and development priorities
Requirements
Proven experience as a Site Reliability Engineer in cloud environments (GCP or AWS)
Understanding of SRE principles including SLIs, SLOs, error budgets, and toil reduction
Strong scripting and infrastructure-as-code (IaC) skills (Terraform, Harness, GitHub)
Demonstrable experience of Agile ways of working, with a focus on delivering customer value and applying an Agile mindset; familiarity with tools like Jira
Ability to lead incident response and drive service improvements
Strong collaboration and mentoring skills
Azure cloud environment experience, including connectivity, data buckets, secrets management, migration, and governance challenges
Familiarity with containerisation, CI/CD, and infrastructure tooling such as Docker, Jenkins, GitHub, and Terraform
Secure programming practices and experience of secure file transfer protocols, risk remediation, and audit actions
Technical operations and service engineering
Benefits
A generous pension contribution of up to 15%
An annual performance-related bonus
Share schemes including free shares
Benefits you can adapt to your lifestyle, such as discounted shopping
30 days’ holiday, with bank holidays on top
A range of wellbeing initiatives and generous parental leave policies
DevOps Engineer designing and operating AWS infrastructure within industrial IoT environments. Working on systems that ensure security, resilience, and end-to-end observability.
Sr. Site Reliability Engineer (SRE) III providing technical solutions for the federal government. Collaborating in a high-performing team focused on reliability and application scalability.
Senior Linux System Engineer developing and maintaining Linux server infrastructure for Th. Geyer GmbH. Collaborating on ERP systems and CI/CD processes while ensuring system performance and security.
Platform Engineer leading the development of cloud application platforms for Allstate. Responsible for cloud infrastructure for ML experimentation and production deployments.
Cloud Platform Engineer (ML DevOps) developing and managing CI/CD pipelines for ML workflows in a leading insurance company. Collaborating with data scientists and ensuring infrastructure security and compliance.
DevOps Engineer developing and managing container platforms for client solutions at Booz Allen Hamilton. Utilizing cloud technologies to enhance capabilities and secure deployments.
Senior DevOps/Platform Engineer automating cloud infrastructure and optimizing delivery pipelines at S&P Global Mobility. Collaborating with teams to enhance product reliability and security.
DevOps Engineer responsible for maintaining and enhancing AWS/EKS platform for energy transition products. Ensuring platform stability, security compliance, and streamlined deployment processes.
Suspension Design and Release Engineer for Ford, impacting vehicle ride, handling, and NVH. Collaborating with cross-functional teams to deliver quality systems and components.