Senior Site Reliability Engineer managing GCP and SRE technologies at Lloyds Banking Group. Collaborating with engineering teams to enhance service reliability and troubleshoot issues in critical banking services.
Responsibilities
Delivering against GCP and SRE Public Cloud technology roadmaps
Collaborating with engineering teams to release and evolve enterprise-class solutions
Managing operations of critical banking services, including 24x7 coverage via on-call rota
Enhancing resiliency and reliability of customer-facing services
Troubleshooting and diagnosing issues with an engineering mindset
Building tooling to support service reliability and code quality
Working across multiple labs and signature projects in the Digital space
Leading Chaos Engineering initiatives to stress test services
Requirements
Strong understanding of SRE & DevOps, including experience of Infrastructure as Code and CI/CD pipelines using tools such as Azure DevOps, Terraform, or Jenkins.
Proficiency with Incident Management software (e.g. ServiceNow).
Proficiency in Dynatrace, Splunk, and GCP Cloud Observability.
Demonstrable experience in using orchestration tools such as Harness.
Knowledge of GCP and Azure cloud platforms.
Experience in identifying toil and designing automated solutions to remove it.
Reliability & Performance Management: Design, implement and own the SLOs for critical platform services. Monitor system health, manage error budgets, and drive improvements in Mean Time to Failure (MTTF) and Mean Time to Recovery (MTTR).
Incident & Problem Management: Lead incident response and post-mortem analysis. Ensure root cause identification and long-term remediation strategies are implemented.
Platform Advocacy & Collaboration: Champion SRE principles across Segments & Propositions Lab. Collaborate with Lab Product Owners, Engineering Leads, and application teams to embed reliability into design and delivery.
Technical Leadership: Provide technical oversight across cloud infrastructure, CI/CD pipelines, observability tooling, and automation frameworks. Guide engineers in adopting scalable and resilient solutions.
Continuous Improvement: Identify and implement improvements in deployment, monitoring, and alerting processes. Drive automation to reduce toil and improve operational efficiency.
Governance & Compliance: Ensure platform services adhere to internal risk, security, and compliance standards. Support audit and regulatory reporting requirements.
Benefits
A generous pension contribution of up to 15%
An annual performance-related bonus
Share schemes including free shares
Benefits you can adapt to your lifestyle, such as discounted shopping
30 days’ holiday, with bank holidays on top
A range of wellbeing initiatives and generous parental leave policies