Principal DevOps Engineer at KingMakers, focusing on coding and infrastructure within product squads and leading technical improvements in observability, reliability, and performance across platforms.
Responsibilities
Provide technical leadership across platform and product engineering, setting the direction for observability, reliability, security, and platform architecture.
Contribute hands-on to critical product and platform codebases (primarily .NET and React), focusing on high-leverage improvements rather than feature throughput.
Define and drive the organisation-wide observability strategy, including standards and best practices for OpenTelemetry (OTEL), telemetry pipelines, and signal quality.
Partner with senior engineers and engineering leadership to design scalable, resilient infrastructure using Terraform, AKS, and Azure.
Establish and champion security-by-default practices across code, infrastructure, and CI/CD — influencing architecture, not just implementation.
Lead the definition of SLIs, SLOs, and alerting strategies, ensuring reliability is measurable, actionable, and aligned with business outcomes.
Set standards for monitoring and dashboards in Grafana, ensuring clarity, signal quality, and consistency across teams.
Design and build shared tooling, platforms, and libraries that materially improve developer productivity and reduce cognitive load.
Act as a trusted technical advisor and bridge between product engineering, platform engineering, and leadership — shaping decisions through influence rather than authority.
Mentor senior engineers and help raise the overall engineering bar through design reviews, architectural guidance, and pragmatic coaching.
Requirements
Deep, production-grade programming experience in at least one major language — C#/.NET strongly preferred; Go, Python, or similar also valued.
A track record of designing, reviewing, and evolving complex production systems at scale.
Extensive experience with cloud platforms (Azure preferred), infrastructure-as-code (Terraform), and Kubernetes-based platforms (AKS).
Strong understanding of CI/CD systems and deployment strategies, ideally using Azure DevOps.
Excellent grasp of networking fundamentals (TCP, HTTP, headers, tracing, and real-world debugging).
Proven experience designing secure systems, including identity, access management, secrets handling, and secure pipelines.
Experience operating in medium-to-large engineering organisations where reliability, scale, and developer experience matter.
The ability to operate autonomously, identify high-impact problems, and drive solutions end-to-end.
Clear communication skills — able to explain complex technical trade-offs to engineers and non-engineers alike.
Nice to have
Deep hands-on experience with observability platforms (Prometheus, Grafana, Elastic, OpenTelemetry).
Experience designing or operating large-scale telemetry, monitoring, or logging systems.
Strong database knowledge (SQL Server, NoSQL, performance and reliability considerations).
Experience with Nginx, ingress controllers, or load balancing technologies.
A security-first mindset balanced with pragmatism, performance, and developer velocity.
Experience working across distributed or remote teams and influencing without direct authority.
Benefits
Health Support: A monthly allowance to invest in your health and wellbeing
Comprehensive Insurance: Extra protection and peace of mind for you and your loved ones
Future Planning: Employer contributions towards your long-term financial security
Performance Bonus: Discretionary rewards that celebrate your impact
Convenience at Work: Free reserved parking at the office
Hybrid Work: A flexible arrangement with 2 days in the office and 3 days remote each week