Senior DevOps Engineer modernising environment landscapes through IaC and SRE principles while collaborating across teams for a global engineering firm.
Responsibilities
Design, implement, and maintain Infrastructure-as-Code (IaC) for consistent and repeatable provisioning of development and test environments, primarily using Terraform.
Lead technical investigations and act as the escalation point for environment-related incidents, outages, configuration issues, and service degradation across non-production platforms.
Collaborate closely with development, QA, and platform teams to deliver scalable, automated, and resilient environment solutions.
Analyse and optimise performance of non-production systems, identifying and resolving environment bottlenecks.
Maintain environment fidelity and integrity through controlled configuration management, patching, versioning, and rollback strategies.
Support release and deployment planning, ensuring environment readiness, dependency alignment, and overall stability during release cycles.
Implement and maintain monitoring, observability, and logging frameworks, with a strong emphasis on Dynatrace and CNCF-aligned tooling.
Define meaningful, proactive alerting policies that reduce noise, highlight real issues, and accelerate response times.
Apply SRE principles such as SLIs/SLOs, automated remediation, and continuous feedback loops to improve environment uptime and reliability.
Mentor junior engineers, share best practices, and contribute to knowledge bases, documentation, and process maturity.
Support Disaster Recovery (DR) testing, validating end‑to‑end system recovery, integration behaviour, and service resilience during failover scenarios.
Champion automation and operational excellence, reducing manual effort and increasing the team’s ability to deliver environments at scale.
Requirements
Strong knowledge of VMware, vSphere, virtualisation platforms, and on‑premise infrastructure management.
Expertise in Terraform and experience defining an organisation-wide IaC strategy.
Proficient in scripting and automation (Python, Bash, PowerShell).
Strong communication, documentation, and collaborative problem-solving skills.
Hands-on experience with on-premise infrastructure, virtualisation, containerisation, and exposure to cloud platforms such as AWS or Azure.
Understanding of performance engineering, including load testing frameworks and performance analysis.
Experience supporting QA, development, and release management teams with reliable, well-controlled non-prod environments.
Ability to troubleshoot complex multi‑layered issues across infrastructure, networks, applications, middleware, and databases.
Familiarity with SRE principles and modern operational practices such as postmortems, runbooks, SLIs/SLOs, error budgets, and automated recovery patterns.
Experience with APM and observability tooling, ideally Dynatrace, including metrics, traces, dashboards, and alerting configuration.
Benefits
Collaborative working environment – we stand shoulder to shoulder with our clients and our peers through good times and challenges
We empower all passionate, technology-loving professionals by allowing them to expand their skills and take part in inspiring projects
Expleo Academy enables you to acquire and develop the right skills by delivering a suite of accredited training courses
Competitive company benefits
Always working as one team, our people are not afraid to think big and challenge the status quo
As a Disability Confident Committed Employer, we have committed to: ensuring our recruitment process is inclusive and accessible; communicating and promoting vacancies; offering an interview to disabled people who meet the minimum criteria for the job; anticipating and providing reasonable adjustments as required; supporting any existing employee who acquires a disability or long-term health condition, enabling them to stay in work; and undertaking at least one activity that will make a difference for disabled people.
“We are an equal opportunities employer and welcome applications from all suitably qualified persons regardless of their race, sex, disability, religion/belief, sexual orientation or age.” We treat everyone fairly and equitably across the organisation, including providing any additional support and adjustments needed for everyone to thrive.