Senior Data Reliability Engineer ensuring software reliability and quality across enterprise applications. Collaborating with teams to implement robust on-call processes and maintain data fidelity.
Responsibilities
Evangelise SRE & DRE across engineering
Lead the charge on building out a framework for data quality that will provide our customers with strong guarantees about the fidelity of our data, as well as support our marketing and revenue functions
As a function, SRE defines and owns the on-call process:
Quickly establishing a strong working knowledge of our systems
Commanding incidents
Running mop-ups
Ensuring follow-up actions are completed to your schedule
Evaluating and improving our existing E2E on-call process
Take part in the on-call rotation, one week every 4–5 weeks (24x7x365 coverage)
Evaluate, manage and maintain our existing solutions for monitoring, alerting, paging, response and documentation
Report on uptime, availability, performance, etc. across our product suite
Write post-mortems for both internal and external consumption
Represent our SRE & DRE function on sales calls with tier-one enterprise financial institutions
Work with product, sales and customer service to define SLAs for different products and use cases
Work with internal product teams to define SLOs for internal consumption and measurement
Work with our engineering teams directly to embed DRE practices
Requirements
Proven experience leveling up the quality and reliability of large datasets, not just services and APIs
Experience leading site reliability for a high-volume SaaS product
Supported distributed systems in AWS
The presence and empathy required to hold teams to account
Defined SLAs / SLOs, both internal and client-facing
Offered post-mortems to enterprise clients (verbal and written)
Benefits
Hybrid working and the option to work from *almost* anywhere for up to 90 days per year
£500 Remote working budget to set up your home office space
$1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
Holidays: 25 days of annual leave + bank holidays
An extra day for your birthday
Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, 16 weeks of fully paid leave.
Private Health Insurance - we use Vitality!
Full access to Spill Mental Health Support
Life Assurance: we hope you will never need this - but our cover is for 4 times your salary to your beneficiaries
DevOps Engineer automating and optimizing software development lifecycle processes at COSMOTE Global Solutions. Designing and managing containerized infrastructure on Azure and implementing CI/CD.
Senior DevOps Engineer at Elliptic shaping DevOps culture and driving automation across engineering teams, providing expertise and leadership across the stack.
Infrastructure & Cloud Operations Engineer managing AWS and hybrid environments for CV-Library. Hands-on role focused on reliability, automation, and operational excellence.
Site Reliability Engineer building reliable and scalable infrastructure for fintech startup Pave Bank. Collaborating with internal teams to enhance banking platform performance and reliability.
Lead DevOps Engineer managing DevOps projects for high-quality strategy games at Twin Harbour Interactive. Collaborating with teams to optimize production systems and improve development workflows.
Software Engineer contributing to the observability team's development of visibility systems. Implementing a high-performance telemetry platform and supporting AI tools for engineering teams.
Site Reliability Engineer working on the post-RPA Agentic Automation Platform for enterprises. Responsible for developing scalable systems and improving operational reliability.
Senior DevOps Platform Engineer at Humana designing secure cloud infrastructure for healthcare technology. Responsible for CI/CD pipelines and compliance in regulated environments.
Cloud Operations Engineer handling advanced troubleshooting and system administration for secure cloud environments. Operating compliance-controlled cloud environments and maintaining system stability.