Senior Data Reliability Engineer ensuring software reliability and quality across enterprise applications. Collaborating with teams to implement robust on-call processes and maintain data fidelity.
Responsibilities
Evangelise SRE & DRE across engineering
Lead the charge on building out a framework for data quality that will provide our customers with strong guarantees about the fidelity of our data, as well as support our marketing and revenue functions
As a function, SRE defines and owns the on-call process:
Quickly establishing a strong working knowledge of our systems
Commanding incidents
Running mop-ups
Ensuring follow-up actions are completed to your schedule
Evaluating and improving our existing E2E on-call process
Take part in the on-call rotation, one week every 4–5 weeks (24x7x365 coverage)
Evaluate, manage and maintain our existing solutions for monitoring, alerting, paging, response, documentation
Report on uptime, availability, performance, etc. across our product suite
Write post-mortems for both internal and external consumption
Represent our SRE & DRE function on sales calls with tier one enterprise financial institutions
Work with product, sales and customer service to define SLAs for different products and use cases
Work with internal product teams to define SLOs for internal consumption and measurement
Work with our engineering teams directly to embed DRE practices
Requirements
Proven experience leveling up the quality and reliability of large datasets, not just services and APIs
Experience leading site reliability for a high-volume SaaS product
Supported distributed systems in AWS
The presence and empathy required to hold teams to account
Defined SLAs / SLOs, both internal and client-facing
Offered post-mortems to enterprise clients (verbal and written)
Benefits
Hybrid working and the option to work from *almost* anywhere for up to 90 days per year
£500 Remote working budget to set up your home office space
$1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
Holidays: 25 days of annual leave + bank holidays
An extra day for your birthday
Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, 16 weeks of fully paid leave.
Private Health Insurance - we use Vitality!
Full access to Spill Mental Health Support
Life Assurance: we hope you will never need this - but our cover is for 4 times your salary to your beneficiaries