Site Reliability Engineer at Reward Gateway transforming operational workloads to an SRE approach. Collaborating with Product Engineering teams and advocating for observability and reliability.
Responsibilities
Integrating tightly with our Product Engineering teams
Following SRE practices and maintaining high standards of compliance
Implementing a new standard of observability utilising SLI/SLO/Error Budgets
Continually evolving our observability platforms for greater coverage
Using a code-first approach to builds and changes to reduce toil
Advocating a strong focus on availability, reliability and uptime
Liaising and embedding with the Engineering teams for the constant evolution of metrics
Working towards planned roadmap goals
Actively taking part in the daily stand-ups and keeping sprints on track
Keeping documentation up to date in Jira and Confluence
Taking part in SRE Incident Management processes
Acting as a key Incident Commander within the Incident Management process
Taking part in SRE On Call
Ensuring a focus on cost efficiency for the platforms & services
Working with team members to foster collaboration and ongoing communication with stakeholders
Requirements
At least 5 years of experience in DevOps or SRE, with a keen interest in growing as a Site Reliability Engineer
Experience with AWS or other cloud providers
Enterprise experience in HA environments
Automation skills through Terraform, Python, Bash or similar
Wide-reaching SRE skills and a deep understanding of SRE practices
A strong understanding of SQL, PHP, Kubernetes and CI/CD
Observability product experience (e.g., Datadog)
Managing services using SLI/SLO & Error Budgets
Ability to work both independently and as part of a team
Ability to work under pressure and be highly reliable
Adaptability and flexibility to change in a fast-moving environment
An ability to learn new tools and processes quickly and impart that knowledge
Interview Process
Screening interview with the Talent Partner and Head of SRE
Final interview with the Head of SRE and the Director of Infrastructure
Be comfortable. Be you. At Reward Gateway, we want all our employees to feel comfortable bringing their passion, creativity and individuality to work. We value all cultures, backgrounds, and experiences, as we truly believe that diversity drives innovation. Express yourself, join our community and help us Make the World a Better Place to Work.
We hire BETTER. From perks to people, our BETTER approach to hiring earns us more trust, happier people and more world-class talent that helps us to make the world a better place to work.