Site Reliability Engineer ensuring reliability and availability of critical gaming platforms at Flutter Entertainment. Collaborating with teams to implement monitoring and incident response procedures.
Responsibilities
Ensure the reliability, availability, and performance of critical gaming and betting platforms across global operations
Maintain 24/7/365 service availability for millions of customers worldwide
Implement automation, monitoring, and incident response procedures
Design and implement monitoring, alerting, and observability solutions using tools such as Grafana, Splunk, and CloudWatch
Conduct capacity planning and performance optimization
Establish and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Support ProdOps and Service Management teams during P1/P2 incident response
Collaborate on post-incident reviews and contribute technical insights
Assist in developing and maintaining comprehensive runbooks and incident response procedures
Design, deploy, and maintain Grafana dashboards for real-time system visibility
Create custom Grafana panels and dashboards for business metrics
Requirements
Advanced experience with AWS, Azure, or Google Cloud Platform services and architecture
Proficiency with Docker and Kubernetes for containerization and container orchestration
Strong scripting abilities in Python, Go, Bash, or PowerShell; familiarity with Java or .NET is an advantage
Hands-on experience with Prometheus, Grafana, ELK stack, or similar monitoring solutions
Proficiency with Jenkins, GitLab CI, Azure DevOps, or similar continuous integration tools
Working knowledge of SQL databases (PostgreSQL, MySQL) and NoSQL solutions
Understanding of load balancers, CDNs, DNS, and network security principles
DevOps Engineer at Aifano GmbH, which develops AI-driven enterprise solutions. The role involves CI/CD pipeline management, cloud infrastructure setup, and collaboration with development teams.
Lead Infrastructure Engineer at U.S. Bank responsible for managing and configuring cloud systems and infrastructure technologies while promoting automation practices.
Site Reliability Engineer focused on automation and optimization of software application performance. Collaborating with cross-functional teams to enhance scalability and reliability in Chennai/Bangalore.
Site Reliability Engineer ensuring the availability and performance of services for autonomous vehicle operations. Collaborating on system design and automation in a robotics-focused environment.
DevOps Engineer automating continuous deployment and monitoring on AWS for Crown Equipment Corporation. Bridging developers, IT, and external providers for operational efficiency.
Senior DevOps Engineer responsible for leading CI/CD pipeline design and optimization. Collaborating with teams to drive DevOps maturity across the enterprise while managing infrastructure automation.
Cloud Operations Engineer ensuring reliable performance of cloud systems at 2Innovate. Focused on automation, incident management, cloud security, and infrastructure monitoring in cloud environments.
AWS DevOps Engineer responsible for delivering scalable digital experiences for EXL's MarTech ecosystem. Engaging in development and maintenance work and collaborating with stakeholders across services.
Senior Site Reliability Engineer managing critical infrastructure at Hornetsecurity. Collaborating with product teams to ensure performance and reliability across services.