Site Reliability Engineer (Hybrid)

Posted 3 months ago

About the role

  • As a Site Reliability Engineer, you will ensure the reliability and scalability of TEG's global live entertainment platforms, collaborating with other teams to strengthen system reliability and prevent outages across our ticketing platforms.

Responsibilities

  • Proactively guard the health, availability, and performance of TEG's critical global production systems.
  • Engineer and automate robust monitoring and auto-healing solutions that prevent outages and meet service level objectives (SLOs).
  • Drive Infrastructure-as-Code (IaC) principles for provisioning and deploying our highly available, scalable platforms.
  • Lead critical incident response efforts, ensuring rapid resolution and restoration of platform stability.
  • Provide technical leadership during major incidents, focusing on swift problem analysis and clear communication with stakeholders.
  • Transform incidents into progress by conducting deep post-mortems and driving the implementation of strategic preventative measures across various teams.
  • Build and maintain high-performing, fault-tolerant distributed systems emphasizing resiliency and efficiency.
  • Elevate operational maturity by continuously improving processes, tooling, and efficiency across the department.
  • Champion operational excellence and shared responsibility, collaborating with development and other teams to improve processes and tools.
  • Innovate system design by evaluating and integrating new technologies to enhance reliability, scalability, and security.
  • Mentor and coach colleagues, elevating the overall reliability engineering capability and maturity of the Technology department.

Requirements

  • Mastery of highly available, fault-tolerant AWS system design and management.
  • Strong foundation in AWS networking (VPC, Route 53) and security best practices.
  • Proficiency in key scripting languages (Python, Bash, PowerShell) for automation.
  • Proven ability to perform effectively under pressure, managing a high volume of tasks and meeting tight deadlines.
  • At least 3 years of SRE or DevOps experience.
  • Expert knowledge of fundamental infrastructure concepts (Networking, Containerisation, Virtualisation, DNS).
  • Working familiarity with key CI/CD and Infrastructure-as-Code tools (e.g., Terraform, Ansible, Jenkins).
  • Excellent verbal and written communication skills.

Benefits

  • Complimentary event tickets
  • Birthday and volunteering leave
  • Wellbeing discounts & flu vaccinations
  • Paid parental leave & free employee support (EAP)
  • Global rewards and recognition
  • Learning, development & career pathways
  • A diverse, inclusive, and passionate team

Job title

Site Reliability Engineer

Experience level

Mid-level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
