Site Reliability Engineer responsible for building and maintaining reliable systems for cloud contact center software at Five9. Collaborating with multiple teams to ensure system reliability and performance.
Responsibilities
Design and implement comprehensive dashboards covering OS/platform-level and application-level monitoring.
Establish and maintain SLIs, SLOs, and error budgets for the service.
Build alerting systems and performance monitoring to proactively identify and resolve issues.
Participate in on-call rotations and lead incident response efforts, including post-mortem analysis and remediation.
Maintain continuous integration and deployment pipelines.
Develop and maintain infrastructure using tools like Terraform, Ansible, or similar.
Automate system configuration and ensure consistency across environments.
Ensure security scanning systems are in place and review escalated vulnerabilities.
Monitor and optimize cloud resource usage and costs.
Requirements
3+ years managing large-scale production environments.
Comfortable with 24/7 on-call responsibilities and incident response.
Strong Linux/Unix system administration skills.
Understanding of TCP/IP, DNS, load balancing, and network security.
Experience with SQL and NoSQL databases in production environments.
Proficiency in at least two programming languages: Python, Shell, PHP, Java, or similar.
Experience with the infrastructure and services of at least one of AWS, GCP, or Azure.
Hands-on experience with Docker, Kubernetes, and container orchestration.
Experience with Prometheus, Grafana, ELK stack, or similar tools.
Proficiency with Terraform, CloudFormation, or similar tools.
Expert-level Git usage and collaborative development practices.
Experience defining and maintaining service level objectives.
Understanding of error budget concepts and implementation.
Track record of identifying and eliminating repetitive manual work.
Experience with performance testing and capacity management.
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Benefits
Health, dental, and vision coverage, beginning on the first day of employment.
Five9 covers 100% of the employee portion of health, dental, and vision coverage and a large share of the dependent cost.
Short- and long-term disability, basic life insurance, and a 401(k) savings plan with employer matching.
Access to an innovative mental health support platform that offers personalized care and resources.
Generous employee stock purchase plan.
Paid time off, company-paid holidays, paid volunteer hours, and 12 weeks of paid parental leave.