Site Reliability Engineer at Five9, responsible for building and maintaining reliable systems for cloud contact center software. Collaborates with multiple teams to ensure system reliability and performance.
Responsibilities
Design and implement comprehensive dashboards covering both OS/platform-level and application-level monitoring.
Establish and maintain SLIs, SLOs, and error budgets for the service.
Build alerting systems and performance monitoring to proactively identify and resolve issues.
Participate in on-call rotations and lead incident response efforts, including post-mortem analysis and remediation.
Maintain continuous integration and deployment pipelines.
Develop and maintain infrastructure using tools such as Terraform, Ansible, or similar.
Automate system configuration and ensure consistency across environments.
Ensure security scanning systems are in place and review escalated vulnerabilities.
Monitor and optimize cloud resource usage and costs.
Requirements
3+ years managing large-scale production environments.
Comfortable with 24/7 on-call responsibilities and incident response.
Strong Linux/Unix system administration skills.
Understanding of TCP/IP, DNS, load balancing, and network security.
Experience with SQL and NoSQL databases in production environments.
Proficiency in at least two programming languages: Python, Shell, PHP, Java, or similar.
Experience with one of AWS, GCP, or Azure infrastructure and services.
Hands-on experience with Docker, Kubernetes, and container orchestration.
Experience with monitoring and logging tools such as Prometheus, Grafana, the ELK stack, or similar.
Proficiency with infrastructure-as-code tools such as Terraform, CloudFormation, or similar.
Expert-level Git usage and collaborative development practices.
Experience defining and maintaining service level objectives.
Understanding of error budget concepts and implementation.
Track record of identifying and eliminating repetitive manual work.
Experience with performance testing and capacity management.
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Benefits
Health, dental, and vision coverage, beginning on the first day of employment.
Five9 covers 100% of the employee portion of health, dental, and vision coverage and a large share of the dependent cost.
Short- and long-term disability, basic life insurance, and a 401(k) savings plan with employer matching.
Access to an innovative mental health support platform that offers personalized care and resources.
Generous employee stock purchase plan.
Paid time off, company-paid holidays, paid volunteer hours, and 12 weeks of paid parental leave.