Site Reliability Engineer responsible for building and maintaining reliable systems for cloud contact center software at Five9. The role collaborates with multiple teams to ensure system reliability and performance.
Responsibilities
Design and implement comprehensive dashboards covering OS/platform level monitoring and application-level monitoring.
Establish and maintain SLIs, SLOs, and error budgets for the service.
Build alerting systems and performance monitoring to proactively identify and resolve issues.
Participate in on-call rotations and lead incident response efforts, including post-mortem analysis and remediation.
Maintain continuous integration and deployment pipelines.
Develop and maintain infrastructure using tools like Terraform, Ansible, or similar.
Automate system configuration and ensure consistency across environments.
Ensure security scanning systems are in place and review escalated vulnerabilities.
Monitor and optimize cloud resource usage and costs.
Requirements
3+ years managing large-scale production environments.
Comfortable with 24/7 on-call responsibilities and incident response.
Strong Linux/Unix system administration skills.
Understanding of TCP/IP, DNS, load balancing, and network security.
Experience with SQL and NoSQL databases in production environments.
Proficiency in at least two programming languages: Python, Shell, PHP, Java, or similar.
Experience with the infrastructure and services of at least one of AWS, GCP, or Azure.
Hands-on experience with Docker, Kubernetes, and container orchestration.
Experience with Prometheus, Grafana, ELK stack, or similar monitoring and logging tools.
Proficiency with Terraform, CloudFormation, or similar infrastructure-as-code tools.
Expert-level Git usage and collaborative development practices.
Experience defining and maintaining service level objectives.
Understanding of error budget concepts and implementation.
Track record of identifying and eliminating repetitive manual work.
Experience with performance testing and capacity management.
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Benefits
Health, dental, and vision coverage, beginning on the first day of employment.
Five9 covers 100% of the employee portion of health, dental, and vision coverage and a large share of the dependent cost.
Short- and long-term disability, basic life insurance, and a 401(k) savings plan with employer matching.
Access to an innovative mental health support platform that offers personalized care and resources.
Generous employee stock purchase plan.
Paid time off, company-paid holidays, paid volunteer hours, and 12 weeks of paid parental leave.