Site Reliability Engineer ensuring the availability and performance of services for autonomous vehicle operations. Collaborating on system design and automation in a robotics-focused environment.
Responsibilities
Design and implement highly scalable and reliable systems to support Zoox's autonomous vehicle platform.
Optimize system performance, reliability, and scalability.
Develop and maintain monitoring, alerting, and reporting systems to ensure proactive identification and resolution of issues.
Collaborate with software engineering teams to improve software architecture, deployment processes, and automation.
Conduct root cause analysis of production issues and implement corrective actions.
Implement disaster recovery and business continuity plans.
Requirements
5+ years of experience in site reliability engineering or a similar role, with a strong background in working with large-scale distributed systems.
Proven experience with cloud platforms such as AWS, GCP, or Azure.
Expertise in container orchestration technologies like Kubernetes.
Deep understanding of networking, storage, and database technologies.
Strong programming skills in languages such as Python, Go, C/C++, or Java.
Experience with infrastructure as code tools such as Terraform, Ansible, Salt, or CloudFormation.
Benefits
Paid time off (e.g., sick leave, vacation, bereavement)
DevOps Engineer automating continuous deployment and monitoring on AWS for Crown Equipment Corporation. Bridging developers, IT, and external providers for operational efficiency.
Senior DevOps Engineer responsible for leading CI/CD pipeline design and optimization. Collaborating with teams to drive DevOps maturity across the enterprise while managing infrastructure automation.
Cloud Operations Engineer ensuring reliable performance of cloud systems at 2Innovate. Focused on automation, incident management, cloud security, and infrastructure monitoring in cloud environments.
AWS DevOps Engineer responsible for delivering scalable digital experiences for EXL's MarTech ecosystem. Engaging in development, maintenance, and collaboration across stakeholders and services.
Senior Site Reliability Engineer managing critical infrastructure at Hornetsecurity. Collaborating with product teams to ensure performance and reliability across services.
Site Reliability Engineer enhancing platform reliability for AI workflows at WRITER. Overseeing automated solutions and cloud infrastructure supporting high-traffic AI systems.
Site Reliability Engineer ensuring 24/7 availability of AI-powered workflows at WRITER. Developing and automating robust platforms for high-traffic AI demands.
Site Reliability Engineer maintaining cloud infrastructure for Tricentis SaaS Products. Collaborating closely with engineers, focusing on observability and performance.