Site Reliability Engineer ensuring the availability and performance of services for autonomous vehicle operations. Collaborating on system design and automation in a robotics-focused environment.
Responsibilities
Design and implement highly scalable and reliable systems to support Zoox's autonomous vehicle platform.
Optimize system performance, reliability, and scalability.
Develop and maintain monitoring, alerting, and reporting systems to ensure proactive identification and resolution of issues.
Collaborate with software engineering teams to improve software architecture, deployment processes, and automation.
Conduct root cause analysis of production issues and implement corrective actions.
Implement disaster recovery and business continuity plans.
Requirements
5+ years of experience in site reliability engineering or a similar role, with a strong background in working with large-scale distributed systems.
Proven experience with cloud platforms such as AWS, GCP, or Azure.
Expertise in container orchestration technologies like Kubernetes.
Deep understanding of networking, storage, and database technologies.
Strong programming skills in languages such as Python, Go, C/C++, or Java.
Experience with infrastructure as code tools such as Terraform, Ansible, Salt, or CloudFormation.
Benefits
Paid time off (e.g., sick leave, vacation, bereavement)
Senior Site Reliability Engineer focused on developing and maintaining OpenShift-based platform solutions at Red Hat. Responsible for software automation, onboarding new services, and maintaining service reliability.
Site Reliability Engineer at Red Hat designing Python and Golang solutions for managed services. Involves onboarding services, maintaining reliability, and fostering team excellence.
Development Operations Engineer supporting enterprise application development in Java and/or C. Ensuring high availability and operational excellence in modern payment solutions.
Site Reliability Engineer designing and supporting Kubernetes environments for F5's UDF platform. Collaborating with cross-functional teams to ensure reliability and operational excellence.
Senior Site Reliability Engineer ensuring operational excellence for multi-datacenter infrastructure at F5. Developing automation tools and APIs in Python and Go.
DevOps Engineer needed to develop a new OpenXDR solution on AWS, processing security data from multiple sources. Join a leading cybersecurity company in Slovakia.
DevOps Engineer at Castalia Systems automating and optimizing toolchain and CI/CD pipelines. Designing Azure infrastructure and ensuring collaboration between development and operations teams.
Senior DevOps Engineer managing Kubernetes and AI-driven workflows at Hex Trust. Supporting blockchain infrastructure while implementing DevOps best practices.