Senior Site Reliability Engineer developing and operating Azure Red Hat OpenShift managed cloud services. Collaborating with a global team to solve complex challenges in a blameless environment.
Responsibilities
Develop, scale, and operate Azure Red Hat OpenShift managed cloud services.
Contribute code to increase the scalability and reliability of the service.
Contribute software tests and participate in peer review to increase the quality of our codebase.
Help and develop peers’ capabilities through knowledge sharing, mentoring, and collaboration.
Participate in a regular on-call schedule, including occasional paid weekends and holidays.
Practice sustainable incident response and blameless postmortems.
Resolve customer issues escalated from the Red Hat Global Support team.
Work within a small agile team to develop and improve SRE software, support peers, plan, and self-improve.
Requirements
Bachelor’s degree in Computer Science, Engineering, or related field; equivalent practical experience will also be considered.
Strong experience (5+ years) in at least one programming language (Golang, C, C++, Python, Java) and with software life cycles.
Hands-on experience with public cloud platforms (AWS, GCP, Azure), preferably Azure.
Direct experience with Kubernetes or OpenShift is a major plus.
4+ years of experience debugging and optimizing code and automating routine tasks is desired.
Experience with Docker-based containers.
Strong collaboration and problem-solving skills in distributed, team-based environments.
Experience troubleshooting as-a-service offerings (SaaS/PaaS) and working with complex distributed systems.
Working knowledge of Linux/Unix operating systems.
Proven ability to automate repetitive tasks and debug performance issues.
Ability to collaboratively troubleshoot and solve problems in a remote and distributed team setting.
Benefits
Red Hat relies on teamwork and openness for its success.
We learn from our failures in a blameless environment to support the continuous improvement of the team.
Professional development opportunities
Flexible work arrangements
Health insurance
Paid time off
Job title
Senior Site Reliability Engineer – OpenShift, Kubernetes, Azure, Golang, Linux