Senior Site Reliability Engineer focused on developing and maintaining OpenShift-based platform solutions at Red Hat. Responsible for software automation, onboarding new services, and maintaining service reliability.
Responsibilities
Design, write, and maintain software (primarily in Python and Golang) that automates the deployment, monitoring, and maintenance of Red Hat managed services.
Onboard new services onto our OpenShift-based platform, adhering to cloud-native design principles and best practices to ensure reliability, scalability, and security. Contribute to documents, such as standard operating procedures (SOPs) and playbooks, that assist in issue resolution and new-service onboarding.
Proactively utilize AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality.
Participate in an Agile Scrum team that scopes, prioritizes, and allocates work items.
Participate in an on-call rotation that is responsible for responding to service incidents.
Requirements
5+ years of relevant work experience
Background writing object-oriented automation software in Python; experience with Golang is a plus
Background administering production cloud-native services, preferably containerized and deployed via a container-orchestration system like Kubernetes or OpenShift
Experience diagnosing service failures and carrying out incident response procedures
Familiarity with the Linux operating system and its configuration
Ability to effectively work in a globally distributed team
Understanding of computer networking and protocols, including TCP/IP and DNS
Understanding of computer security and cryptography basics, including certificates, TLS, and credential-storage systems like Vault, is a plus
Familiarity with CI/CD pipeline concepts and systems, like Jenkins and Tekton/Argo, is a plus
Familiarity with observability tools like Prometheus and Grafana, and with defining metrics that measure service health and reliability, is a plus