Site Reliability Engineer designing and supporting Kubernetes environments for F5's UDF platform. Collaborating with cross-functional teams to ensure reliability and operational excellence.
Responsibilities
Design, deploy, and manage Kubernetes clusters and ensure efficient container orchestration
Implement and maintain Kubernetes-based deployment pipelines
Optimize resource allocation within Kubernetes clusters
Develop and maintain high-availability and fault-tolerant Kubernetes architectures
Design and implement observability pipelines for real-time monitoring of Kubernetes clusters
Leverage tools such as CloudWatch, Datadog, Grafana, or similar platforms
Establish logging, tracing, and alerting strategies
Automate infrastructure management tasks to support AI functionality
Support Infrastructure-as-Code (IaC) methodologies
Collaborate with product teams and sales engineering to integrate F5 products into the UDF platform
Requirements
Bachelor’s degree in Computer Science, Software Engineering, or a related technical field (or equivalent experience)
4+ years of experience in Site Reliability Engineering (SRE), DevOps, or similar roles
Strong expertise in managing Kubernetes clusters and containerized workloads in production environments
Hands-on experience deploying and managing Kubernetes environments in AWS, especially using EKS
Proficiency with monitoring and observability tools such as CloudWatch, Grafana, Fluentd, Datadog, or equivalent platforms
Expertise with Infrastructure-as-Code (IaC) tools such as Terraform, Helm, or CloudFormation
Solid understanding of networking, storage, and compute infrastructure within containerized environments
Proficiency in programming and scripting languages such as Python, Go, or Bash
Expertise in applying security best practices to Kubernetes environments
Familiarity with GPU-based workloads in Kubernetes environments and optimization strategies for AI-based workloads
Experience with orchestrating, troubleshooting, and optimizing complex network environments in AWS and GCP VPCs
Development Operations Engineer supporting enterprise application development in Java and/or C. Ensuring high availability and operational excellence in modern payment solutions.
Senior Site Reliability Engineer ensuring operational excellence for multi-datacenter infrastructure at F5. Developing automation tools and APIs in Python and Go.
DevOps Engineer needed to develop a new OpenXDR solution on AWS, processing security data from multiple sources. Join a leading cybersecurity company in Slovakia.
DevOps Engineer at Castalia Systems automating and optimizing toolchain and CI/CD pipelines. Designing Azure infrastructure and ensuring collaboration between development and operations teams.
Senior DevOps Engineer managing Kubernetes and AI-driven workflows at Hex Trust. Supporting blockchain infrastructure while implementing DevOps best practices.
Lead DevSecOps Software Developer at Leidos enhancing automation for air traffic operations. Collaborating on safety-critical systems within a hybrid work environment.
DevSecOps Engineer overcoming client challenges using the latest tools at Booz Allen Hamilton. Collaborating on clean code and infrastructure enhancements to build user-oriented solutions.