Hybrid Site Reliability Engineer – UDF

Posted 5 days ago

About the role

  • As a Site Reliability Engineer, you will design and support Kubernetes environments for F5's UDF platform and collaborate with cross-functional teams to ensure reliability and operational excellence.

Responsibilities

  • Design, deploy, and manage Kubernetes clusters and ensure efficient container orchestration
  • Implement and maintain Kubernetes-based deployment pipelines
  • Optimize resource allocation within Kubernetes clusters
  • Develop and maintain high-availability and fault-tolerant Kubernetes architectures
  • Design and implement observability pipelines for real-time monitoring of Kubernetes clusters
  • Leverage tools such as CloudWatch, Datadog, Grafana, or similar platforms
  • Establish logging, tracing, and alerting strategies
  • Automate infrastructure management tasks to support effective AI functionalities
  • Support Infrastructure-as-Code (IaC) methodologies
  • Collaborate with product teams and sales engineering to integrate F5 products into the UDF platform

Requirements

  • Bachelor’s degree in Computer Science, Software Engineering, or a related technical field (or equivalent experience)
  • 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or similar roles
  • Strong expertise in managing Kubernetes clusters and containerized workloads in production environments
  • Hands-on experience deploying and managing Kubernetes environments in AWS, especially using EKS
  • Proficiency in monitoring and observability tools, including CloudWatch, Grafana, Fluentd, Datadog, or equivalent platforms
  • Expertise with Infrastructure-as-Code (IaC) tools such as Terraform, Helm, or CloudFormation
  • Solid understanding of networking, storage, and compute infrastructure within containerized environments
  • Proficiency in coding and scripting languages, including Python, Go, or Bash
  • Expertise in applying security best practices to Kubernetes environments
  • Familiarity with GPU-based workloads in Kubernetes environments and optimization strategies for AI-based workloads
  • Experience with orchestrating, troubleshooting, and optimizing complex network environments in AWS and GCP VPCs
  • Experience working with hypervisors in GCP VPCs

Benefits

  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Paid time off
  • Remote work options
  • Professional development opportunities
  • Bonus
  • Stock options
  • Equipment allowances
  • Wellness programs

Job title

Site Reliability Engineer – UDF

Experience level

Mid level, Senior

Salary

$137,600 - $206,400 per year

Degree requirement

Bachelor's Degree
