Hybrid Kubernetes Platform Engineer – Control Plane, AI Infrastructure

Posted yesterday

About the role

  • As a Kubernetes Platform Engineer, you will work with self-managed clusters and AI infrastructure, collaborating with a team to design and operate Kubernetes solutions and automate operational tasks.

Responsibilities

  • Design, build, and operate self-managed Kubernetes clusters (OpenShift / Anthos)
  • Manage and maintain etcd (backup, restore, quorum management, defragmentation)
  • Perform control plane upgrades and lifecycle management
  • Tune API server, scheduler, and controller manager for performance and reliability
  • Debug node-level and control-plane issues across large clusters
  • Implement networking (CNI), storage (CSI), and ingress integrations
  • Implement and extend runbook automation frameworks to reduce operational toil
  • Integrate AI agents that monitor cluster telemetry, detect anomalies, and trigger automated workflows
  • Apply statistical or ML-based models on operational data to predict failures, capacity saturation, or workload misbehavior
  • Build self-healing controllers and automated remediation pipelines
  • Implement predictive capacity planning and intelligent alert suppression workflows
  • Build Kubernetes controllers and operators (Go + controller-runtime)
  • Develop CRDs and admission webhooks to extend platform functionality
  • Automate cluster lifecycle and multi-cluster operations
  • Implement policies for workload isolation, governance, and compliance
  • Enable GPU and high-performance infrastructure for AI/ML workloads
  • Optimize scheduler and resource allocation for memory- and compute-intensive workloads
  • Support orchestration of AI/ML pipelines

Requirements

  • 5+ years of software engineering experience
  • 3+ years operating Kubernetes in production with hands-on control plane experience
  • Experience managing etcd (backup, restore, recovery) and performing control plane upgrades
  • Strong Go programming skills
  • Experience building Kubernetes operators/controllers and developing CRDs/webhooks
  • Deep understanding of scheduler, API server, controller loops, and reconciliation
  • Experience debugging and troubleshooting large-scale distributed systems
  • Candidates without on-prem or self-managed Kubernetes control plane experience will not be considered.

Benefits

  • medical, dental and vision insurance
  • 401(k) plan with a Cisco matching contribution
  • paid parental leave
  • short- and long-term disability coverage
  • basic life insurance
  • 10 paid holidays per full calendar year
  • 1 floating holiday for non-exempt employees
  • 1 paid day off for employee’s birthday
  • paid year-end holiday shutdown
  • 4 paid days off for personal wellness
  • 16 days of paid vacation time per full calendar year
  • flexible vacation time off program
  • 80 hours of sick time off provided on hire date and each January 1st thereafter
  • additional paid time away may be requested
  • 10 paid days per full calendar year to volunteer
  • potential grants of Cisco restricted stock units

Job title

Kubernetes Platform Engineer – Control Plane, AI Infrastructure

Experience level

Mid level, Senior

Salary

$126,500 - $182,000 per year

Degree requirement

Bachelor's Degree
