Kubernetes Platform Engineer working with self-managed clusters and AI infrastructure, collaborating with a team to design and operate Kubernetes solutions and automate operational tasks.
Responsibilities
Design, build, and operate self-managed Kubernetes clusters (OpenShift / Anthos)
Manage and maintain etcd (backup, restore, quorum management, defrag)
Perform control plane upgrades and lifecycle management
Tune API server, scheduler, and controller manager for performance and reliability
Debug node-level and control-plane issues across large clusters
Implement networking (CNI), storage (CSI), and ingress integrations
Implement and extend runbook automation frameworks to reduce operational toil
Integrate AI agents that monitor cluster telemetry, detect anomalies, and trigger automated workflows
Apply statistical or ML-based models on operational data to predict failures, capacity saturation, or workload misbehavior
Build self-healing controllers and automated remediation pipelines
Implement predictive capacity planning and intelligent alert suppression workflows
Build Kubernetes controllers and operators (Go + controller-runtime)
Develop CRDs and admission webhooks to extend platform functionality
Automate cluster lifecycle and multi-cluster operations
Implement policies for workload isolation, governance, and compliance
Enable GPU and high-performance infrastructure for AI/ML workloads
Optimize scheduler and resource allocation for memory- and compute-intensive workloads
Support orchestration of AI/ML pipelines
Requirements
5+ years of software engineering experience
3+ years operating Kubernetes in production with hands-on control plane experience
Experience managing etcd (backup, restore, recovery) and performing control plane upgrades
Strong Go programming skills
Experience building Kubernetes operators/controllers and developing CRDs/webhooks
Deep understanding of scheduler, API server, controller loops, and reconciliation
Experience debugging and troubleshooting large-scale distributed systems
Candidates without on-prem or self-managed Kubernetes control plane experience will not be considered.
Benefits
medical, dental and vision insurance
401(k) plan with a Cisco matching contribution
paid parental leave
short- and long-term disability coverage
basic life insurance
10 paid holidays per full calendar year
1 floating holiday for non-exempt employees
1 paid day off for employee’s birthday
paid year-end holiday shutdown
4 paid days off for personal wellness
16 days of paid vacation time per full calendar year
flexible vacation time off program
80 hours of sick time off provided on hire date and each January 1st thereafter
additional paid time away may be requested
10 paid days per full calendar year to volunteer
potential grants of Cisco restricted stock units
Job title
Kubernetes Platform Engineer – Control Plane, AI Infrastructure