Principal Platform Engineer at Comcast responsible for Kubernetes infrastructure design and maintenance. Collaborating with development teams on cloud-native strategies and containerized application deployment.
Responsibilities
Serve as the technical lead and owner for the Compute platform (on-prem and EKS).
Deploy containerized applications across a cluster of bare-metal servers using Ansible, Terraform, and similar tools.
Facilitate automatic scaling of containerized applications based on demand.
Manage utilization of compute resources such as CPU, memory, and storage across the cluster.
Implement monitoring solutions (e.g., Prometheus, Grafana) to track the health and performance of bare-metal clusters and infrastructure components.
Set up alerting mechanisms to detect and respond to issues proactively.
Recover from failures by restarting failed containers or reallocating workloads to healthy nodes.
Work closely with development and engineering teams to establish CI/CD pipelines for automating the deployment and rollout of Kubernetes services.
Support seamless rolling updates, allowing new versions to be deployed gradually while maintaining application availability.
Identify performance bottlenecks in containerized environments and optimize resource utilization through capacity planning, auto-scaling, and performance tuning.
Document processes, procedures, and best practices related to platform operations and share knowledge with team members.
Partner with SREs to define platform SLAs, uptime targets, resilience benchmarks, and alerting/monitoring.
Lead incident response and root cause analysis, automating recovery workflows and improving platform resiliency.
Requirements
Bachelor's degree in computer science or a related field, or equivalent experience; typically 12 years in a DevOps or Systems Engineering role.
Familiarity with container technologies such as Docker and Kubernetes.
Experience implementing continuous integration and continuous delivery (CI/CD) tools and systems.
Proficiency in programming languages such as Python and Java, and in shell scripting (Bash).
Automation scripting with tools such as Ansible (playbooks).
Experience deploying infrastructure with Terraform.
Strong understanding of networking fundamentals, including TCP/IP, DNS, IPv4/IPv6 networking, load balancing, and related protocols.
Familiarity with CNCF ecosystem tools and emerging trends in platform engineering.
Experience designing, building, deploying, and maintaining infrastructure, including Kubernetes clusters.
Experience upgrading Kubernetes clusters with little or no downtime.
Experience configuring service mesh, network policy controls, and multi-tenancy in Kubernetes.
Strong Kubernetes, cloud-native, and containerization expertise in a hybrid-cloud enterprise environment, including experience as a solution architect.
Strong Spark skills.
Excellent analytical and problem-solving skills with the ability to effectively communicate complex technical information.
Strong written communication skills are essential, as well as the ability to create clear and informative documentation.
Ability to work effectively across internal and external organizations.
Flexibility to work off-hours for on-call duties.
Relevant certifications, such as Certified Kubernetes Administrator (CKA).
Benefits
Best-in-class Benefits
Expert guidance and always-on tools personalized to meet support needs