Platform Engineer in MLOps team at EIT, building cloud infrastructure and enabling scientific breakthroughs. Seeking experienced individual to automate operational processes and enhance experimentation.
Responsibilities
**Day-to-day, you might:**
Architect, build, and operate our cloud platform, moving infrastructure beyond the initial setup to deliver resilient compute, network, and storage, including full-sized GPU clusters
Drive the implementation of highly structured, auditable delivery pipelines (CI/CD/GitOps) to enforce automated, repeatable infrastructure changes
Design and deploy automated governance and security controls using Policy-as-Code (specifically Kyverno and YAML) to ensure strong isolation, protect data, and meet internal audit standards
Establish the foundational monitoring, alerting, and telemetry framework required for robust operations, defining clear SLOs, and setting the course for future SRE work
Partner with Research and Data teams to build self-service capabilities that efficiently support diverse workloads, from Python notebooks to distributed clusters
Requirements
**What makes you a great fit:**
Proven experience in platform engineering, with a demonstrable track record of architecting and automating operational processes
A highly proactive attitude and a passion for introducing and automating operational structure
Expertise with at least one major cloud provider (OCI, AWS, GCP, or Azure)
Proficiency with Terraform for declarative, large-scale infrastructure provisioning
Comfortable with operating and managing large-scale, resilient Kubernetes clusters
Proficiency in at least one major language for system-level tools (e.g. Python, Go, or Java) with some scripting experience
**It would also be great if you had:**
Familiarity with modern Policy-as-Code tooling
A passion for introducing and automating operational rigour and structure
Experience supporting ML and Data Engineering workloads
Benefits
**We offer the following salary and benefits:**
Enhanced holiday pay
Pension
Life Assurance
Income Protection
Private Medical Insurance
Hospital Cash Plan
Therapy Services
Perk Box
Electric Car Scheme
**Why work for EIT:**
At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact!