Staff Machine-Learning Infrastructure Engineer developing ML infrastructure for Voxel, enhancing workplace safety through AI and computer vision technology.
Responsibilities
Own data & labeling pipelines – architect scalable labeling services (storage, query, retrieval), design ontologies, automate annotation workflows, and build quality-tiered datasets that stay within cost constraints.
Build and operate training infrastructure – create multi-GPU / multi-node training frameworks (Ray, Spark, Kubernetes), optimize distributed jobs, and integrate acceleration techniques (TensorRT, CUDA Graphs, FP8, etc.).
Manage the full model lifecycle – stand up model registries, version control, evaluation suites, and continuous-learning loops that push updates from dev → staging → prod with zero-downtime rollbacks.
Provide technical leadership, mentorship, and lightweight project management to a small infra + research squad.
Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely.
Partner with ML engineers on architecture decisions, from data schemas to inference optimizations, ensuring infra and research roadmaps stay tightly aligned.
Requirements
Bachelor’s (or higher) in Computer Science, EE, or a related field.
5+ years building and operating large-scale infrastructure, with at least 3 years focused on ML or data-intensive systems.
Proven record designing highly available, distributed systems on Kubernetes (EKS, GKE, or on-prem).
Deep expertise with orchestration (K8s operators, Argo, Kubeflow), and cluster-scale storage / compute (S3, GCS, Ray, Spark, Dask).
Hands-on experience automating data-labeling or ground-truth workflows and maintaining dataset versioning.
Strong software-engineering fundamentals; familiar with best practices for testing, observability, and secure coding.
Senior Software Engineer designing and operating ML infrastructure for Plaid's AI initiatives. Collaborating with product teams to accelerate AI-powered financial experiences and ensure scalable ML systems.
Senior ML Engineer serving as an individual contributor in generative AI at GEICO. Collaborating with teams to design, develop, and deploy AI systems that drive business value.
Senior Staff Machine Learning Engineer at GEICO, enhancing service productivity through AI technologies. Collaborating with dynamic teams to develop and deploy scalable AI workflows across GEICO.
Staff AI Engineer at GEICO designing and deploying AI platforms for virtual agent workflows. Collaborating with teams to improve service for millions of customers.
Machine Learning Engineer at Tilt, developing personalisation solutions across various app surfaces. Collaborate with teams to enhance recommendation systems on a video-first shopping platform.
Senior Machine Learning Engineer architecting next-generation AI platforms for healthcare and fintech with Nitra's diverse team. Focused on data pipelines, ML infrastructure, and production-ready AI systems.
Senior Machine Learning Engineer architecting and building Nitra's data and AI platform. Driving intelligent products across healthcare and fintech industries with applied AI and platform engineering.
Machine Learning Engineer developing and implementing ML models for lending at Blue Whale Lending LLC. Collaborating with teams to enhance data insights and validate model performance.
Applied ML Engineer contributing to machine learning and perception tasks for edge-intelligent maritime systems. Collaborating with cross-functional teams to deliver real-world AI solutions.
AI/ML Engineer building data science and AI solutions for Pharma and MedTech clients on Azure. Collaborating with teams to deliver end-to-end machine learning projects.