Staff Machine-Learning Infrastructure Engineer developing ML infrastructure for Voxel, enhancing workplace safety through AI and computer vision technology.
Responsibilities
Own data & labeling pipelines – architect scalable labeling services (storage, query, retrieval), design ontologies, automate annotation workflows, and build quality-tiered datasets that stay within cost constraints.
Build and operate training infrastructure – create multi-GPU / multi-node training frameworks (Ray, Spark, Kubernetes), optimize distributed jobs, and integrate acceleration techniques (TensorRT, CUDA Graphs, FP8, etc.).
Manage the full model lifecycle – stand up model registries, version control, evaluation suites, and continuous-learning loops that push updates from dev → staging → prod with zero-downtime rollbacks.
Provide technical leadership, mentorship, and lightweight project management to a small infra + research squad.
Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely.
Partner with ML engineers on architecture decisions, from data schemas to inference optimizations, ensuring infra and research roadmaps stay tightly aligned.
Requirements
Bachelor’s (or higher) in Computer Science, EE, or related field.
5+ years building and operating large-scale infrastructure, with at least 3 years focused on ML or data-intensive systems.
Proven record designing highly available, distributed systems on Kubernetes (EKS, GKE, or on-prem).
Deep expertise with orchestration (K8s operators, Argo, Kubeflow) and cluster-scale storage / compute (S3, GCS, Ray, Spark, Dask).
Hands-on experience automating data-labeling or ground-truth workflows and maintaining dataset versioning.
Strong software-engineering fundamentals; familiar with best practices for testing, observability, and secure coding.
Machine Learning Engineer working on next-generation agentic AI platform at Salesforce. Collaborate with teams to innovate and design impactful AI systems for customers.
Data Analyst internship focusing on resource price forecasts with AI and machine learning methodologies at Fraunhofer Institute in Nürnberg. Collaborating on data analytics projects and applying modern techniques for predictive modeling.
Graduate Analyst in Data & Machine Learning Operations for Volkswagen Financial Services, focusing on machine learning, data integration, and business intelligence projects.
Senior ML Engineer responsible for designing scalable AI/ML infrastructure at General Motors. Collaborating with teams on advanced AI solutions for intelligent driving technologies.
Scientific AI & ML Engineer designing and deploying innovative AI-driven solutions. Collaborating with teams to solve complex scientific challenges through advanced machine learning techniques.
MLOps Engineer developing, testing, and maintaining machine learning models at Booz Allen. Collaborating with software developers and data scientists to deliver AI-powered solutions.
Senior Software Developer working on ML Infrastructure and Deployment at Verafin. Helping develop cutting-edge fraud detection tools alongside analytics teams using AWS and Terraform.
Machine Learning Engineer developing advanced SLAM systems for autonomous trucking environments at Bot Auto. Collaborating with cross-functional teams to optimize mapping solutions and ensure operational stability.
Graduate Deep Learning Algorithm Developer developing perception technologies for autonomous driving. Tackling challenges in object detection and 3D perception using state-of-the-art deep learning models.
Principal AI/ML Engineer leading the AI/ML infrastructure development for WEX's risk service needs. Focused on innovative engineering and technology solutions within a high-stakes environment.