ML Infrastructure Engineer at ChipStack responsible for building training pipelines for LLMs. Collaborating with chip designers and software engineers in a fast-moving startup environment.
Responsibilities
Build the core infrastructure that enables training, fine-tuning, evaluation, and deployment of LLMs across cloud and on-premise environments
Work alongside highly experienced chip designers, ML scientists, and other top-notch engineers
Contribute to solving some of the hardest problems in chip design
Requirements
5+ years of experience in ML infrastructure or adjacent roles
Deep expertise in Python and experience with training frameworks like PyTorch or TensorFlow
Strong systems engineering skills and experience with distributed training, data pipelines, and performance optimization
Experience deploying ML models to production (REST APIs, batch jobs, streaming pipelines)
Proficiency with cloud platforms (e.g., GCP, AWS) and containerized systems (Docker, Kubernetes)
Experience managing GPU/TPU workloads efficiently
Good communication skills and the ability to work directly with engineers and customers
Prior experience training or fine-tuning LLMs
Experience setting up observability, monitoring, and evaluation pipelines for ML models
Senior ML Engineer designing and developing machine learning models for national security. Collaborating with cross-functional teams to deliver scalable solutions in defense applications.
Machine Learning Engineer developing and deploying ML planning algorithms for autonomous trucks. Join Plus, a leader in AI-based virtual driver software for autonomous trucking.
Intern for Servo Engineering at Seagate, integrating AI/ML into precision servo design. Collaborating on research and optimization of control algorithms for hard disk systems.
Intern role focused on Machine Learning and Generative AI projects for Seagate's innovative data solutions. Contributing to precision-engineered storage initiatives in Singapore.
Senior ML Platform Engineer at GEICO focusing on building scalable machine learning infrastructure and managing AI applications. Responsible for design, implementation, and mentoring within the ML team.
Senior Staff Machine Learning Engineer developing and integrating ML systems for GEICO’s Claims organization. Collaborating on AI-powered capabilities to enhance decision-making and user experience.
Principal Machine Learning Engineer optimizing video recommendation systems for Snap. Collaborating with cross-functional teams to advance machine learning strategies and improve the tech stack.
Machine Learning Engineering Manager at Snap Inc. leading engineering teams to develop models for value creation. Responsible for technical evaluations, product scalability, and engineering excellence.
Intern working on servo controller design and AI technologies for hard disk drives. Collaborating on projects involving cutting-edge control systems and presenting findings to engineering teams.
Senior/Principal Machine Learning Engineer designing ML systems for Workday’s AI agents. Overseeing the full lifecycle from problem framing to deployment while collaborating with cross-functional teams.