Senior Machine Learning Engineer developing scalable ML systems for AI applications at Chalice. Designing and deploying production ML models to impact advertising strategies and business outcomes.
Responsibilities
Architect, train, and maintain scalable neural network systems for audience modeling and bid optimization using PyTorch and Ray distributed training (Ray Train, Ray Tune, DDP)
Build and optimize multi-GPU training pipelines on Databricks, including hyperparameter search with ASHA scheduling and early stopping
Develop feature engineering pipelines using PySpark, including embedding layers (EmbeddingBag, Embedding) for categorical and behavioral features
Implement model comparison workflows with champion/challenger evaluation on holdout data
Build resilient training and batch inference workflows with a focus on automation, reproducibility, and checkpoint recovery
Implement robust model monitoring and observability solutions (MLflow, Prometheus, Grafana, Datadog) to track drift, performance metrics (AUC, AUPRC, F1), and system health
Manage model versioning, experiment tracking, and artifact persistence using MLflow and Unity Catalog
Work closely with engineering teams to integrate model outputs into production systems and optimize dataflows for fault tolerance
Partner with product stakeholders to align ML efforts with business impact, KPIs, and product strategy across AI Audiences, AI Allocator, CPA Algo, and Curate AI
Lead technical design reviews, contribute to internal Python packages, and enforce engineering best practices (testing, CI/CD, modularity)
Stay current on ML infrastructure advancements (distributed training, inference optimization, model serving patterns) and help guide adoption internally
Document system architectures, create runbooks, and enable team members to adopt and extend the ML framework
Requirements
Master's Degree or PhD in Computer Science, Statistics, Machine Learning, or related discipline with 5-10 years of industry experience
Strong proficiency in PyTorch for neural network development, including custom architectures with embedding layers, MLP backbones, and binary classification heads
Production experience with Databricks including Delta Lake, Unity Catalog, Asset Bundles, and cluster management
Strong grasp of MLOps best practices: experiment tracking (MLflow), model versioning, model serving, monitoring, and reproducibility
Expert-level Python and PySpark skills for data processing and feature engineering at scale
Experience building and maintaining batch inference pipelines with schema versioning and artifact management
Familiarity with cloud platforms (AWS: S3, EC2) and data warehousing (Snowflake)
Experience with CI/CD workflows including build automation, testing, and packaging using GitHub Actions and Make
Excellent collaboration and communication skills; ability to work effectively in a cross-functional environment with Data Science, Product, and Engineering teams
Benefits
Medical, Dental, and Vision coverage
401(k) options
Unlimited PTO
11 Company Holidays
Office-wide closure between Christmas Eve and New Year's
Trainer at WeAndTheMany facilitating learning by sharing experiences and creating interactive sessions. Engaging with students to enhance their skills and knowledge through dynamic teaching methods.
Machine Learning Manager leading an experienced team to drive data-driven AI/ML solutions at Ford. Overseeing strategies for product development focused on analytics in various domains.
Software Engineer I developing machine learning models and applications at Smart Data Solutions. Collaborating to improve infrastructure and automate processes using AI technology.
Intermediate Machine Learning Engineer at Aviva Canada implementing ML pipelines in close collaboration with AI/ML Operations. Join a team dedicated to operationalizing ML models for optimizing solutions.
MLOps Engineer designing and maintaining cloud infrastructure for large-scale computer vision model training. Collaborating with Data Scientists and AI Engineers to streamline model development lifecycle.
Machine Learning Engineer developing AI-first dating solutions at Hinge, enhancing user matchmaking and conversation experience. Collaborating with cross-functional teams to move ML models to production.
Senior ML Engineer designing and developing machine learning models for national security. Collaborating with cross-functional teams to deliver scalable solutions in defense applications.
Machine Learning Engineer developing and deploying ML planning algorithms for autonomous trucks. Join Plus, a leader in AI-based virtual driver software for autonomous trucking.
Intern for Servo Engineering at Seagate, integrating AI/ML into precision servo design. Collaborating on research and optimization of control algorithms for hard disk systems.
Intern role focused on Machine Learning and Generative AI projects for Seagate's innovative data solutions. Contributing to precision-engineered storage initiatives in Singapore.