Research Engineer developing infrastructure for Aldea's multi-modal AI research team. Building systems that support rapid experimentation at billion-parameter scale in language and speech domains.
Responsibilities
Build and maintain distributed training infrastructure that supports researchers across language and speech domains at billion-plus-parameter scale.
Optimize training and inference performance across the stack, delivering significant speedups through framework optimization, custom kernels, and system-level improvements.
Design experiment infrastructure including automated evaluation pipelines, experiment tracking, and monitoring systems that enable rapid iteration.
Scale infrastructure from single-node to multi-node distributed training and deploy production inference systems for real-time applications.
Support researchers with fast turnaround on infrastructure issues and maintain high reliability across all systems.
Collaborate with research scientists, data engineers, and leadership to define technical priorities and the infrastructure roadmap.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
3+ years of experience with PyTorch and distributed training frameworks (DDP, FSDP, DeepSpeed, or similar).
Experience training large-scale deep learning models at 1B+ parameters.
Deep understanding of training optimization techniques including mixed precision, gradient checkpointing, and memory management.
Proven ability to build production-grade ML infrastructure with high reliability.
Track record of delivering significant performance optimizations in ML training or inference systems.
Benefits
Competitive base salary
Performance-based bonus aligned with research and model milestones