Research Engineer developing infrastructure for Aldea's multi-modal AI research team. Building systems that support rapid experimentation at billion-parameter scale in language and speech domains.
Responsibilities
Build and maintain distributed training infrastructure supporting researchers across language and speech domains at billion-plus-parameter scale.
Optimize training and inference performance across the stack, delivering significant speedups through framework optimization, custom kernels, and system-level improvements.
Design experiment infrastructure including automated evaluation pipelines, experiment tracking, and monitoring systems that enable rapid iteration.
Scale infrastructure from single-node to multi-node distributed training and deploy production inference systems for real-time applications.
Support researchers with fast turnaround on infrastructure issues and maintain high reliability across all systems.
Collaborate with research scientists, data engineers, and leadership to define technical priorities and the infrastructure roadmap.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
3+ years of experience with PyTorch and distributed training frameworks (DDP, FSDP, DeepSpeed, or similar).
Experience training large-scale deep learning models at 1B+ parameters.
Deep understanding of training optimization techniques including mixed precision, gradient checkpointing, and memory management.
Proven ability to build production-grade ML infrastructure with high reliability.
Track record of delivering significant performance optimizations in ML training or inference systems.
Benefits
Competitive base salary
Performance-based bonus aligned with research and model milestones
Senior ML Research Engineer driving the research and development of multimodal embedding models at TwelveLabs. Collaborating on projects integrating video, audio, and text for innovative AI solutions.
Research Engineer developing and operating analytical infrastructure for Gridware’s edge grid monitoring product. Collaborating on measurement analysis and new product operationalization in the R&D team.
Senior Research Engineer specializing in solid mechanics for Gridware's new sensing capabilities. Focused on mechanics modeling and algorithm design within a climate-tech environment.
Research Engineer developing agentic systems at Anthropic focused on LLMs and AI applications. Collaborating with researchers to enhance agent performance and tackle complex tasks.
System Modelling Innovation Engineer at Electrolux developing advanced product development system models. Enhancing modeling techniques and optimizing product development for better consumer experiences.
R&D Engineer developing estimation and control strategies for Electrolux appliances. Collaborating with global teams to innovate product features and drive sustainability in consumer electronics.
Principal Research Engineer leading engineering activities in behavior autonomy for Scientific Systems. Overseeing critical technology deliverables, team management, and proposal efforts.
Staff Research Engineer involved in creating a neurosymbolic AI agent at Onton. Focused on optimal decision-making processes and addressing challenges in current AI systems.