Research Engineer developing infrastructure for Aldea's multi-modal AI research team. Building systems that support rapid experimentation at billion-parameter scale in language and speech domains.
Responsibilities
Build and maintain distributed training infrastructure for researchers working on language and speech models at billion-plus-parameter scale.
Optimize training and inference performance across the stack, delivering significant speedups through framework optimization, custom kernels, and system-level improvements.
Design experiment infrastructure including automated evaluation pipelines, experiment tracking, and monitoring systems that enable rapid iteration.
Scale infrastructure from single-node to multi-node distributed training and deploy production inference systems for real-time applications.
Support researchers with fast turnaround on infrastructure issues and maintain high reliability across all systems.
Collaborate with research scientists, data engineers, and leadership to define technical priorities and the infrastructure roadmap.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
3+ years of experience with PyTorch and distributed training frameworks (DDP, FSDP, DeepSpeed, or similar).
Experience training large-scale deep learning models with 1B+ parameters.
Deep understanding of training optimization techniques including mixed precision, gradient checkpointing, and memory management.
Proven ability to build production-grade ML infrastructure with high reliability.
Track record of delivering significant performance optimizations in ML training or inference systems.
Benefits
Competitive base salary
Performance-based bonus aligned with research and model milestones