Hybrid Data Engineer

Posted last month

About the role

  • As a Data Engineer at Aldea, a multi-modal AI company, you will build data infrastructure and design and scale data pipelines for the language and speech domains at trillion+ token scale.

Responsibilities

  • Build and scale data pipelines for pretraining, midtraining, and post-training at trillion+ token scale across language and speech domains
  • Process and curate large-scale datasets including cleaning, deduplication, quality filtering, and optimization for distributed training
  • Generate synthetic data for model training and evaluation across diverse tasks and domains
  • Design efficient data loading systems achieving high throughput across multi-node training clusters
  • Build data versioning and reproducibility systems to track dataset compositions and enable reproducible experiments
  • Collaborate with ML engineers and researchers to optimize pipelines and improve data quality

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • 3+ years of experience building large-scale data pipelines for machine learning or data-intensive applications
  • Strong programming skills in Python and experience with data processing frameworks (Spark, Dask, Ray, or similar)
  • Experience with data quality techniques including deduplication, filtering, and validation at scale
  • Proven ability to optimize data pipelines for performance and throughput in distributed systems
  • Experience working with large datasets (100GB-10TB+) and understanding of storage systems and data formats

Benefits

  • Competitive base salary
  • Performance-based bonus aligned with research and model milestones
  • Equity participation
  • Comprehensive health, dental, and vision coverage
  • Flexible paid time off

Job title

Data Engineer

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
