Hybrid AI Software Engineer – Model Evaluation

About the role

  • Senior AI Engineer at Aleph Alpha Research in Heidelberg, owning benchmarks and evaluations end-to-end. The role focuses on evaluating ML models, with particular attention to German language capabilities, in a hybrid working environment.

Responsibilities

  • Own benchmarks end-to-end: select, implement, and maintain the evaluation suite used during pre-training — from dataset curation to scoring infrastructure to result analysis.
  • Build evaluation infrastructure: develop and optimize the pipelines that run evaluations against training checkpoints, ensuring speed, reliability, and reproducibility.
  • Design aggregation and reporting: define how benchmark results translate into training decisions, and build the tooling that makes results interpretable.
  • Close capability gaps: work with product and post-training teams to identify where our models fall short, then create or integrate benchmarks that measure progress.
  • Own German evaluation: ensure rigorous assessment of German language capabilities — this is core to our value proposition, not an afterthought.
  • Correlate signals: establish which pre-training metrics actually predict downstream and system-level performance.

Requirements

  • Experience with LLM evaluation, benchmark design, and evaluation dataset curation.
  • Familiarity with statistical methods for evaluation and experiment design.
  • Track record of shipping impactful technical work — whether that's research, infrastructure, or both.
  • Strong Python skills and comfort with ML tooling (PyTorch, evaluation frameworks, distributed systems).
  • Ability to reason about what an evaluation measures and whether it matters — not just run benchmarks, but understand them.
  • Ownership mentality: you see problems through from diagnosis to solution to deployment.
  • Willingness to relocate to Heidelberg or travel regularly (potentially weekly).

Benefits

  • 30 days of paid vacation
  • Access to a variety of fitness & wellness offerings via Wellhub
  • Mental health support through nilo.health
  • Substantially subsidized company pension plan for your future security
  • Subsidized Germany-wide transportation ticket
  • Budget for additional technical equipment
  • Flexible working hours and a hybrid working model for better work–life balance
  • Virtual Stock Option Plan
  • JobRad® Bike Lease

Job title

AI Software Engineer – Model Evaluation

Job type

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Postgraduate Degree

Location requirements
