AI Software Engineer – Model Evaluation at Aleph Alpha

About the role

Senior AI Engineer who owns benchmarks and evaluations end to end at Aleph Alpha Research in Heidelberg. The role focuses on evaluating ML models, with particular emphasis on German-language capabilities, and is based in a hybrid working environment.

Responsibilities

Own benchmarks end-to-end: select, implement, and maintain the evaluation suite used during pre-training — from dataset curation to scoring infrastructure to result analysis.
Build evaluation infrastructure: develop and optimize the pipelines that run evaluations against training checkpoints, ensuring speed, reliability, and reproducibility.
Design aggregation and reporting: define how benchmark results translate into training decisions, and build the tooling that makes results interpretable.
Close capability gaps: work with product and post-training teams to identify where our models fall short, then create or integrate benchmarks that measure progress.
Own German evaluation: ensure rigorous assessment of German language capabilities — this is core to our value proposition, not an afterthought.
Correlate signals: establish which pre-training metrics actually predict downstream and system-level performance.

Requirements

Experience with LLM evaluation, benchmark design, and evaluation dataset curation.
Familiarity with statistical methods for evaluation and experimental design.
Track record of shipping impactful technical work — whether that's research, infrastructure, or both.
Strong Python skills and comfort with ML tooling (PyTorch, evaluation frameworks, distributed systems).
Ability to reason about what an evaluation measures and whether it matters — not just run benchmarks, but understand them.
Ownership mentality: you see problems through from diagnosis to solution to deployment.
Willingness to relocate to Heidelberg or travel regularly (potentially weekly).

Benefits

30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work–life balance
Virtual Stock Option Plan
JobRad® Bike Lease
