Senior AI Engineer responsible for end-to-end benchmarks and evaluations at Aleph Alpha Research in Heidelberg. The role focuses on evaluating ML models, including their German-language capabilities, with full ownership in a hybrid work environment.
Responsibilities
Own benchmarks end-to-end: select, implement, and maintain the evaluation suite used during pre-training — from dataset curation to scoring infrastructure to result analysis.
Build evaluation infrastructure: develop and optimize the pipelines that run evaluations against training checkpoints, ensuring speed, reliability, and reproducibility.
Design aggregation and reporting: define how benchmark results translate into training decisions, and build the tooling that makes results interpretable.
Close capability gaps: work with product and post-training teams to identify where our models fall short, then create or integrate benchmarks that measure progress.
Own German evaluation: ensure rigorous assessment of German language capabilities — this is core to our value proposition, not an afterthought.
Correlate signals: establish which pre-training metrics actually predict downstream and system-level performance.
Requirements
Experience with LLM evaluation, benchmark design, evaluation dataset curation, and experimental design.
Familiarity with statistical methods for evaluation and experiment design.
Track record of shipping impactful technical work — whether that's research, infrastructure, or both.
Strong Python skills and comfort with ML tooling (PyTorch, evaluation frameworks, distributed systems).
Ability to reason about what an evaluation measures and whether it matters — not just run benchmarks, but understand them.
Ownership mentality: you see problems through from diagnosis to solution to deployment.
Willingness to relocate to Heidelberg or travel regularly (potentially weekly).
Benefits
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work–life balance