Hybrid AI Evaluation Engineer

Posted last month


About the role

  • AI Evaluation Engineer developing methodologies for assessing the performance and reliability of advanced AI systems. This is a hybrid role based in Ghent, Belgium.

Responsibilities

  • Design and Develop Evaluation Frameworks: Create scalable, reproducible evaluation pipelines for large-scale AI systems, including LLMs and multi-agent architectures, covering both automated and human-in-the-loop testing strategies.
  • Metric Innovation: Define and implement novel evaluation metrics that capture model capabilities beyond traditional benchmarks.
  • Benchmarking & Performance Analysis: Conduct benchmarking of AI models across domains, tasks, and modalities, analyzing their skills and behavior under different setups.
  • Safety, Reliability & Alignment Testing: Develop tools and experiments to probe model safety, robustness, interpretability, and bias.
  • Cross-functional Collaboration: Work closely with model fine-tuning and optimization teams to evaluate end-to-end system effectiveness and efficiency, identifying trade-offs between model performance, latency, and energy footprint.
  • Continuous Improvement & Reporting: Monitor model performance over time, automate regression detection, and contribute to the continuous evaluation infrastructure that supports Openchip’s AI research and product roadmap.

Requirements

  • MSc or PhD in Computer Science, Artificial Intelligence, Machine Learning, Statistics, or a related field.
  • A publication record in ML evaluation, benchmarking, or interpretability is a plus.
  • 3+ years of experience developing, evaluating, or optimizing AI systems.
  • Strong programming skills in Python, with experience using PyTorch, TensorFlow, or JAX.
  • Experience in designing evaluation protocols for LLMs, multi-agent systems, or reinforcement learning environments.
  • Deep understanding of ML metrics, evaluation methodologies, and statistical analysis.
  • Experience with data quality, annotation workflows, and benchmark dataset creation is a plus.
  • Fluent in English; proficiency in additional European languages (German, Dutch, Spanish, French, or Italian) is a plus.

Benefits

  • The opportunity to build a cloud AI deployment platform that will power next-generation AI systems.
  • A collaborative, innovation-driven environment with significant autonomy and ownership.
  • Hybrid work model with flexible scheduling.
  • A chance to join one of Europe’s most ambitious companies at the intersection of AI and silicon engineering.

Job title

AI Evaluation Engineer

Job type

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Postgraduate Degree

Location requirements
