AI Evaluation Engineer developing methodologies for assessing the performance and reliability of advanced AI systems. Hybrid role based in Ghent, Belgium.
Responsibilities
Design and Develop Evaluation Frameworks: Create scalable, reproducible evaluation pipelines for large-scale AI systems, including LLMs and multi-agent architectures, covering both automated and human-in-the-loop testing strategies.
Metric Innovation: Define and implement novel evaluation metrics that capture model capabilities beyond traditional benchmarks.
Benchmarking & Performance Analysis: Conduct benchmarking of AI models across domains, tasks, and modalities, analyzing their skills and behavior under different setups.
Safety, Reliability & Alignment Testing: Develop tools and experiments to probe model safety, robustness, interpretability, and bias.
Cross-functional Collaboration: Work closely with model finetuning and optimization teams to evaluate end-to-end system effectiveness and efficiency. Identify trade-offs between model performance, latency, and energy footprint.
Continuous Improvement & Reporting: Monitor model performance over time, automate regression detection, and contribute to the continuous evaluation infrastructure that supports Openchip’s AI research and product roadmap.
Requirements
MSc or PhD in Computer Science, Artificial Intelligence, Machine Learning, Statistics, or a related field.
A publication record in ML evaluation, benchmarking, or interpretability is a plus.
3+ years of experience developing, evaluating, or optimizing AI systems.
Strong programming skills in Python, with experience using PyTorch, TensorFlow, or JAX.
Experience in designing evaluation protocols for LLMs, multi-agent systems, or reinforcement learning environments.
Deep understanding of ML metrics, evaluation methodologies, and statistical analysis.
Experience with data quality, annotation workflows, and benchmark dataset creation is a plus.
Fluent in English; proficiency in additional European languages (German, Dutch, Spanish, French, or Italian) is a plus.
Benefits
The opportunity to build a cloud AI deployment platform that will power next-generation AI systems.
A collaborative, innovation-driven environment with significant autonomy and ownership.
Hybrid work model with flexible scheduling.
A chance to join one of Europe’s most ambitious companies at the intersection of AI and silicon engineering.
AI Support Experience Manager driving performance and improvement in customer support at Bumble through AI tools. Focus on workflow optimization and team integration.
Engage in a three-year PhD program focused on developing AI frameworks to enhance automotive product innovations. Collaborate with industry partners and research institutions in advanced engineering.
AI Trainer responsible for delivering training on AI and Cloud technologies with flexible scheduling. Engaging participants through webinars and workshops for an international media organization.
Principal Director leading applied AI initiatives at Aerospace Corporation focused on national security and advanced infrastructure. Driving innovation through AI and cloud integration for complex space systems.
Busperson role at ai Pazzi restaurant in Las Vegas, maintaining dining room standards and assisting servers. Providing excellent guest service and adhering to department policies.
Specialist in Gen AI Development at Sun Life working on innovative technologies and solutions. Collaborating with teams to implement GenAI technologies and improve existing processes.
Working student in Condition Monitoring with Embedded AI at Fraunhofer in Nürnberg. Developing solutions for embedded AI systems in a flexible part-time role.
Supports planning and organization of research projects in Human-AI Interaction at Fraunhofer Institute. Involves interdisciplinary teamwork and various research methodologies.
Specialist in Gen AI Development at Sun Life, a leading financial services company. Focusing on app development and design with cloud technologies and Gen AI applications.
AI Project Delivery Manager leading the delivery of meaningful AI, Analytics, and Reporting projects at Orica. Transforming business needs into real solutions and guiding data delivery teams.