Senior Data Scientist leading the design and execution of evaluation frameworks for generative AI systems at Resaro, with a focus on large language models and on applying scientific methods to ensure AI safety and effectiveness.
Responsibilities
Lead the design, implementation, and execution of robust frameworks to evaluate the performance of generative AI systems, including text and multi-modal models
Establish and refine metrics and benchmarks for model quality, including output fidelity, diversity, creativity, and bias detection
Perform technical AI evaluations, benchmarking, and “red-team” tests on large language models to assess robustness, embedded biases, and vulnerabilities
Work with clients and junior team members to design custom evaluation approaches
Develop a suite of technical and analytical AI evaluation frameworks and tools for assessing the robustness, explainability, fairness, privacy, safety, and security of AI systems
Lead the design and implementation of evaluation frameworks for large language models (LLMs)
Define and refine metrics for evaluating model performance
Curate and manage large, high-quality datasets for evaluating LLMs
Mentor junior data scientists in best practices for LLM evaluation
Stay up to date with the latest advancements in natural language processing (NLP) and LLM evaluation
Requirements
Extensive experience as a data scientist training or deploying deep-learning-based natural language models or large language models in real-world settings
Approximately 5-8 years of work experience, or a relevant postgraduate degree with 2+ years of experience building and deploying LLMs
Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human-centered evaluation techniques
Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization
Excellent written and verbal communication skills, with the ability to clearly explain complex technical concepts to diverse audiences, including non-technical stakeholders
Solid programming skills in Python and experience building automated pipelines for continuous model evaluation
Passion for applied research on the safe and responsible use of AI, particularly large language models
Leading R&D initiatives for AI at VO2 Group, a tech consulting leader in France. Driving scientific research and managing technical teams for innovative AI solutions.
Machine Learning Researcher working with a diverse team to integrate ML techniques into fusion technology. Collaborating with scientists and engineers to drive innovative energy solutions.
Senior AI Researcher developing Aurora, an AI system for guiding financial outcomes at Moneybox. Collaborating with ML engineers and scientists to design scalable architecture and prototype solutions.
Lead AI Research Engineer at AVEVA, focusing on prototyping and validating emerging AI technologies. Collaborating with teams to drive innovation and real-world applications in industrial software.
AI Scientist developing state-of-the-art AI solutions for drug discovery and proximity-inducing molecules. Collaborating with cross-disciplinary teams on unsolved scientific challenges in machine learning.
AI Researcher / Engineer at Constructor Knowledge Labs focusing on autonomous scientific discovery and AI systems for scientific computing and materials research.
Finance Intern supporting AI research by building and executing financial models. Collaborating with senior professionals to enhance AI's understanding of financial markets.
Machine Learning Researcher enhancing AI in K-12 education at Kiddom. Driving significant improvements in teaching experiences and student outcomes with innovative AI applications.
Build neural networks for autonomous vehicle technology at Mobileye, focusing on deep learning model design and deployment. Collaborate with teams to ensure high-impact research solutions.