Senior Data Scientist leading design and execution of evaluation frameworks for generative AI systems at Resaro. The role focuses on large language models, applying scientific methods to ensure AI safety and effectiveness.
Responsibilities
Lead the design, implementation, and execution of robust frameworks to evaluate the performance of generative AI systems, including text and multi-modal models
Establish and refine metrics and benchmarks for model quality, including output fidelity, diversity, creativity, and bias detection
Perform technical AI evaluations, benchmarking, and “red-team” tests on large language models to assess robustness, embedded biases, and vulnerabilities
Work with clients and junior team members to design custom evaluation approaches
Develop a suite of technical and analytical AI evaluation frameworks and tools assessing robustness, explainability, fairness, privacy, safety, and security of AI
Lead design and implementation of evaluation frameworks for Large Language Models (LLMs)
Define and refine metrics for evaluating model performance
Curate and manage large, high-quality datasets for evaluating LLMs
Mentor junior data scientists in best practices for LLM evaluation
Stay up to date with the latest advancements in Natural Language Processing (NLP) and LLM evaluation
Requirements
Extensive experience as a data scientist training or deploying deep-learning-based natural language models or large language models in real-world contexts
About 5-8 years of work experience, or a relevant postgraduate degree with 2+ years of work experience building and deploying LLMs
Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human-centered evaluation techniques
Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization
Excellent written and verbal communication skills, with the ability to clearly explain complex technical concepts to diverse audiences, including non-technical stakeholders
Solid programming skills in Python and experience building automated pipelines for continuous model evaluation
Passion for and interest in applied research on the safe and responsible use of AI, particularly large language models
Senior AI Research Engineer focusing on practical applications of AI technologies for industrial intelligence. Involves building and operationalising AI models in a hybrid work environment.
Lead AI research initiatives at Galileo, focusing on generative AI and machine learning models. Collaborate with cross-functional teams to enhance AI-driven products and tools in an innovative environment.
Staff AI Scientist specializing in machine learning fundamentals at HackerRank. Leading rigorous AI evaluation and dataset construction efforts across teams in a hybrid setting.
Lead Data & AI Scientist delivering innovative AI solutions and insights for Business & Commercial Banking. Responsible for shaping the capability roadmap and mentoring the Data Science & AI team.
Lead Data & AI Scientist at Lloyds Banking Group delivering AI-driven solutions for Banking. Shaping technical roadmaps and mentoring teams in advanced AI practices and deployments.
AI Research Scientist at Lendbuzz developing conversational AI and agentic AI systems. Leading research direction and collaborating with cross-functional teams in a hybrid work environment.
Machine Learning Researcher at Longshot Systems designing and implementing predictive models for sports betting analytics. Involvement in all aspects of R&D from design to implementation.
Lead AI Researcher at Lloyds Banking Group advancing AI transformation through innovative technologies and ethical practices. Collaborates across teams to solve complex financial challenges.
Machine Learning Researcher developing novel AI solutions for impactful products at RBC Borealis. Conducting publishable research and collaborating with development teams to transfer research to production.
Lead AI Scientist at Lloyds Banking Group leading technical delivery of AI solutions. Collaborate with diverse teams to solve financial challenges and create innovative banking solutions.