Hybrid Senior Research Scientist, Reward Models

Posted 4 weeks ago

About the role

  • As a Senior Research Scientist, you will lead research on reward models for AI, shaping how models understand and optimize for human preferences, with a focus on AI safety and capability.

Responsibilities

  • Lead research on novel reward model architectures and training approaches for RLHF
  • Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability
  • Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
  • Design experiments to understand reward model generalization, robustness, and failure modes
  • Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines
  • Contribute to research publications, blog posts, and internal documentation
  • Mentor other researchers and help build institutional knowledge around reward modeling

Requirements

  • A track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning
  • Experience training and evaluating reward models for large language models
  • Comfort designing and running large-scale experiments with significant computational resources
  • Ability to work effectively across research and engineering, iterating quickly while maintaining scientific rigor
  • Enthusiasm for collaborative research and the ability to communicate complex ideas clearly to diverse audiences
  • A deep commitment to building AI systems that are both highly capable and safe

Strong candidates may also have

  • Published research on reward modeling, preference learning, or RLHF
  • Experience with LLM-as-judge approaches, including calibration and reliability challenges
  • Experience working on reward hacking, specification gaming, or related robustness problems
  • Experience with constitutional AI, debate, or other scalable oversight approaches
  • Contributions to production ML systems at scale
  • Familiarity with interpretability techniques as applied to understanding reward model behavior

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours

Job title

Senior Research Scientist, Reward Models

Job type

Experience level

Senior

Salary

$340,000 - $425,000 per year

Degree requirement

Bachelor's Degree

Location requirements
