Hybrid AI Researcher – Multimodal Perception Models

About the role

  • Conduct research on Foundational Multimodal Models in the context of Conversational Avatars (e.g., Neural Avatars, Talking-Heads).
  • Model video, audio, and language sequences using autoregressive, predictive (e.g., V-JEPA), and/or diffusion paradigms, with an emphasis on temporal and sequential data rather than static images.
  • Collaborate with the Applied ML team to bring your work to life in production systems.
  • Stay at the cutting edge of multimodal learning and help us define what “cutting edge” means next.

Requirements

  • A PhD (or near completion) in a relevant field, or equivalent hands-on research experience.
  • Experience modeling human behavior and generation (facial expressions, affect, or speech), ideally in conversational or interactive settings.
  • Deep understanding of sequence modeling in video/audio/language domains.
  • Familiarity with large model training, especially LLMs or VLMs.
  • Strong background in Deep Learning (from Transformers to Diffusion Models) and how to make them work in practice.
  • Excellent programming skills, especially in PyTorch.

Benefits

  • Flexible work schedule
  • Unlimited PTO
  • Competitive healthcare
  • Gear stipends

Job title

AI Researcher – Multimodal Perception Models

Experience level

Mid level, Senior

Degree requirement

Postgraduate Degree
