Data Scientist working on enhancing AI agent performance metrics and experiments. Analyzing data and collaborating with cross-functional teams to drive product improvements.
Responsibilities
Design and analyze experiments to measure agent improvements—from model changes to UX variations—with statistical rigor and practical tradeoffs.
Define success metrics that connect agent trace data (prompts, responses, code changes, execution outcomes) to user outcomes like successful deploys, retention, and revenue.
Build the semantic layer for agent data in partnership with data engineering—defining the tables, metrics, and models that enable self-serve analysis across the AI team.
Surface insights from trace analysis that identify failure modes, successful patterns, and opportunities to improve agent effectiveness.
Partner with AI engineering, product, and leadership to translate data into roadmap decisions; you'll have a seat at the table for critical agent strategy discussions.
Create dashboards and reporting that surface agent performance metrics (task completion, latency, quality scores, user satisfaction) for the AI team and executives.
Requirements
5+ years of experience in data science, analytics, or a quantitative role with a focus on product, growth, or experimentation.
Deep experimentation expertise: A/B testing, experiment design, power analysis, handling skewed data, interpreting results beyond p-values.
Strong SQL skills and experience designing data models for high-volume event data; experience with dbt or similar transformation tools.
Proficiency in Python and data science libraries (pandas, scipy, statsmodels, etc.).
Ability to translate ambiguous questions into structured analysis and communicate findings clearly to both technical and non-technical stakeholders.
Bias toward action: you ship insights that influence decisions, not just dashboards.
Data Scientist at Assembly building and automating media intelligence models. Collaborating with consultancy leadership to define analytical approaches to business challenges.
Senior Data Scientist focusing on ML systems for Walmart's Trust and Safety. Collaborating on compliance models and overseeing the full model lifecycle.
Head of Data leading development and execution of data strategy at Verity. Mentoring a team to deliver insights and drive business growth while collaborating with multiple departments.
Senior Data Scientist responsible for credit modeling at Clair, utilizing machine learning to assess risk and optimize decisions. Collaborating with cross-functional teams and deploying models in production environments.
Senior Product Manager responsible for transforming user needs into scalable products for Seyna. Collaborating with internal teams to enhance insurance programs and streamline workflows.
Founding AI Data Scientist helping Grand build core AI and data systems for decision-making. Collaborating with teams to leverage complex data for impactful AI solutions.
Data Scientist responsible for AI model development and deployment across client projects. Building reliable AI applications that support client success and operational excellence.
Data Science Senior Consultant at Squarcle advising on data science and digital challenges. Collaborating with clients to enhance business efficiency and profitability through data-driven strategies.
Head of Analytics leading the analytics function at a B2C AI education company. Focused on data-driven decision-making and collaboration with Product, Marketing, and Engineering teams.
Data Scientist at Shift building complex models for credit risk and fraud. Collaborating with teams to innovate financial solutions for Australian SMEs.