Hybrid Project Manager

Posted 2 weeks ago

Apply now

About the role

  • AI Agent Evaluation Engineer developing evaluation frameworks for AI agents with an emphasis on safety and ethical standards. Collaborating with AI teams to ensure high-quality performance metrics.

Responsibilities

  • Evaluation (Evals) Development: Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions.
  • Responsible AI and Safety Evals (New Focus): Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses.
  • Test Strategy & Execution: Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing (UAT) specifically for conversational and goal-oriented AI agents.
  • Bug Detection & Management: Identify, document, prioritize, and track bugs using Jira, performance degradations, and alignment failures in agent behavior.
  • Automation & Tools: Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles.

Requirements

  • Experience: 6+ years in Software QA, with at least 2 years focused on testing or evaluating AI/ML systems, conversational agents, or Large Language Models (LLMs).
  • Safety Evals Expertise (Mandatory): Direct experience in designing and executing safety evaluations (red teaming, adversarial testing), bias detection, and measuring toxicity/harmful content in generative AI models.
  • Agent/LLM Evals: Proven experience developing and running general evaluations (Evals) for LLM-powered applications knowing libraries like PyTest (Must)
  • Google ADK Familiarity (Mandatory): Direct or strong conceptual understanding of the Google Agent Development Kit (ADK) and its components.
  • Programming: Strong proficiency in Python is mandatory for script development, data processing, and automation.
  • Cloud & MLOps: Familiarity with Google Cloud Platform (GCP) services relevant to AI/ML (e.g., Vertex AI) and integrating testing into MLOps workflows.
  • Tools and Libraries: Langsmith, DeepEval, Ragas, Giskard, Hugging face.

Job title

Project Manager

Job type

Experience level

Mid levelSenior

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements

Report this job

See something inaccurate? Let us know and we'll update the listing.

Report job