AI Agent Evaluation Engineer developing evaluation frameworks for AI agents with an emphasis on safety and ethical standards. Collaborating with AI teams to ensure high-quality performance metrics.
Responsibilities
Evaluation (Evals) Development: Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions.
Responsible AI and Safety Evals (New Focus): Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses.
Test Strategy & Execution: Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing (UAT) specifically for conversational and goal-oriented AI agents.
Bug Detection & Management: Identify, document, prioritize, and track bugs, performance degradations, and alignment failures in agent behavior using Jira.
Automation & Tools: Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles.
Requirements
Experience: 6+ years in Software QA, with at least 2 years focused on testing or evaluating AI/ML systems, conversational agents, or Large Language Models (LLMs).
Safety Evals Expertise (Mandatory): Direct experience in designing and executing safety evaluations (red teaming, adversarial testing), bias detection, and measuring toxicity/harmful content in generative AI models.
Agent/LLM Evals: Proven experience developing and running general evaluations (Evals) for LLM-powered applications, with working knowledge of libraries such as pytest (mandatory).
Google ADK Familiarity (Mandatory): Direct experience with, or a strong conceptual understanding of, the Google Agent Development Kit (ADK) and its components.
Programming: Strong proficiency in Python is mandatory for script development, data processing, and automation.
Cloud & MLOps: Familiarity with Google Cloud Platform (GCP) services relevant to AI/ML (e.g., Vertex AI) and integrating testing into MLOps workflows.
Tools and Libraries: LangSmith, DeepEval, Ragas, Giskard, Hugging Face.