AI Agent Evaluation Engineer developing evaluation frameworks for AI agents with an emphasis on safety and ethical standards. Collaborating with AI teams to ensure high-quality performance metrics.
Responsibilities
Evaluation (Evals) Development: Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions.
Responsible AI and Safety Evals (New Focus): Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses.
Test Strategy & Execution: Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing (UAT) specifically for conversational and goal-oriented AI agents.
Bug Detection & Management: Identify, document, prioritize, and track bugs, performance degradations, and alignment failures in agent behavior using Jira.
Automation & Tools: Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles.
Requirements
Experience: 6+ years in Software QA, with at least 2 years focused on testing or evaluating AI/ML systems, conversational agents, or Large Language Models (LLMs).
Safety Evals Expertise (Mandatory): Direct experience in designing and executing safety evaluations (red teaming, adversarial testing), bias detection, and measuring toxicity/harmful content in generative AI models.
Agent/LLM Evals: Proven experience developing and running general evaluations (Evals) for LLM-powered applications; working knowledge of libraries such as pytest is a must.
Google ADK Familiarity (Mandatory): Direct experience with, or a strong conceptual understanding of, the Google Agent Development Kit (ADK) and its components.
Programming: Strong proficiency in Python is mandatory for script development, data processing, and automation.
Cloud & MLOps: Familiarity with Google Cloud Platform (GCP) services relevant to AI/ML (e.g., Vertex AI) and integrating testing into MLOps workflows.
Tools and Libraries: LangSmith, DeepEval, Ragas, Giskard, Hugging Face.
Working student in SAP Project Management at an agile IT consulting firm. Supporting large-scale SAP projects, process analyses, and documentation.
Project Manager overseeing program management support for OCSE child support initiatives. Ensuring effective planning, execution, and quality control across program management activities.
Project Manager delivering cost-effective IT infrastructure solutions aligned to global IT strategies. Ensuring project delivery support and quality implementation throughout the development process.
Lead end-to-end execution of expansion projects for Bosta, disrupting logistics with technology. Own critical paths to deliver on strategic program objectives.
Project Manager overseeing client portfolios and large construction projects for ETAVIS Romandie. Ensuring quality and compliance while collaborating with technical teams and managing financial aspects.
Project Manager in construction managing heat and cold protection projects and leading project teams. Technical consulting for customers in process industry and building technology.
Junior Project Manager in the construction industry managing heat, cold, and fire protection projects. Involves project planning, team leadership, and technical consulting for clients.
Development Project Manager leading telecom quality initiatives across multiple campuses for QTS Data Centers. Interacting with contractors and stakeholders to ensure project compliance and success.
Project Manager at Highmark Health leading planning and execution of healthcare projects. Ensuring timely delivery within budget and managing cross-functional team dynamics.