Support Agentforce performance evaluation at Salesforce, analyzing AI systems and refining operational frameworks. Collaborate with teams to ensure the quality and effectiveness of new features.
Responsibilities
Support the Agentforce baselining program, using synthetic and automated tooling to continuously measure and improve performance.
Analyze evaluation results independently, identifying root causes, surfacing trends, and translating insights into actionable recommendations for models, implementations, and processes.
Maintain and evolve evaluation frameworks, scoring rubrics, and guidelines to ensure consistent, defensible, and scalable assessments.
Deliver clear, influential reporting and business reviews that inform stakeholders and drive product and operational decisions.
Define, monitor, and interpret key evaluation metrics, proactively identifying risks, regressions, and improvement opportunities.
Enable internal partners on evaluation processes and findings, building trust and shared understanding across teams.
Strengthen the evaluation feedback loop across automated testing, LLM-judge prompts, and golden datasets to continuously improve testing sophistication.
Perform targeted evaluations for new features and urgent initiatives, ensuring quality and market readiness.
Audit and refine the utterance repository to keep testing relevant, high quality, and aligned with evolving product capabilities.
Synthesize customer and internal feedback into actionable insights, helping shape product direction and operational improvements.
Advocate for tooling, process, and workflow improvements that increase evaluation efficiency, scalability, and reliability.
Proactively surface risks and partner on mitigations, ensuring issues are addressed before they impact customers.
Requirements
1+ years of professional experience working in Salesforce environments (program, analyst, operations, or product context).
Demonstrated ability to take ownership of tasks and drive outcomes independently.
Strong analytical mindset: comfortable reviewing conversational AI outputs, identifying failure patterns, conducting root cause analysis, and translating findings into actionable recommendations.
Operational rigor and attention to detail: able to execute repeatable evaluation workflows accurately and consistently in a fast-paced, ambiguous environment.
Clear written communication skills: able to document findings, produce internal documentation, and communicate insights concisely for cross-functional audiences.
Comfort working with data: proficiency in spreadsheets (e.g., Google Sheets), reporting, and basic dashboard interpretation to derive insights and track trends.
High reading comprehension and critical thinking: able to evaluate nuanced generative AI responses against quality standards and expected behaviors.
Tool fluency: ability to work confidently in Salesforce reporting environments (Agentforce, Tableau, Testing Center, Observability) or to ramp up quickly on similar tools.
Curiosity and learning agility: resourceful in exploring new tools, understanding evolving AI behaviors, and continuously improving evaluation approaches.
Execution reliability: responsive, accountable, and dependable in delivering accurate outputs and supporting operational needs.
Project Manager in electrical engineering managing projects for Messe Berlin. Focusing on technical evaluations and project leadership with team collaboration.
Assistant Program Manager supporting the program manager in Amsterdam. Working on sustainable and efficient waste and raw materials management processes.
Stakeholder Manager (Omgevingsmanager) coordinating work within the Afval & Grondstoffen (Waste & Raw Materials) project. Maintaining relationships with stakeholders to support the quality of service delivery.
Key Customer Development Manager responsible for strengthening partnerships with Walmart. Focusing on strategic alignment to drive profitable growth in a competitive retail environment.
Shopper Development Manager coordinating execution of commercial strategies for a global FMCG company. Focusing on connecting strategy with performance at the shelf in a dynamic retail environment.
Business Unit Leader driving Microsoft Dynamics 365 initiatives with profit responsibility. Leading projects, managing teams, and ensuring delivery quality in a hybrid environment.
Care Manager (Pflegemanager) educating and advising members in the field of health and social services. Focused on solutions for specific questions and representing interests to authorities.
Entertainment Manager (Responsable d'animation) running entertainment activities at the VTF vacation village in Agde during holiday seasons. Leading activities for both adults and children, ensuring a creative and joyful environment.
Store Manager (Responsable de magasin) at La Mie de Pain ensuring excellent customer service by managing store operations. Leading a team and optimizing sales while maintaining store quality.
Production Team Leader managing administrative operations at GESTFORM, a document management service provider. Supervising the work of administrative operators and ensuring customer satisfaction.