Staff AI Software Engineer creating trustworthy AI solutions for enterprises. Leading development of observability metrics and tools for AI applications in a hybrid work environment.
Responsibilities
Design and build core services and components of a world-class cloud platform to help enterprises develop, monitor and improve their full suite of AI-based applications (covering predictive models, LLMs, GenAI models and agentic applications)
Lead the design and implementation of distributed systems, microservices and applications that compute, persist, and expose new ML + agentic observability metrics (e.g., response relevancy, hallucination scores) from raw trace data
Spearhead the development of new types of metrics and evaluation capabilities to satisfy evolving customer needs around agentic applications. Take part in conversations with customers around discovery and support
Develop in-house AI agents and GenAI capabilities to augment the Fiddler observability products
Define and evolve the operational maturity (reliability, observability, SLOs) of core services and components, establish best practices and champion improvements across the team
Team & Culture Building: you will take an active role in building a world-class engineering team and actively participate in the talent acquisition process through interviewing, candidate evaluation and coaching
Requirements
Master's or Bachelor's degree in Computer Science or a related field, combined with 7+ years of industry experience and a demonstrated solid foundation in software development.
Experience with deploying and working with ML/LLM models in production.
Experience building, deploying and monitoring agentic applications using common frameworks like LangChain, Google ADK, Amazon Strands, OpenAI, as well as building and integrating MCP servers
Hands-on experience working with OpenTelemetry, distributed tracing and LLM-as-a-judge techniques
Deep proficiency with Python and a strong command of essential backend technologies like Postgres, Redis, Kafka, RabbitMQ, Ray. This includes the ability to design, build, and debug complex, large-scale systems.
Adaptability & Ownership: proven ability to thrive in ambiguity and a fast-paced environment. We need a self-motivated initiator who can take ownership of projects with a high degree of autonomy, confidently filling in the gaps when the full picture isn't available.
System Design & Optimization: A strong grasp of distributed systems and the capacity to troubleshoot production issues.
Technical Leadership & Collaboration: Demonstrated ability to plan, execute, and deliver projects by effectively breaking down complex problems into manageable tasks, and guiding a small team of engineers. Must be adept at cross-functional collaboration across a geographically distributed team, working closely with product managers, designers, frontend developers, and data scientists to ensure alignment and successful project outcomes
Coaching & Mentorship: you should be an excellent collaborator and a mentor to other team members, raising the technical bar for the entire team and regularly engaging in code and design reviews.
Ability to work in our Palo Alto office 3 days a week.
Full-Stack Developer building scalable web applications using React.js and Python frameworks at Expleo. Collaborating with designers and developers to deliver high-quality software solutions.
Software Engineer delivering features and fixing issues in an engineering team for eCommerce automation leader. Engaging in quality collaboration and proactively contributing to team improvement.
UI Senior Software Engineer developing modern web applications for S&P Global Mobility. Collaborating with cross-functional teams to enhance user experience and maintain high-quality delivery.
Principal Engineer in HBM Design - Technology Enablement at Micron Technology, focusing on semiconductor design and mentoring. Collaborating on HBM design/product roadmaps and addressing scaling challenges.
Software Developer (BI with Qlik Sense/View) focused on operational support at Hitss. Engaging in data integration, performance monitoring, and user assistance.
Lead Software Engineer overseeing software engineering practices at Capgemini. Applying scientific methods to solve software engineering problems and responsible for the development of software solutions.
Software Engineer developing, maintaining, and optimizing software solutions/applications at Capgemini. Collaborating with other engineers and solving complex software problems in a team environment.
Staff Engineer, Hardware Design developing electrical systems for product development at Celestica. Leading technical solutions for complex projects involving cross-functional teams in multiple domains.
Senior Software Engineer at NetApp designing and implementing StorageGRID object storage solutions. Collaborating in a flexible hybrid work environment to tackle challenges in AI data lakes.
Senior Software Engineer developing AWS cloud compatible StorageGRID object storage at NetApp. Involves architecture, development, and mentoring within a flexible hybrid work environment.