AI Platform Systems Software Engineer responsible for designing core infrastructure for AI/ML workloads. Join eBay in building a next-generation AI platform for millions of users.
Responsibilities
Design and scale services to orchestrate AI/ML clusters across cloud and on-prem environments
Develop and optimize intelligent scheduling and resource management systems for heterogeneous compute clusters
Integrate Ray Train/Tune for large-scale distributed training workflows and Ray Serve for low-latency, autoscaled inference
Build features to improve reliability, performance, observability, and cost-efficiency of AI workloads at scale
Enhance the control plane to support secure multi-tenancy and enterprise-grade governance
Implement systems for container management, dependency resolution, and large-scale model distribution
Collaborate with ML researchers, applied scientists, and distributed systems engineers to drive platform innovation
Provide production support and work closely with field teams to resolve infrastructure issues
Requirements
Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent experience)
8-10 years of experience building and maintaining infrastructure for highly available, scalable, and performant distributed systems
Proven expertise with cloud-native technologies (AWS, GCP, Azure) and Kubernetes-based deployments
Hands-on experience running ML training and inference with Ray (ray.io)
Deep understanding of networking, security, authentication, and identity management in distributed/cloud environments
Hands-on experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
Strong coding skills in Go and/or Python; familiarity with other systems-level languages is a plus
Knowledge of Linux internals, containers, and storage systems
Experience optimizing for GPU/accelerator integration (NVIDIA, AMD, TPU, etc.) is highly desirable
Benefits
Full range of medical benefits
Financial benefits
Various paid time off benefits, such as PTO and parental leave
Staff AI Engineer developing first AI Engineering Co-Pilot for Black Semiconductor's process and device engineering. Utilizing complex datasets to produce insights and predictive models for improved processes.
Senior Full Stack Engineer developing scalable SaaS solutions for logistics at Aspire Software. Focusing on React, TypeScript, and Jakarta EE for end-to-end product development.
AI Product Lead responsible for identifying and building AI-powered product solutions at Aspire Software. Engaging directly with customers to ensure real outcomes and value creation.
Junior AI Engineer at WEP Clinical applying Microsoft tools to support AI solutions. Collaborating with stakeholders to improve AI workflows and drive automation initiatives.
Senior Director of Data & AI Engineering leading enterprise platforms for SLC Management. Overseeing strategy, architecture, and execution while fostering a performance-driven culture.
Develop AI solutions utilizing language models at Grupo Iter for enhancing products and decision-making. Collaborate across teams to integrate AI technologies effectively.
Intern in AI Engineering focused on LLMs and automation in a tech-driven team. Working on innovative AI projects and contributing to the development of automated systems.
Junior AI Engineer developing agentic systems for AI fintech solutions in healthcare. Collaborating in an agile team to create impactful and innovative AI applications.
AI Engineer at Nightfall developing AI systems to prevent data leaks for leading organizations. Collaborating with engineers to enhance AI models and drive operational excellence in data protection.
AI Engineer role in the gaming industry focusing on building and deploying generative AI solutions. Collaborate with data, IT, and business teams to integrate AI capabilities.