AI Platform Systems Software Engineer responsible for designing core infrastructure for AI/ML workloads. Join eBay in building a next-generation AI platform for millions of users.
Responsibilities
Design and scale services to orchestrate AI/ML clusters across cloud and on-prem environments
Develop and optimize intelligent scheduling and resource management systems for heterogeneous compute clusters
Integrate Ray Train/Tune for large-scale distributed training workflows and Ray Serve for low-latency, autoscaled inference
Build features to improve reliability, performance, observability, and cost-efficiency of AI workloads at scale
Enhance the control plane to support secure multi-tenancy and enterprise-grade governance
Implement systems for container management, dependency resolution, and large-scale model distribution
Collaborate with ML researchers, applied scientists, and distributed systems engineers to drive platform innovation
Provide production support and work closely with field teams to resolve infrastructure issues
Requirements
Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent experience)
8-10 years of experience building and maintaining infrastructure for highly available, scalable, and performant distributed systems
Proven expertise with cloud-native technologies (AWS, GCP, Azure) and Kubernetes-based deployments
Hands-on experience running ML training and inference with Ray (ray.io)
Deep understanding of networking, security, authentication, and identity management in distributed/cloud environments
Hands-on experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
Strong coding skills in Go and/or Python; familiarity with other systems-level languages is a plus
Knowledge of Linux internals, containers, and storage systems
Experience optimizing for GPU/accelerator integration (NVIDIA, AMD, TPU, etc.) is highly desirable
Benefits
Full range of medical benefits
Financial benefits
Various paid time off benefits, such as PTO and parental leave
Senior AI Engineer contributing to the technical architecture and implementation of multi-agent AI systems at Chip. Collaborating with teams to build and scale AI infrastructure for customer-facing products.
Senior AI Specialist leading the delivery of Generative AI solutions at Fiera Capital. Focused on architecture, design, and cross-functional collaboration for scalable deployment.
Senior AI Engineer developing AI-powered features for SecurityScorecard's cyber health solutions. Collaborating on building intelligent automation and delivering customer-facing product enhancements.
Staff AI Engineer leading AI capabilities development for security professionals at SecurityScorecard. Designing and shipping AI-powered initiatives with ownership in a dynamic team environment.
AI Lead managing AI Unit focused on IT/OT-Security at R&C Request GmbH. Leading business impact projects and collaborating cross-functionally in a hybrid work environment.
AI Solutions & Engineering Lead at Fourth overseeing AI engineering decisions and building production-grade AI systems. Joining the Enterprise Data & Intelligence team reshaping business operations through AI solutions.
AI Engineer designing and developing agentic and multimodal AI systems at WongDoody. Collaborating with cross-functional teams to create engaging AI-driven experiences.