AI Platform Systems Software Engineer responsible for designing core infrastructure for AI/ML workloads. Join eBay in building a next-generation AI platform for millions of users.
Responsibilities
Design and scale services to orchestrate AI/ML clusters across cloud and on-prem environments
Develop and optimize intelligent scheduling and resource management systems for heterogeneous compute clusters
Integrate Ray Train/Tune for large-scale distributed training workflows and Ray Serve for low-latency, autoscaled inference
Build features to improve reliability, performance, observability, and cost-efficiency of AI workloads at scale
Enhance the control plane to support secure multi-tenancy and enterprise-grade governance
Implement systems for container management, dependency resolution, and large-scale model distribution
Collaborate with ML researchers, applied scientists, and distributed systems engineers to drive platform innovation
Provide production support and work closely with field teams to resolve infrastructure issues
Requirements
Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent experience)
8-10 years of experience building and maintaining infrastructure for highly available, scalable, and performant distributed systems
Proven expertise with cloud-native technologies (AWS, GCP, Azure) and Kubernetes-based deployments
Hands-on experience running ML training and inference with Ray (ray.io)
Deep understanding of networking, security, authentication, and identity management in distributed/cloud environments
Hands-on experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
Strong coding skills in Go and/or Python; familiarity with other systems-level languages is a plus
Knowledge of Linux internals, containers, and storage systems
Experience optimizing for GPU/accelerator integration (NVIDIA, AMD, TPU, etc.) is highly desirable
Benefits
Full range of medical benefits
Financial benefits
Various paid time off benefits, such as PTO and parental leave
Gen AI Engineer focusing on the design and development of Gen AI applications on cloud platforms. Seeking candidates with experience in data science and programming.
Staff AI Engineer creating scalable AI functionality for WEX. Collaborating with engineering teams to integrate AI components and maintain ML pipelines.
AI engineer developing scalable AI solutions for leading enterprises with a focus on machine learning and natural language processing. Join a team at the forefront of enterprise generative AI.
AI Engineer at WRITER shaping how enterprises harness superintelligence. Building AI solutions for productivity and innovation with enterprise-grade LLMs.
Customer AI architect role transforming enterprise needs into AI solutions while collaborating with customers and internal teams. Leadership role in developing complex applications leveraging AI technology.
AI Engineer designing and building frameworks for dynamic interactions with public health data. Collaborating with teams to ensure compliance and optimize performance in AI systems.
Trainee / Junior AI Engineer supporting real customer projects at WE BUILD AI. Working with AI applications including automation solutions and machine learning models in a hybrid environment.
Principal Software Engineer leading infrastructure initiatives for Workday AI Platform. Collaborating with teams to optimize and enhance an AI-focused technology stack.
Product Manager with Dell Technologies delivering new AI applications and building product-quality proof-of-concepts. Collaborating with cross-functional teams in a fast-paced, innovative environment.