ML Inference Router Engineer designing scalable inference systems at eBay. Aiming to support billions of daily requests with a focus on reliability and efficiency.
Responsibilities
Design and build an LLM inference gateway that scales to billions of daily requests with millisecond-level latency.
Develop intelligent request routing, load balancing, and fallback mechanisms across heterogeneous LLM backends (internal and external).
Optimize throughput, cost, and reliability of inference workloads in multi-tenant environments.
Collaborate with platform, research, and product teams to integrate new models and agentic capabilities into the gateway.
Implement observability, tracing, and autoscaling for inference traffic across Kubernetes-based clusters.
Conduct design and code reviews to ensure high standards in distributed systems architecture.
Stay current with advances in LLM serving, inference acceleration, and model APIs to continuously evolve the platform.
Requirements
10+ years of experience building large-scale, fault-tolerant, high-performance distributed systems.
Strong programming skills in one or more of Java, Go, Rust, or C++ (Java preferred for gateway services).
Deep understanding of networking, concurrency, memory management, and performance tuning in production systems.
Proven experience designing and operating low-latency APIs at very large scale (10M+ QPS).
Hands-on experience with Kubernetes, service meshes, and container orchestration at scale.
Strong background in cloud infrastructure (AWS, GCP, Azure) and distributed system design.
Benefits
Full range of medical benefits
Financial benefits
Various paid leave benefits, such as PTO and parental leave