AI Systems Engineer – Inference Frameworks (Hybrid)

About the role

  • As an AI Systems Engineer, you'll design and build the inference and optimization systems behind our core product, working directly with the founders and thriving in a zero-to-one environment.

Responsibilities

  • Work directly with our founders to design and build the inference and optimization systems that power our core product.
  • Bridge research and production, combining deep exploration of inference techniques with hands-on ownership of scalable, high-performance serving infrastructure.
  • Own the full lifecycle of LLM inference, from experimentation and performance analysis to deployment and iteration in production, helping define the technical foundations of our inference stack.
  • Design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models.
  • Develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency.
  • Collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines.

Requirements

  • Strong experience building and optimizing LLM inference systems in production or research environments
  • Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar
  • Deep performance mindset with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency
  • Solid understanding of transformer inference, serving architectures, and KV-cache–based execution
  • Strong programming skills in Python; experience with CUDA, Triton, or C++ a plus
  • Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems
  • Nice to have: experience with model quantization or pruning, speculative decoding, multimodal inference, open-source contributions, or prior work in systems or ML research labs

Benefits

  • Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
  • Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
  • Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
  • Well-Being: Comprehensive medical benefits and generous paid time off.

Experience level

Mid-level, Senior

Salary

Not specified

Degree requirement

No Education Requirement
