Hybrid AI Inference Engineer – Model Optimization, Deployment

Posted 6 hours ago

About the role

  • As a Model Optimization & Deployment Engineer, you will optimize large-scale ML models for Zoox's autonomous vehicle technology, with a focus on deploying them for efficient real-time execution in vehicles.

Responsibilities

  • Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference workflows, and parameter-efficient fine-tuning (LoRA, QLoRA).
  • Architect and implement model conversion and compilation pipelines using TensorRT and TensorRT-LLM for edge deployment.
  • Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch models and compiled edge binaries.
  • Write and optimize custom CUDA kernels and TensorRT plugins to maximize memory-bandwidth utilization and minimize latency on AI accelerators.
  • Write production-level, highly concurrent, and memory-safe C++ and Python code for real-time inference on vehicle SoCs.

Requirements

  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference workflows (INT8, FP8, INT4, BF16/FP16).
  • Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) using KV-cache optimization (e.g., PagedAttention), speculative decoding, and efficient attention mechanisms (e.g., FlashAttention, linear attention).
  • Extensive experience with model conversion/compilation pipelines (TensorRT, TensorRT-LLM) and performing rigorous parity/latency benchmarking.
  • Proficiency in low-level programming for AI accelerators, specifically writing and optimizing custom CUDA kernels and TensorRT plugins.
  • Production-level C++ (14/17/20) and Python programming skills, with experience writing concurrent, memory-safe, real-time inference code for edge devices.

Benefits

  • Paid time off (e.g., sick leave, vacation, bereavement)
  • Unpaid time off
  • Zoox Stock Appreciation Rights
  • Amazon RSUs
  • Health insurance
  • Long-term care insurance
  • Long-term and short-term disability insurance
  • Life insurance

Job title

AI Inference Engineer – Model Optimization, Deployment

Experience level

Mid-level, Senior

Salary

$242,000 - $290,000 per year

Degree requirement

Bachelor's Degree
