Hybrid AI Inference Engineer

Posted last month

About the role

  • As an AI Inference Engineer, you will develop AI model optimizations for Quadric's GPNPU platforms, porting and benchmarking AI models to improve performance on edge devices.

Responsibilities

  • Quantize, prune, and convert models for deployment
  • Port models to the Quadric platform using the Quadric toolchain
  • Optimize inference deployments for latency and throughput
  • Benchmark and profile model performance and accuracy
  • Develop tooling to scale and accelerate deployment
  • Improve the SDK and runtime
  • Provide technical support and documentation to customers and the developer community
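To illustrate the quantization step above, here is a minimal sketch of symmetric per-tensor int8 post-training quantization (PTQ) in plain NumPy. The function names and the toy weight matrix are hypothetical examples for illustration only; this is not Quadric's toolchain or any specific framework's API.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor PTQ: map float weights onto int8 in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer of a real model.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Round-to-nearest bounds the error by half a quantization step.
max_err = float(np.abs(w - w_hat).max())
```

Production toolchains add per-channel scales, zero points for asymmetric ranges, and calibration data, but the core trade-off — a 4x smaller weight tensor at the cost of bounded rounding error — is the same.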

Requirements

  • Bachelor’s or Master’s degree in Computer Science or Electrical Engineering
  • 5+ years of experience with AI/LLM model inference and deployment frameworks and tools
  • Experience with model quantization (PTQ, QAT) and related tools
  • Experience with model accuracy metrics
  • Experience with model inference performance profiling
  • Experience with at least one of the following: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
  • Proficiency in C/C++ and Python
  • Strong problem-solving, debugging, and communication skills

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Work From Home
  • Free Food & Snacks
  • Stock Option Plan

Job title

AI Inference Engineer

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
