Hybrid Senior AI Software Engineer, LLM Inference Performance Analysis

Posted 2 weeks ago

About the role

  • Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
  • Find opportunities for performance improvements in the IR-based compiler middle-end optimizer and/or in precompiled kernel optimizations driven by graph IR transformations.
  • Build and develop new compiler passes and optimization techniques to deliver outstanding, robust, and maintainable compiler infrastructure and tools.
  • Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
  • Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
  • Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.

Requirements

  • Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • 5+ years of relevant experience.
  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
  • Strong knowledge of modern LLM architectures, including inference optimization, profiling, and compiler-level performance tuning.
  • Significant background in IR-based kernel optimization and code generation, including graph transformations, fusion, scheduling, and building custom kernel generation frameworks such as OpenAI Triton or other compiler-based code generation pipelines.
  • Hands-on experience with LLM inference frameworks and compiler/runtime environments such as TensorRT-LLM, vLLM, SGLang, or JAX/XLA.
  • Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
  • Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
  • Strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
  • Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.

Benefits

  • Equity
  • Benefits

Job title

Senior AI Software Engineer, LLM Inference Performance Analysis

Experience level

Senior

Salary

$148,000 - $287,500 per year

Degree requirement

Postgraduate Degree
