Senior AI Software Engineer focused on optimizing LLM inference performance at NVIDIA. Collaborating with teams to assess bottlenecks and validate improvements to compiler and runtime efficiency.
Responsibilities
Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
Find opportunities for performance improvements in the IR-based compiler middle end optimizer and/or in precompiled kernel optimizations driven by Graph IR transformations.
Design and develop new compiler passes and optimization techniques to deliver outstanding, robust, and maintainable compiler infrastructure and tools.
Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.
Requirements
Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years of relevant experience.
Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
Strong knowledge of modern LLM architectures, including inference optimization, profiling, and compiler-level performance tuning.
Significant background in kernel optimization through intermediate representation (IR) based techniques and code generation, including graph transformations, fusion, scheduling, and developing custom kernel generation frameworks like OpenAI Triton or other compiler-based code generation pipelines.
Hands-on experience with deep learning frameworks like TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments.
Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
Strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.
Benefits
Equity
Benefits
Job title
Senior AI Software Engineer, LLM Inference Performance Analysis
Staff Engineer developing Saviynt's AI-powered identity platform for enterprise security solutions. Collaborating on software design, development, and deployment with engineering teams in a hybrid setup.
Principal Engineer developing AI-powered identity solutions at Saviynt. Managing complex applications while collaborating with cross-functional teams and adhering to agile principles.
Software Engineer developing innovative technology solutions for Oliver Bernard. Collaborating with teams to build applications and enhance client experiences while working in London.
Fullstack Software Engineer at Cloudflare designing, building, and scaling domain management tools. Join a passionate engineering team for innovative product creation.
Software Developer for medical imaging and data processing solutions in clinical trials at Antaros Medical. Collaborating with clinical teams to deliver compliant software for MR and PET images.
Product Engineer working on air handling units at Johnson Controls. Engaging in engineering work and recommending solutions for product design and development.
Staff Software Engineer leading the design and development of an AI-powered Banker Workbench feature for CBA. Focused on front-end leadership and modernizing banking technology.
Intern role in software engineering at Airwallex providing hands-on project experience and personal mentorship while collaborating with an innovative team.
Software Engineer developing and implementing automation systems at Actemium Controlmatic. Collaborating in interdisciplinary teams and supporting project execution in Berlin.
Software Engineer developing software for thermal management systems on GM’s electric vehicles. Collaborating in an Agile team responsible for control and diagnostics software development.