Senior AI Software Engineer focused on optimizing LLM inference performance at NVIDIA. Collaborating with teams to assess bottlenecks and validate improvements to compiler and runtime efficiency.
Responsibilities
Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
Identify opportunities for performance improvements in the IR-based compiler middle-end optimizer and/or in precompiled kernel optimizations driven by Graph IR transformations.
Build and develop new compiler passes and optimization techniques to deliver outstanding, robust, and maintainable compiler infrastructure and tools.
Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.
Requirements
Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years of relevant experience.
Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
Expertise in modern LLM architectures, with experience in inference optimization, profiling, and compiler-level performance tuning.
Significant background in optimizing kernels through intermediate representation (IR) techniques and code generation, including graph transformations, fusion, scheduling, and development of custom kernel generation frameworks such as OpenAI Triton or other compiler-based code generation pipelines.
Hands-on experience with deep learning frameworks such as TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments.
Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
Strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.
Benefits
equity
benefits
Job title
Senior AI Software Engineer, LLM Inference Performance Analysis