Senior AI Software Engineer focused on optimizing LLM inference performance at NVIDIA. Collaborating with teams to assess bottlenecks and validate improvements to compiler and runtime efficiency.
Responsibilities
Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
Find opportunities for performance improvements in the IR-based compiler middle-end optimizer and/or in precompiled kernel optimizations driven by Graph IR transformations.
Build and develop new compiler passes and optimization techniques to deliver outstanding, robust, and maintainable compiler infrastructure and tools.
Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.
Requirements
Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years of relevant experience.
Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
Expertise in state-of-the-art LLM architectures, including inference optimization, profiling, and compiler-level performance tuning.
Significant background in kernel optimization through intermediate representation (IR) techniques and code generation, including graph transformations, fusion, scheduling, and building custom kernel generation frameworks such as OpenAI Triton or other compiler-based code generation pipelines.
Hands-on experience with deep learning frameworks such as TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments.
Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
A strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.
Benefits
Equity
Benefits
Job title
Senior AI Software Engineer, LLM Inference Performance Analysis
Software Engineering Lead providing technical leadership for the AI Platform at Elsevier. Guiding design and delivery of shared AI services while mentoring engineers in a hybrid work environment.
Platform Engineer on the Rancher team at SUSE, managing app integrations and the Kubernetes management system. Collaborating on deployments, configuration, and support using open-source tooling.
Software Developer creating innovative digital solutions for clients using Dynamics 365 CRM and Power Platform technologies. Engaging in project teamwork and providing technical guidance for client success.
Internship opportunity in Cloud Software Development offering full-stack responsibilities using Microsoft technologies. Collaboration with experienced colleagues in a hybrid work environment.
Software Developer creating modern software solutions using Microsoft technologies and cloud architectures. Collaborating with clients and working in agile teams on innovative projects.
Senior Software Developer creating scalable software solutions with Microsoft technologies. Collaborating with clients to analyze requirements and develop tailored cloud architectures in a hybrid working environment.
Software Engineer designing and operating infrastructure solutions for a fintech company. Collaborating with engineering teams to implement secure, scalable cloud environments.
Software Engineer developing internal tools and AI solutions for the User Ops team at Anysphere. Collaborating with operations teams to improve support efficiency using data-driven insights.
Intern assisting engineers in designing, developing, and implementing AI/ML solutions at pSemi Corporation. Supporting the creation of agent frameworks and automating RFIC design workflows.
Project Engineering Lead overseeing engineering activities in defense projects at Leonardo UK. Leading a multidisciplinary team to ensure project management, technical quality, and customer collaboration.