Senior AI Software Engineer focused on optimizing LLM inference performance at NVIDIA. Collaborating with teams to assess bottlenecks and validate improvements to compiler and runtime efficiency.
Responsibilities
Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
Find opportunities for performance improvements in the IR-based compiler middle-end optimizer and/or in precompiled kernel optimizations driven by Graph IR transformations.
Build and develop new compiler passes and optimization techniques to deliver outstanding, robust, and maintainable compiler infrastructure and tools.
Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.
Requirements
Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years of relevant experience.
Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
Expertise in modern LLM architectures, including inference optimization, profiling, and compiler-level performance tuning.
Significant background in kernel optimization and code generation based on intermediate representations (IR), including graph transformations, fusion, scheduling, and developing custom kernel generation frameworks such as OpenAI Triton or other compiler-based code generation pipelines.
Hands-on experience with deep learning frameworks such as TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments.
Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
Strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.
Benefits
equity
benefits
Job title
Senior AI Software Engineer, LLM Inference Performance Analysis