Hybrid Principal Software Engineer – Scale-Up Networking, GPU-Centric

Posted yesterday

About the role

  • As a Principal Software Engineer at HPE, you will develop GPU-aware networking solutions and lead architecture and performance optimization efforts in high-performance computing.

Responsibilities

  • Architect & Deliver Scale-Up Networking
  • Design and implement GPU-aware networking paths for high-bandwidth, low-latency intra-node communication
  • Develop and optimize GPU → NIC → GPU data movement, shared memory models, and DMA pathways
  • Work with the NVIDIA (CUDA, NVLink, NCCL) and AMD (ROCm, Infinity Fabric, RCCL) teams to integrate and optimize scale-up communication semantics
  • Drive improvements to DMA engines, BAR mappings, ATS/IOMMU, and GPU memory registration workflows
  • Enhance and extend Libfabric, UCX, CXI, SHMEMX, and Open MPI for GPU-accelerated scale-up workflows
  • Optimize communication collectives, transport layers, and GPU-direct capabilities
  • Characterize and tune multi-NIC-per-socket configurations, NUMA-zone mapping, GPU locality, CQ/queue design, and CPU/GPU topology
  • Lead upstream contributions to open-source projects (OFI, UCX, OpenMPI, RCCL/NCCL enablement)
  • Partner with HPC/AI ecosystem teams to shape future architectures
  • Own complex debugging across driver, runtime, GPU, kernel, and user-space boundaries
  • Develop profiling workflows using Nsight, ROCm tools, eBPF, perf, etc.

Requirements

  • 10–15+ years of experience building high-performance networking, GPU, or kernel-level software
  • Deep expertise in C/C++, Linux internals, memory management, RDMA, PCIe, IOMMU, ATS, DMA engines
  • Strong understanding of CUDA, ROCm, GPU memory models, P2P, GDS (GPUDirect Storage), GDR (GPUDirect RDMA)
  • Hands-on experience with MPI, SHMEM, Libfabric, UCX, or similar communication stacks
  • Proven experience driving architecture, cross-org technical decisions, and upstream contributions
  • Ability to mentor senior engineers, influence multi-team designs, and own end-to-end delivery

Benefits

  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Job title

Principal Software Engineer – Scale-Up Networking, GPU-Centric

Experience level

Lead

Salary

Not specified

Degree requirement

No Education Requirement
