Hybrid Principal Software Engineer – Libfabric, User-Space Networking

Posted yesterday

About the role

  • As a Principal Software Engineer, you will develop libfabric solutions within HPE's high-performance computing unit and collaborate on technology for data-intensive workloads.

Responsibilities

  • Architect and implement libfabric CXI providers (user-space and kernel-assisted paths): endpoint, MR, CQ, and queue models; SR-IOV-aware abstractions and resource sharing
  • Develop and maintain libcxi and related user-space libraries: efficient interaction with the CXI User Driver; retry handling, error propagation, and performance tuning
  • Enable and optimize CXI support across ecosystem components: MPI (OpenMPI, MPICH); NCCL / RCCL / GPU-aware communication; SHMEM, storage, and AI frameworks
  • Drive performance engineering: latency, bandwidth, and scaling across multi-NIC and multi-GPU systems; benchmarking, profiling, and customer workload analysis
  • Collaborate with kernel driver teams to co-design clean, scalable APIs
  • Contribute to upstream communities (libfabric, OpenMPI, NCCL) as appropriate
  • Provide technical leadership through design reviews, mentoring, and roadmap definition

Requirements

  • 10+ years of experience in systems or user-space networking software
  • Strong expertise in libfabric, RDMA concepts, or high-performance communication APIs
  • C/C++ systems programming
  • Deep understanding of user/kernel interaction models and performance tradeoffs
  • Experience debugging complex distributed and multi-node systems
  • Experience with HPC or AI communication stacks (MPI, NCCL, SHMEM)
  • GPU-aware networking and GPUDirect-style architectures
  • Familiarity with virtualization or SR-IOV impacts on user-space libraries
  • Hands-on development experience with carrier-grade or data-center NOS platforms, such as Cisco IOS-XR, Juniper Junos, Arista EOS, or equivalent Linux-based network operating systems
  • Strong understanding of Linux kernel and user-space interactions in networking stacks: Netlink, netdev, sockets, and offload paths; kernel modules, drivers, or platform abstraction layers
  • Experience working with high-performance data plane components, including packet processing pipelines; queueing, scheduling, QoS, and congestion management; hardware offloads and ASIC-facing software layers
  • Alternate / Equivalent Skill Set (Network Operating Systems Background): Candidates with a strong Network Operating System (NOS) background may be considered, provided they demonstrate deep systems expertise and the ability to work close to hardware and performance-critical paths.

Benefits

  • Health & Wellbeing: We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
  • Personal & Professional Development: We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

Experience level

Lead

Salary

Not specified

Degree requirement

No Education Requirement
