AI Trace Generation Engineer (Hybrid)

About the role

  • As an AI Trace Generation Engineer, you will design and implement trace collection systems for LLM workloads and analyze distributed AI workload behavior across multi-GPU and multi-node setups.

Responsibilities

  • Design and implement a trace collection system for distributed LLM workloads
  • Validate that collected traces accurately reflect real workload behavior
  • Integrate with and instrument major LLM frameworks to extract meaningful execution data
  • Use collected traces as input to discrete event simulations
  • Analyze trace data to surface bottlenecks and inefficiencies across the stack

Requirements

  • 3+ years of experience in AI systems, ML infrastructure, or a closely related area
  • Hands-on experience with at least one major LLM serving or training framework
  • Strong proficiency in Python and C++
  • Solid understanding of GPU architecture, memory bandwidth, and the difference between compute-bound and memory-bound operations
  • Solid understanding of distributed communication
  • Familiarity with parallelism strategies and how they shape execution behavior across large clusters
  • Open source contributions or published research in relevant areas are highly valued
  • Previous startup experience is a plus

Benefits

  • Competitive compensation with a performance-based incentive
  • Subsidized Deutschlandticket
  • Access to a discount portal
  • Flexible hours with hybrid and remote-friendly options
  • Relocation support

Job title

AI Trace Generation Engineer

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
