Hybrid Principal Systems Engineer – At-Scale

Posted 7 hours ago

About the role

  • As a Principal Systems Engineer – At-Scale, you will deploy strategies to improve large-scale data center clusters, collaborating with visionary professionals to optimize AI and GPU computing systems.

Responsibilities

  • Deploy strategies to analyze and collect debugging and anomaly signals from large fleets of clusters to improve quality and experience.
  • Build and expand debugging tools to identify, diagnose, and recover out-of-service systems, growing customer-available capacity.
  • Author and deploy "fault signatures" and automated recovery rules.
  • Lead cross-team task forces to address undefined failure modes in high-value AI/GPU systems, cutting backlogs through data-driven isolation.
  • Leverage AI, analytics, and efficiency tools to scale debug efforts, turning manual triage into productized, automated code.
  • Act as a technical leader and cultural anchor.
  • Mentor junior and senior engineers.
  • Encourage organizational health initiatives.
  • Promote innovation through hackathons and sharing sessions.

Requirements

  • 15+ years of experience debugging systems at scale, including components of large fleets.
  • BS/MS in Computer Science or a related field (or equivalent experience).
  • Proven understanding of performance clusters, infrastructure, and workload patterns.
  • Knowledge of and experience with telemetry and at-scale analytics for large platforms.
  • Experience using and installing fleets of Linux-based server platforms.
  • C/Python/Bash/Lua programming/scripting experience.
  • Experience working with the engineering or academic research community in support of performance engineering or deep learning.
  • Strong teamwork skills and strong verbal and written communication skills.

Benefits

  • Equity
  • Benefits

Experience level

Lead

Salary

$272,000 - $431,250 per year

Degree requirement

Bachelor's Degree
