Hybrid Capacity Ops Associate

Posted 3 hours ago

About the role

  • As a Capacity Operations Associate, you will manage global infrastructure for AI companies, collaborating with customer success and engineering teams on hardware lifecycles and operational excellence.

Responsibilities

  • Fleet Maintenance: Manage daily node operations including tainting/untainting, node draining, and PVC repairs to ensure GPU fleet health and operational cost control
  • GTM & Capacity Fulfillment: Partner with Sales and account teams to scope and fulfill customer capacity requests, translating complex timelines into concrete infrastructure actions and clear ETAs
  • Process & Observability Engineering: Identify recurring gaps in the capacity lifecycle (intake, triage, comms) and drive fixes by defining lightweight processes and improving system observability
  • Technical Orchestration: Act as the operational bridge between SRE and Infra teams, executing discrete changes and verifying system status during high-stakes maintenance windows
  • Technical Documentation: Contribute to the internal knowledge base for GPU-specific issues (H100/A100/B200) to accelerate future incident resolution
  • Automation & Tooling: Identify repetitive workflows and partner with engineering to build scripts, dashboards, and internal tools that reduce manual intervention and shorten time-to-mitigation
  • Knowledge Excellence: Maintain a living database of GPU-specific intelligence (H100/B200) and market moves to accelerate incident resolution and support strategic briefings for leadership

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 2+ years of professional work experience, ideally in a customer-facing technical role or as a junior SRE/Cloud Engineer
  • Strong familiarity with Kubernetes and the lifecycle of cloud-based container orchestration
  • Strong ownership mindset and attention to detail, demonstrated through fast detection, clear communication, and reliable follow-through
  • Demonstrated ability to communicate complex technical blockers clearly to both internal engineering teams and external vendors
  • Preference for SF- or NYC-based candidates to foster a close-knit "family" atmosphere in the office

Benefits

  • Competitive compensation, including meaningful equity
  • 100% coverage of medical, dental, and vision insurance for employees and dependents
  • Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
  • Paid parental leave
  • Company-facilitated 401(k)
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities

Job title

Capacity Ops Associate

Job type

Experience level

Junior, Mid level

Salary

$120,000 - $160,000 per year

Degree requirement

Bachelor's Degree

Location requirements
