Capacity Operations Associate managing global infrastructure for AI companies. Collaborating with customer success and engineering teams on hardware lifecycles and operational excellence.
Responsibilities
Fleet Maintenance: Manage daily node operations including tainting/untainting, node draining, and PVC repairs to ensure GPU fleet health and operational cost control
GTM & Capacity Fulfillment: Partner with Sales and account teams to scope and fulfill customer capacity requests, translating complex timelines into concrete infrastructure actions and clear ETAs
Process & Observability Engineering: Identify recurring gaps in the capacity lifecycle (intake, triage, comms) and drive fixes by defining lightweight processes and improving system observability
Technical Orchestration: Act as the operational bridge between SRE and Infra teams, executing discrete changes and verifying system status during high-stakes maintenance windows
Technical Documentation: Contribute to the internal knowledge base for GPU-specific issues (H100/A100/B200) to accelerate future incident resolution
Automation & Tooling: Identify repetitive workflows and partner with engineering to build scripts, dashboards, and internal tools that reduce manual intervention and shorten time-to-mitigation
Knowledge Excellence: Maintain a living database of GPU-specific intelligence (H100/B200) and market moves to accelerate incident resolution and support strategic briefings for leadership
Requirements
Bachelor's or Master's degree in Computer Science, Engineering, or a related field
2+ years of professional work experience, ideally in a customer-facing technical role or as a junior SRE/Cloud Engineer
Strong familiarity with Kubernetes and the lifecycle of cloud-based container orchestration
Strong ownership mindset and attention to detail, demonstrated through fast detection, clear communication, and reliable follow-through
Demonstrated ability to communicate complex technical blockers clearly to both internal engineering teams and external vendors
Preference for SF or NYC-based candidates to foster a close-knit "family" atmosphere in the office
Benefits
Competitive compensation, including meaningful equity
100% coverage of medical, dental, and vision insurance for employee and dependents
Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
Paid parental leave
Company-facilitated 401(k)
Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities