Onsite Datacenter Hardware Engineer, HPC

Posted last week

About the role

  • Diagnose & operate core server/cluster components - Investigate and resolve compute/storage hardware issues (CPU, memory, drives, NICs, GPUs, PSUs) and interconnect faults (switches, cables, transceivers; Ethernet/InfiniBand). Perform safe interventions (power-off/lockout, ESD) to replace, re-seat, or re-cable components and restore service.
  • Safety & procedures - Apply lockout/tagout (LOTO) and ESD discipline; follow pre/post-work checklists; maintain tidy, safe work areas.
  • First-line diagnostics - Triage using LEDs, POST, beep codes and basic tests; capture evidence (photos, serial numbers, test results); open/update/close tickets with clear notes.
  • Preventive maintenance - Provide feedback and ideas to improve proactive activities, monitoring, and targeted follow-ups on recurring or specific anomalies; help turn ad-hoc checks into SOPs, alerts, and dashboards.
  • Parts & logistics - Receive and track parts, keep labeled inventory accurate, manage simple RMAs, and coordinate with vendors.
  • Collaboration & escalation - Partner with senior hardware/firmware owners on complex or multi-node issues; communicate status and next steps clearly and concisely.
  • Documentation & quality - Keep SOPs/checklists current; ensure zero undocumented changes and maintain consistent, audit-ready records.

Requirements

  • Hands-on mindset with datacenter/server hardware: able to install, re-seat, and swap GPU/PCIe cards, NICs, PSUs, and drives, and to work neatly in racks (rails, cabling, labeling). Candidates with strong Linux fundamentals (boot/check, logs) and scripting (Python/Bash) who want to learn hardware are welcome; you’ll be trained and mentored by a senior hardware engineer.
  • Disciplined and meticulous: follows checklists and ESD/LOTO procedures; handles high-value server components with care.
  • Practical electrical basics: power-off procedures, PPE use, awareness of short-circuit risks.
  • Comfortable working in racks: cooling, network, storage, PDU, cable management; able to lift and mount equipment safely (within HSE limits).
  • Clear communicator: provides short factual updates; reliable teammate; punctual and process-oriented.
  • Hardware-passionate, professionally grounded: strong curiosity and a craft-oriented mindset.
  • Nice to have: HPC/AI/Cloud at-scale experience (production environments), large-fleet/server installation & maintenance in datacenters.
  • Basic networking (Ethernet/InfiniBand) and basic Linux (boot/check; no advanced coding required).
  • Coding/automation skills (Python/Bash): ability to create small tools/scripts to improve checklists, photo/serial capture, inventory sync, or simple monitoring/reporting.
  • Experience with inventory/RMA tools and vendor coordination.
  • Exposure to HPC, research, or industrial environments.

Benefits

  • Competitive salary and equity package
  • Health insurance
  • Transportation allowance
  • Sports allowance
  • Meal vouchers
  • Private pension plan
  • Generous parental leave policy

Job title

Datacenter Hardware Engineer, HPC

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

No Education Requirement
