Platform Engineer improving AWS and GPU clusters for quantum simulations, collaborating with quantum researchers and shaping the platform's evolution.
Responsibilities
Own our AWS infrastructure end-to-end and actively shape how it evolves: this role is about building, not just maintaining.
Reduce friction in the deployment pipeline so developers can ship without infrastructure blockers.
Harden systems with intention: lock down IAM roles, container images, and authentication flows in ways that reflect a clear understanding of where the real risks are.
Implement monitoring and alerting that catches production issues before users notice them.
Make deployments faster to roll out, easier to roll back, and less prone to failure.
Lead incident response and post-mortems when necessary.
Make GPU clusters and other infrastructure invisible to the researchers who run workloads on them.
Own CUDA compatibility and driver versions across heterogeneous GPU clusters.
Build standardized SLURM job submission workflows that researchers can use without help.
Package and containerize Python simulation code for reproducible execution.
Monitor job health across utilization, cost, and runtime efficiency.
Requirements
Experience: 5+ years in Platform Engineering, DevOps, or SRE roles.
Production AWS experience: You've built and maintained systems on ECS/EKS, managed multi-account networking (VPCs, security groups), and dealt with real-world infrastructure complexity.
Infrastructure as Code: You've written and maintained Terraform (or Pulumi/CDK) in production, including applying ongoing changes as requirements evolved.
CI/CD: You've improved build pipelines in production (reduced build times, increased reliability, made deployments easier to debug) using GitLab CI, GitHub Actions, or an equivalent.
GPU/HPC experience: You've supported GPU workloads in production environments, including code optimization, CUDA debugging, and job scheduler setup.
Background in scientific computing, research infrastructure, ML platforms, or early-stage startups (especially research computing vendors).
Security & compliance experience: You've implemented auth systems (Auth0/Okta), managed encryption (KMS), or worked on FedRAMP/compliance-driven infrastructure. FedRAMP experience is a strong plus.
Exposure to quantum computing SDKs (Qiskit, Cirq, PennyLane) or hybrid classical-quantum workflows is a plus, but not required; genuine interest in quantum computing matters more than prior exposure.