Hybrid Senior Software Engineer, Compute Platform

Posted 4 weeks ago

About the role

  • As a Senior Software Engineer, you will build AI inference systems for AION's multi-cloud compute platform and lead the design and development of scalable managed services and orchestration systems for GPU workloads.

Responsibilities

  • Design and architect AION's multi-cloud compute platform, building abstraction layers that unify diverse cloud providers (AWS, GCP, Azure, bare-metal data centers)
  • Work directly with cloud providers to expand AION's compute pool—understanding pricing, availability zones, GPU types, and capacity planning
  • Build and maintain AION's managed services
  • Understand and abstract cloud provider differences in storage (block, object, file systems), networking (VPCs, subnets, security groups), and compute resources
  • Design composable platform components that enable forward deployments and promote reusability across AION's infrastructure stack
  • Own end-to-end development of managed services on the compute platform—from design and architecture through execution and production monitoring
  • Build scalable orchestration systems for GPU workloads, container scheduling, and resource allocation
  • Develop robust APIs and control planes for compute lifecycle management (provisioning, scaling, termination)
  • Lead technical discussions on platform reliability, performance optimization, and cost efficiency
  • Deliver supporting platform services, including billing systems, usage accounting, observability infrastructure, and compliance tooling
  • Build monitoring and telemetry systems for compute utilization, cost tracking, and performance metrics
  • Establish engineering standards for platform development, including code reviews, quality gates, and testing practices
  • Mentor engineers on infrastructure best practices and distributed systems design

Requirements

  • 4+ years of experience building and scaling complex backend systems, cloud infrastructure, or distributed platforms
  • Strong understanding of multi-cloud architectures and experience working with AWS, GCP, or Azure at scale
  • Deep knowledge of cloud abstractions: compute (EC2, GCE, VMs), storage (S3, GCS, EBS), networking (VPCs, load balancers, security groups)
  • Proficiency in Golang strongly preferred; Python, Rust, or other systems languages a plus
  • Experience with Kubernetes, container orchestration, and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
  • Solid understanding of distributed systems principles, consensus algorithms, and state management
  • Experience building APIs, control planes, and platform services for infrastructure management
  • Familiarity with databases (PostgreSQL, Redis, etcd), message queues (Kafka, RabbitMQ), and event-driven architectures
  • Knowledge of GPU orchestration, AI/ML workloads, or HPC systems is highly desirable
  • Experience with observability tools (Prometheus, Grafana, Datadog) and distributed tracing
  • Understanding of cloud billing models, cost optimization strategies, and resource scheduling

Preferred attributes

  • High ownership, self-driven, with a bias for action.
  • Strong strategic thinking and ability to connect technical decisions to business impact.
  • Excellent communication and mentoring skills.
  • Thrives in ambiguity, fast-paced environments, and early-stage startup culture.

Why join AION?

  • Work directly with high-pedigree founders shaping technical and product strategy.
  • Build infrastructure powering the future of AI compute globally.
  • Significant ownership and impact with equity reflective of your contributions.
  • Competitive compensation, flexible work options, and wellness benefits.

Job title

Senior Software Engineer, Compute Platform

Job type

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
