Senior Software Engineer building AI inference systems for AION's multi-cloud compute platform. Leading design and development of scalable managed services and orchestration systems for GPU workloads.
Responsibilities
Design and architect AION's multi-cloud compute platform, building abstraction layers that unify diverse cloud providers (AWS, GCP, Azure, bare-metal data centers)
Work directly with cloud providers to expand AION's compute pool—understanding pricing, availability zones, GPU types, and capacity planning
Build and maintain AION's managed services
Understand and abstract cloud provider differences in storage (block, object, file systems), networking (VPCs, subnets, security groups), and compute resources
Design composable platform components that enable forward deployments and promote reusability across AION's infrastructure stack
Own end-to-end development of managed services on the compute platform—from design and architecture through execution and production monitoring
Build scalable orchestration systems for GPU workloads, container scheduling, and resource allocation
Develop robust APIs and control planes for compute lifecycle management (provisioning, scaling, termination)
Lead technical discussions on platform reliability, performance optimization, and cost efficiency
Execute on peripheral platform services including billing systems, usage accounting, observability infrastructure, and compliance tooling
Build monitoring and telemetry systems for compute utilization, cost tracking, and performance metrics
Establish engineering standards for platform development including code reviews, quality gates, and testing practices
Mentor engineers on infrastructure best practices and distributed systems design
Requirements
4+ years of experience building and scaling complex backend systems, cloud infrastructure, or distributed platforms
Strong understanding of multi-cloud architectures and experience working with AWS, GCP, or Azure at scale
Deep knowledge of cloud abstractions: compute (EC2, GCE, VMs), storage (S3, GCS, EBS), networking (VPCs, load balancers, security groups)
Proficiency in Golang strongly preferred; Python, Rust, or other systems languages a plus
Experience with Kubernetes, container orchestration, and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
Solid understanding of distributed systems principles, consensus algorithms, and state management
Experience building APIs, control planes, and platform services for infrastructure management
Familiarity with databases (PostgreSQL, Redis, etcd), message queues (Kafka, RabbitMQ), and event-driven architectures
Knowledge of GPU orchestration, AI/ML workloads, or HPC systems is highly desirable
Experience with observability tools (Prometheus, Grafana, Datadog) and distributed tracing
Understanding of cloud billing models, cost optimization strategies, and resource scheduling
Benefits
**Preferred Attributes:**
High ownership, self-driven, and a bias for action.
Strong strategic thinking and ability to connect technical decisions to business impact.
Excellent communication and mentoring skills.
Thrives in ambiguity, fast-paced environments, and early-stage startup culture.
**Why Join AION?**
Work directly with high-pedigree founders shaping technical and product strategy.
Build infrastructure powering the future of AI compute globally.
Significant ownership and impact with equity reflective of your contributions.
Competitive compensation, flexible work options, and wellness benefits.