Senior Software Engineer building AI inference systems for AION's multi-cloud compute platform. Leading design and development of scalable managed services and orchestration systems for GPU workloads.
Responsibilities
Design and architect AION's multi-cloud compute platform, building abstraction layers that unify diverse cloud providers (AWS, GCP, Azure, bare-metal data centers)
Work directly with cloud providers to expand AION's compute pool—understanding pricing, availability zones, GPU types, and capacity planning
Build and maintain AION's managed services
Understand and abstract cloud provider differences in storage (block, object, file systems), networking (VPCs, subnets, security groups), and compute resources
Design composable platform components that enable forward deployments and promote reusability across AION's infrastructure stack
Own end-to-end development of managed services on the compute platform—from design and architecture through execution and production monitoring
Build scalable orchestration systems for GPU workloads, container scheduling, and resource allocation
Develop robust APIs and control planes for compute lifecycle management (provisioning, scaling, termination)
Lead technical discussions on platform reliability, performance optimization, and cost efficiency
Deliver peripheral platform services, including billing systems, usage accounting, observability infrastructure, and compliance tooling
Build monitoring and telemetry systems for compute utilization, cost tracking, and performance metrics
Establish engineering standards for platform development including code reviews, quality gates, and testing practices
Mentor engineers on infrastructure best practices and distributed systems design
Requirements
4+ years of experience building and scaling complex backend systems, cloud infrastructure, or distributed platforms
Strong understanding of multi-cloud architectures and experience working with AWS, GCP, or Azure at scale
Deep knowledge of cloud abstractions: compute (EC2, GCE, Azure VMs), storage (S3, GCS, EBS), networking (VPCs, load balancers, security groups)
Proficiency in Golang strongly preferred; experience with Python, Rust, or similar languages a plus
Experience with Kubernetes, container orchestration, and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
Solid understanding of distributed systems principles, consensus algorithms, and state management
Experience building APIs, control planes, and platform services for infrastructure management
Familiarity with databases (PostgreSQL, Redis, etcd), message queues (Kafka, RabbitMQ), and event-driven architectures
Knowledge of GPU orchestration, AI/ML workloads, or HPC systems is highly desirable
Experience with observability tools (Prometheus, Grafana, Datadog) and distributed tracing
Understanding of cloud billing models, cost optimization strategies, and resource scheduling
**Preferred Attributes:**
High ownership, self-driven, and a bias for action.
Strong strategic thinking and the ability to connect technical decisions to business impact.
Excellent communication and mentoring skills.
Thrives in ambiguity, fast-paced environments, and early-stage startup culture.
Benefits
**Why Join AION?**
Work directly with high-pedigree founders shaping technical and product strategy.
Build infrastructure powering the future of AI compute globally.
Significant ownership and impact with equity reflective of your contributions.
Competitive compensation, flexible work options, and wellness benefits.
Tech Lead managing Data Engineering for a French digital solutions company. Leading data solutions for e-retail performance with Python and SQL on modern architectures.
Staff Software Engineer enhancing TeamViewer ONE capabilities for small and medium businesses. Collaborating to maintain and improve user experiences with distributed systems and cloud platforms.
Senior Director of Software Engineering leading a team focused on AI-enabled technology initiatives. Manage projects that transform business and technology capabilities in the insurance industry.
Senior Software Engineer developing cross-product features for enterprise customers at Cloudera. Collaborating within a global team and ensuring high-quality metadata management services.
IT Cloud Software Architect designing and scaling cloud-native applications at Nelnet. Leading technical direction and fostering innovation in a hybrid work environment.
Software Engineer Lead developing ETL solutions for PNC's regulatory compliance needs. Leading design and development of data solutions with compliance emphasis.
Senior Software Engineer focusing on backend development at CVS Health. Building software components using a cloud-native platform on Google Cloud Platform.
Software Engineer developing high-quality products for OPENLANE in web, iOS, and Android environments. Collaborating in an agile team to build solutions with backend microservices on AWS cloud.
Software Engineer supporting BlueCard claims processing by enhancing applications and modernizing legacy systems. Requires experience in COBOL, C#, and SQL Server with remote work options.