Senior Software Engineer focusing on complex infrastructure deployment for AION's AI cloud platform. Managing multi-cloud deployment strategies and ensuring compliance across customer environments.
Responsibilities
Design AION as a composable platform with independently deployable components that run seamlessly on AWS, GCP, Azure, and private data centers
Work with senior engineering leads to define private deployment strategies and build automation for customer VPC and on-premises installations
Build abstraction layers that unify diverse cloud providers while maintaining flexibility for customer-specific requirements
Design globally distributed deployment patterns with built-in support for data sovereignty, compliance, and regulatory requirements
Own end-to-end platform deployment automation using Terraform, Ansible, Helm, and infrastructure-as-code across hybrid cloud environments
Design and implement disaster recovery, failover, high-availability architectures, and cloud migration strategies for customer deployments
Build comprehensive CI/CD pipelines for infrastructure provisioning, configuration management, and deployment orchestration
Implement monitoring, observability (Prometheus, Grafana, Loki), and alerting systems tailored for customer-managed AION instances
Implement Kubernetes-based and custom orchestrator-based managed services with strict workload isolation and multi-tenancy
Design container security, runtime protection, network policies, and secrets management for production workloads
Own compliance implementation (SOC2, GDPR, HIPAA, ISO 27001, PCI-DSS) and security best practices for customer environments
Create deployment blueprints, reference architectures, self-service portals, and comprehensive documentation for customer success
Requirements
6+ years of experience in platform deployment, DevOps, SRE, or cloud infrastructure roles with focus on customer-facing deployments
Deep expertise in Kubernetes including cluster design, multi-tenancy, custom resources, operators / controllers, and production operations
Fundamental understanding of Linux processes and container internals, particularly runtime optimizations such as lazy image loading (Nydus, eStargz) and checkpoint/restore mechanisms (CRIU) for fast migration and reduced cold-start times
Deep understanding of computer networking and the OSI model, with experience in creating overlay networks using VXLAN or BGP and implementing network isolation through CIDRs
Strong understanding of hybrid and multi-cloud architectures combining on-premises, private, and public cloud resources, including VPCs, routing, network policies, and VPN tools like WireGuard
Proficiency in infrastructure-as-code using Terraform, Ansible, Pulumi, Nix, or CloudFormation across multiple cloud providers
Experience building and maintaining GitOps pipelines for infrastructure and application deployments using GitLab CI, GitHub Actions, ArgoCD or FluxCD
Knowledge of secrets management (External Secrets Operator, Vault, AWS Secrets Manager, GCP Secret Manager) and encryption at rest/in transit
Knowledge of the observability stack, including Prometheus, Grafana, Loki, distributed tracing (Jaeger, Tempo), and log aggregation
Programming/scripting skills in Go or Python for building automation tools, operators, and deployment scripts
Hands-on experience deploying complex platforms in customer VPCs and on-premises environments with strict isolation requirements
Experience designing and executing cloud migration strategies including lift-and-shift, re-platforming, and cloud-native transformations
Strong knowledge of security compliance frameworks (SOC2, GDPR, HIPAA, ISO 27001, PCI-DSS) and their implementation in cloud infrastructure
Familiarity with disaster recovery strategies, backup solutions (Velero, Kasten), and business continuity planning
Exposure to HPC systems, GPU orchestration, and AI workload patterns is highly desirable
Benefits
**Preferred Attributes:**
High ownership, self-driven, with a bias for action.
Strong strategic thinking and ability to connect technical decisions to business impact.
Excellent communication and mentoring skills.
Thrives in ambiguity, fast-paced environments, and early-stage startup culture.
**Why Join AION?**
Work directly with high-pedigree founders shaping technical and product strategy.
Build infrastructure powering the future of AI compute globally.
Significant ownership and impact with equity reflective of your contributions.
Competitive compensation, flexible work options, and wellness benefits.
**Apply Now:**
If you’re a strong engineer ready to lead architecture and scale next-generation AI infrastructure, we want to hear from you. Please share:
Your resume, highlighting relevant projects and leadership experience.