Senior Software Engineer focusing on complex infrastructure deployment for AION's AI cloud platform. Managing multi-cloud deployment strategies and ensuring compliance across customer environments.
Responsibilities
Design AION as a composable platform with independently deployable components that run seamlessly on AWS, GCP, Azure, and private data centers
Work with senior engineering leads to define private deployment strategies and build automation for customer VPC and on-premises installations
Build abstraction layers that unify diverse cloud providers while maintaining flexibility for customer-specific requirements
Design globally distributed deployment patterns with built-in support for data sovereignty, compliance, and regulatory requirements
Own end-to-end platform deployment automation using Terraform, Ansible, Helm, and infrastructure-as-code across hybrid cloud environments
Design and implement disaster recovery, failover, high-availability architectures, and cloud migration strategies for customer deployments
Build comprehensive CI/CD pipelines for infrastructure provisioning, configuration management, and deployment orchestration
Implement monitoring, observability (Prometheus, Grafana, Loki), and alerting systems tailored for customer-managed AION instances
Implement Kubernetes-based and custom orchestrator-based managed services with strict workload isolation and multi-tenancy
Design container security, runtime protection, network policies, and secrets management for production workloads
Own compliance implementation (SOC2, GDPR, HIPAA, ISO 27001, PCI-DSS) and security best practices for customer environments
Create deployment blueprints, reference architectures, self-service portals, and comprehensive documentation for customer success
Requirements
6+ years of experience in platform deployment, DevOps, SRE, or cloud infrastructure roles with a focus on customer-facing deployments
Deep expertise in Kubernetes including cluster design, multi-tenancy, custom resources, operators / controllers, and production operations
Fundamental understanding of Linux processes and container internals, specifically regarding runtime optimizations like lazy loading (Nydus, eStargz) and snapshot checkpoint/restore mechanisms (CRIU) for fast migration and reduced cold-start times.
Deep understanding of computer networking and the OSI model, with experience building overlay networks using VXLAN or BGP and implementing network isolation through CIDR-based segmentation
Strong understanding of hybrid and multi-cloud architectures combining on-premises, private, and public cloud resources, including VPCs, routing, network policies, and VPN tools like WireGuard
Proficiency in infrastructure-as-code using Terraform, Ansible, Pulumi, Nix, or CloudFormation across multiple cloud providers
Experience building and maintaining GitOps pipelines for infrastructure and application deployments using GitLab CI, GitHub Actions, ArgoCD, or FluxCD
Knowledge of secrets management (External Secrets Operator, Vault, AWS Secrets Manager, GCP Secret Manager) and encryption at rest/in transit
Knowledge of the observability stack, including Prometheus, Grafana, Loki, distributed tracing (Jaeger, Tempo), and log aggregation
Programming/scripting skills in Go or Python for building automation tools, operators, and deployment scripts
Hands-on experience deploying complex platforms in customer VPCs and on-premises environments with strict isolation requirements
Experience designing and executing cloud migration strategies including lift-and-shift, re-platforming, and cloud-native transformations
Strong knowledge of security compliance frameworks (SOC2, GDPR, HIPAA, ISO 27001, PCI-DSS) and their implementation in cloud infrastructure
Familiarity with disaster recovery strategies, backup solutions (Velero, Kasten), and business continuity planning
Exposure to HPC systems, GPU orchestration, and AI workload patterns is highly desirable
Benefits
**Preferred Attributes:**
High ownership, self-driven, and a bias for action.
Strong strategic thinking and ability to connect technical decisions to business impact.
Excellent communication and mentoring skills.
Thrives in ambiguity, fast-paced environments, and early-stage startup culture.
**Why Join AION?**
Work directly with high-pedigree founders shaping technical and product strategy.
Build infrastructure powering the future of AI compute globally.
Significant ownership and impact with equity reflective of your contributions.
Competitive compensation, flexible work options, and wellness benefits.
**Apply Now:**
If you’re a strong engineer ready to lead architecture and scale next-generation AI infrastructure, we want to hear from you. Please share:
Your resume, highlighting relevant projects and leadership experience.