Principal IaaS Engineer leading the architecture and standardization of AI infrastructure. Collaborating with data and security teams to enhance global infrastructure platforms.
Responsibilities
Architect and evolve the company’s IaaS platform across hybrid environments (on-premises, distributed), enabling secure and scalable compute foundations.
Design, build, and maintain infrastructure automation frameworks using Terraform, Pulumi, and Ansible, including development of custom providers and modules.
Define and enforce engineering standards for infrastructure provisioning, networking, and observability to ensure reliability, security, and consistency.
Lead evaluation and integration of core technologies including OpenShift, Kubernetes, MAAS, and Ceph to optimize performance, cost, and maintainability.
Drive multi-tenant PaaS initiatives and private cloud modernization leveraging OpenShift, Juju, and S3-compatible storage (Ceph, MinIO, TrueNAS).
Collaborate with Data, ML, and Platform Engineering teams to align IaaS architecture with emerging workloads such as data pipelines, MLflow, and Airflow orchestration.
Establish GitOps and CI/CD frameworks (ArgoCD, Helm, GitHub Actions, Azure DevOps) for consistent infrastructure delivery and configuration management.
Lead capacity planning, HA/DR strategy, and monitoring/alerting design using the Prometheus, Grafana, and Loki stack.
Partner with InfoSec to embed zero-trust principles, OIDC/SAML-based IAM, and secret management best practices into the infrastructure lifecycle.
Mentor engineers and contribute to organization-wide technical enablement through documentation, workshops, and community participation.
Requirements
10+ years of experience designing and operating large-scale infrastructure systems across on-prem and cloud environments.
Proven expertise in Infrastructure as Code (Terraform, Pulumi, Ansible) with experience authoring reusable modules and providers.
Deep understanding of hybrid and private cloud platforms (OpenShift, Juju, MAAS, OpenStack, VMware, Proxmox).
Strong background in storage (Ceph, TrueNAS, S3, NFS) and networking (VLAN, VXLAN, SDN) for high-availability architectures.
Demonstrated experience building GitOps-based deployment pipelines and maintaining production-grade Kubernetes environments.
Familiarity with data and ML infrastructure integration; experience with MLflow, Airflow, Databricks, or Spark preferred.
Strong proficiency in Python, Go, and Bash for automation and platform tooling.
Excellent cross-functional leadership, communication, and mentorship skills.
System Engineer for IBM i at WIIT, leading projects in Hosted Private and Hybrid Cloud environments. Responsibilities include AS/400 management, migration projects, and disaster recovery.
Senior Systems Engineer responsible for leading systems engineering of autonomy architecture in support of Special Operations Forces. Collaborating with technical teams on advanced integration solutions.
Systems Engineer/Technical Project Manager supporting a federal government contractor with application migration. Requires a Secret clearance and experience with complex migration projects in DoD/DISA environments.
Systems Architect supporting the Nuclear Enterprise Center at Fort Meade for USSTRATCOM under the SEI framework. Collaborating on large-scale systems architecture and engineering operations.
System Engineer managing and maintaining Microsoft environments in a hybrid role. Collaborating with clients on projects while ensuring system stability without frequent travel.
Principal Systems Engineer at Northrop Grumman managing integration activities for the TRMC program with a focus on Systems Engineering principles. Collaborating with stakeholders and managing engineering document changes.
Virtualization Systems Engineer providing technical leadership for enterprise virtualization infrastructure at GDIT. Overseeing operations and ensuring compliance across classified and unclassified environments.
Autonomy Systems Engineer at Caterpillar Inc. focusing on developing autonomous machine systems and their requirements. Collaborating with global teams to ensure system integration and validation during product development.
Mid-level Systems Analyst (Analista de Sistemas PL) responsible for managing the Zendesk platform and aligning technology solutions with organizational goals. The role involves analyzing processes to identify automation opportunities and maintain system integrity.