AI Infrastructure Engineer focusing on scalable backend systems for AI workflows in a fast-paced startup. Collaborating on reliability, data performance, and infrastructure for rapid growth.
Responsibilities
Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.
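As an illustration of the caching responsibility above (file deduplication to reduce redundant compute and API calls), here is a minimal sketch. It assumes an in-memory dictionary standing in for a Redis layer. The names DedupCache, content_key, and get_or_compute are hypothetical and are not taken from the posting or from any specific library.

```python
import hashlib
from typing import Callable, Dict


class DedupCache:
    """Hypothetical in-memory stand-in for a Redis-style cache keyed by content hash."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def content_key(data: bytes) -> str:
        # Identical file contents always map to the same key,
        # regardless of file name or upload time.
        return hashlib.sha256(data).hexdigest()

    def get_or_compute(self, data: bytes, compute: Callable[[bytes], str]) -> str:
        key = self.content_key(data)
        if key in self._store:
            # Duplicate content: reuse the stored result instead of recomputing.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(data)
        self._store[key] = result
        return result
```

In a production setting the dictionary would typically be replaced by a shared store with expiry, so that hot results stay cached and cold ones age out.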
Requirements
4+ years of backend engineering experience (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading (including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions) is a must.
Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers & orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record of scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.
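As an illustration of the rate limiting fundamentals listed above, here is a minimal token-bucket sketch. The class name TokenBucket, its parameters, and the injectable clock are assumptions made for this example. It is a single-process sketch and does not include the locking or shared state a multi-worker deployment would need.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical single-process token bucket; not thread-safe."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock       # injectable so tests can control time
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The cost argument lets a caller charge more than one token per request, which is one simple way to model per-request budgets.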
Staff ML Infrastructure Engineer building and scaling robust Compute platforms for Simulation and data workflows at GM. Collaborating with engineers to drive efficiency and reliability in AI infrastructure.
IT Infrastructure Engineer managing network and digital infrastructure for Physicians Insurance, a boutique mutual insurance company. Collaborating on design, deployment, and maintenance operations.
Modern Workplace Exchange Infrastructure Architect at Avanade driving end-to-end cloud solutions with Microsoft 365. Collaborating with a large team on enterprise projects for digital transformation.
Infrastructure Specialist supporting enterprise voice platforms including Avaya and RingCentral. Balancing transformation with service stability while working in a hybrid environment.
VP of Technology Infrastructure leading multidisciplinary teams at Early Warning. Managing complex infrastructure and influencing company strategy for payment solutions.
Senior Infrastructure Architect II at Pacific Life defining global infrastructure architecture and ensuring alignment with business objectives. Collaborating cross-functionally to support enterprise-wide initiatives.
Responsible for managing IT infrastructure and ensuring service availability and security. Leading support teams and overseeing technical projects for Pierre Fabre in Brazil.
Lead Infrastructure Engineer designing secure automation infrastructure for GE Vernova's digital transformation in utility operations. Collaborating with architects to develop reusable IT solutions.
Infrastructure Engineer managing VMware Server Infrastructure for CMA CGM in the UK. Providing L2/L3 support and ensuring smooth IT operations across client environments.
Infrastructure Engineer responsible for IT infrastructure maintenance and user support. Join One Beyond's innovative team to enhance system reliability and performance while working flexibly.