AI Infrastructure Engineer focusing on scalable backend systems for AI workflows in a fast-paced startup. Collaborating on reliability, data performance, and infrastructure for rapid growth.
Responsibilities
Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.
Requirements
4+ years of backend engineering experience (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading (including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions) is a must.
Hands-on experience with Temporal, Redis, Airflow, Celery, and RabbitMQ (or similar tools).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers & orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record of scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.
Site Infrastructure Engineer managing HVAC and utility systems at SABIC. Overseeing maintenance, project activities, and long-term asset strategies for operational efficiency.
Key engineer developing and operating Web Application Firewall (WAF) platforms at Lloyds Banking Group. Enhancing security and performance while working with modern engineering practices.
Lead Infrastructure Engineer driving Edge Security capabilities for Lloyds Banking Group. Focusing on web access protection, Zero Trust architectures, and modern security engineering approaches.
Senior System Administrator & Infrastructure Engineer managing reliable infrastructure and driving DevOps practices at IMAGO. Collaborating with development teams and providing technical guidance to ensure best practices.
Infrastructure Engineer maintaining high availability of systems at mortgage platform provider Pylon. Focusing on developer productivity and codebase quality, with instant feedback from peers.
Infrastructure Systems Engineer II managing production application support for Conduent. Collaborating on ITIL processes and incident management while working in a 24/7 environment.
OT Cybersecurity Specialist responsible for secure IT-OT infrastructures in industrial operations. Engaging in secure deployments, integrating cybersecurity frameworks, and providing expert support.
Infrastructure and Security Engineer collaborating on the design of secure architectures at CRG Solutions. Integrating cybersecurity best practices and managing incidents in Windows and Linux environments.
Senior Infrastructure Engineer managing global IT infrastructure for aviation solutions, focusing on VMware, Nutanix, and Windows Server environments. Collaborating with teams to ensure high availability and optimal performance in a hybrid work model.
Cloud Support Engineer maintaining operational stability and automation for Azure cloud platforms. Working collaboratively across IT teams to ensure infrastructure reliability and security.