Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.
Requirements
4+ years of backend engineering experience (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading is a must, including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions.
Hands-on experience with Temporal, Redis, Airflow, Celery, and RabbitMQ (or similar tools).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers & orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record of scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.