AIOps/LLMOps Engineer at Parspec, designing and managing the AI infrastructure that helps transform the construction materials supply chain.
Responsibilities
Design and build document AI platforms powered by generative AI, leveraging asynchronous architectures for scalable inference.
Implement event-driven and queue-based systems to support elastic scaling and non-blocking AI workflows.
Architect and maintain self-hosted LLM infrastructure using tools such as vLLM or Ollama on Kubernetes or EC2 with GPU orchestration.
Manage production systems for LLM serving, inference pipelines, and AI workflow orchestration.
Implement LLM gateways and routing systems (e.g., LiteLLM, Portkey) to ensure proper model usage and governance.
Develop guardrails and monitoring systems to reduce hallucinations, misuse, and unsafe outputs in generative AI systems.
Implement end-to-end observability for AI/ML pipelines using distributed tracing and monitoring tools.
Monitor AI system health using platforms such as OpenTelemetry, AWS X-Ray, Prometheus, and Grafana.
Track performance metrics including latency, token usage, inference quality, and model drift.
Manage machine learning workflows using tools such as MLflow, Kubeflow, or SageMaker-hosted MLflow.
Enable experiment tracking, model versioning, and deployment pipelines for production AI systems.
Work closely with engineering teams to integrate AI workflows into scalable backend systems.
Implement AI platform security controls including Bedrock Guardrails, KMS encryption, IAM least-privilege policies, VPC endpoints, and CloudTrail auditing.
Optimize AWS infrastructure (including Bedrock, SageMaker, and EKS) for cost efficiency, performance, and reliability.
Ensure production AI systems maintain high availability and security standards.
Requirements
Strong experience with AWS cloud infrastructure including services such as EC2, Lambda, S3, EKS, Bedrock, Step Functions, API Gateway, EventBridge, and SQS/SNS.
Experience building ML infrastructure using Infrastructure-as-Code tools such as Terraform or CloudFormation.
Hands-on experience deploying and operating LLM serving infrastructure using platforms such as vLLM or Text Generation Inference.
Experience managing vector databases and retrieval systems such as Pinecone, pgvector, or Weaviate.
Strong experience designing event-driven or asynchronous systems using queues (SQS, Kafka) and micro-batching patterns.
Experience implementing observability and monitoring for distributed AI systems using tools such as ELK, Prometheus, Grafana, and OpenTelemetry.
Strong programming experience in Python, including frameworks such as FastAPI and asynchronous programming patterns (asyncio).
Experience with Docker, Kubernetes, and CI/CD pipelines using tools such as GitHub Actions or ArgoCD.
5+ years of experience in MLOps, LLMOps, AIOps, or DevOps supporting machine learning or AI systems.
Proven track record building production generative AI systems with high availability and scalability.
Experience deploying self-hosted LLMs on AWS infrastructure and building production-grade document AI platforms.
Experience operating AI systems with >99.9% uptime and cost-efficient infrastructure management.
Benefits
Competitive salary and benefits, including family insurance coverage
Free health teleconsultations
Learning/upskilling budgets
Equity in the company
Flexible hours and a hybrid work setup
Unlimited PTO
Opportunity to grow with a fast-scaling company transforming a large market