Hybrid LLM Ops Engineer

Posted 3 hours ago

About the role

  • As an AIOps/LLMOps Engineer at Parspec, you will design and manage the AI infrastructure behind our document AI platform, helping transform the construction materials supply chain.

Responsibilities

  • Design and build document AI platforms powered by generative AI, leveraging asynchronous architectures for scalable inference.
  • Implement event-driven and queue-based systems to support elastic scaling and non-blocking AI workflows.
  • Architect and maintain self-hosted LLM infrastructure using tools such as vLLM or Ollama on Kubernetes or EC2 with GPU orchestration.
  • Manage production systems for LLM serving, inference pipelines, and AI workflow orchestration.
  • Implement LLM gateways and routing systems (e.g., LiteLLM, Portkey) to ensure proper model usage and governance.
  • Develop guardrails and monitoring systems to reduce hallucinations, misuse, and unsafe outputs in generative AI systems.
  • Implement end-to-end observability for AI/ML pipelines using distributed tracing and monitoring tools.
  • Monitor AI system health using platforms such as OpenTelemetry, AWS X-Ray, Prometheus, and Grafana.
  • Track performance metrics including latency, token usage, inference quality, and model drift.
  • Manage machine learning workflows using tools such as MLflow, Kubeflow, or SageMaker.
  • Enable experiment tracking, model versioning, and deployment pipelines for production AI systems.
  • Work closely with engineering teams to integrate AI workflows into scalable backend systems.
  • Implement AI platform security controls including Bedrock Guardrails, KMS encryption, IAM least-privilege policies, VPC endpoints, and CloudTrail auditing.
  • Optimize AWS infrastructure—including Bedrock, SageMaker, and EKS—for cost efficiency, performance, and reliability.
  • Ensure production AI systems maintain high availability and security standards.

Requirements

  • Strong experience with AWS cloud infrastructure including services such as EC2, Lambda, S3, EKS, Bedrock, Step Functions, API Gateway, EventBridge, and SQS/SNS.
  • Experience building ML infrastructure using Infrastructure-as-Code tools such as Terraform or CloudFormation.
  • Hands-on experience deploying and operating LLM serving infrastructure using platforms such as vLLM or Text Generation Inference.
  • Experience managing vector databases and retrieval systems such as Pinecone, PGVector, or Weaviate.
  • Strong experience designing event-driven or asynchronous systems using queues (SQS, Kafka) and micro-batching patterns.
  • Experience implementing observability and monitoring for distributed AI systems using tools such as ELK, Prometheus, Grafana, and OpenTelemetry.
  • Strong programming experience in Python, including frameworks such as FastAPI and asynchronous programming patterns (asyncio).
  • Experience with Docker, Kubernetes, and CI/CD pipelines using tools such as GitHub Actions or ArgoCD.
  • 5+ years of experience in MLOps, LLMOps, AIOps, or DevOps supporting machine learning or AI systems.
  • Proven track record building production generative AI systems with high availability and scalability.
  • Experience deploying self-hosted LLMs on AWS infrastructure and building production-grade document AI platforms.
  • Experience operating AI systems with >99.9% uptime and cost-efficient infrastructure management.

Benefits

  • Competitive salary and benefits, including family insurance coverage
  • Free health teleconsultations
  • Learning/upskilling budgets
  • Equity in the company
  • Flexible hours and a hybrid work setup
  • Unlimited PTO
  • Opportunity to grow with a fast-scaling company transforming a large market

Job title

LLM Ops Engineer

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
