Lead AI Platform Engineer bridging AI workloads and production infrastructure, with a focus on NVIDIA stack optimization. Designs and implements scalable AI systems in a hybrid work environment.
Responsibilities
Translate AI/ML workloads into optimized infrastructure and deployment strategies
Optimize model performance across GPU environments (latency, throughput, memory utilization)
Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
Convert and optimize models across frameworks (PyTorch → ONNX → TensorRT)
Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
Improve GPU utilization and scheduling efficiency across clusters
Design scalable distributed training and inference architectures
Work closely with customers to define AI infrastructure strategies and deployment models
Support production deployments including monitoring, rollback, and performance validation
Conduct applied research to improve model efficiency and infrastructure utilization
Mentor team members on AI infrastructure, optimization, and GPU systems
Use experiment tracking tools (MLflow, W&B, Neptune) to log parameters, metrics, and artifacts for comparison
Detect model degradation after deployment: concept drift, data pipeline changes, traffic pattern shifts
Apply root cause analysis (RCA) to ML systems: isolating variables, reproducing issues
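The degradation-detection responsibility above can be illustrated with a minimal sketch. The function name `mean_shift_drift` and the z-score threshold are illustrative assumptions, not a prescribed method; production systems typically use statistical tests such as KS or PSI.

```python
# Minimal sketch of post-deployment drift detection: compare a live
# feature window against a reference (training-time) window and flag
# drift when the live mean departs by too many reference std-devs.
# `mean_shift_drift` and z_threshold are hypothetical, for illustration.

from statistics import mean, pstdev

def mean_shift_drift(reference, live, z_threshold=3.0):
    """Return True when the live mean shifts beyond z_threshold
    reference standard deviations from the reference mean."""
    ref_mean = mean(reference)
    ref_std = pstdev(reference) or 1e-12  # guard against zero variance
    z = abs(mean(live) - ref_mean) / ref_std
    return z > z_threshold

reference = [0.0, 0.1, -0.1, 0.05, -0.05]  # training-time feature values
stable = [0.02, -0.03, 0.01]               # similar distribution: no drift
shifted = [5.0, 5.1, 4.9]                  # large mean shift: drift
```

A real pipeline would run such a check per feature on a schedule and feed alerts into the RCA workflow described above.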
Requirements
8+ years of experience in AI/ML systems, HPC, and AI infrastructure
Strong proficiency in Python
Strong experience with GPU-based AI workloads and performance optimization
Deep understanding of model optimization techniques (quantization, pruning, batching)
Hands-on experience with:
PyTorch
ONNX / ONNX Runtime
TensorRT / TensorRT-LLM
Triton Inference Server
Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
Experience with distributed systems (multi-GPU / multi-node)
Familiarity with:
NCCL communication
NVLink / InfiniBand
Kubernetes or Slurm for orchestration
Experience deploying AI models into production environments
Ability to analyze system bottlenecks (compute, memory, network)
Experience with profiling tools (Nsight, TensorRT profiler, etc.)
Knowledge of cost optimization strategies for GPU workloads
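The quantization requirement above can be sketched in a few lines. This is a toy symmetric int8 scheme with illustrative helper names (`quantize_int8`, `dequantize`); real workflows rely on framework tooling such as TensorRT calibration.

```python
# Toy sketch of symmetric int8 quantization: map floats to [-127, 127]
# with one shared scale, then map back. Helper names are illustrative.

def quantize_int8(values):
    """Quantize floats to int8 range using a shared symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid zero scale
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to weights, within ~scale/2
```

The round-trip error stays within half a quantization step, which is the trade-off against the 4x memory saving over float32.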
Nice to Have
Experience with NVIDIA NIM and NGC ecosystem
Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
Experience with LLM optimization techniques (KV cache, batching strategies)
Familiarity with MLOps practices and CI/CD for AI systems
Experience in customer-facing architecture or consulting roles
Familiarity with hybrid cloud / on-prem HPC environments
Full stack engineer developing and scaling generative AI applications at PwC. Collaborating across teams to enhance software solutions and mentor junior engineers while maintaining high standards.
Developer Technology Engineer pushing the boundaries of AI and computing at NVIDIA. Collaborating with teams to develop next-generation software platforms and performance optimizations.
Software Development Engineer III developing AI products and deployment pipelines for the Finance team. Collaborating with Product Managers to deliver trusted and explainable AI systems.
Senior AI Engineer developing advanced AI solutions at Daimler Truck Financial. Leading deployment of Generative and Agentic AI technologies in a collaborative environment.
Applied AI Architect at Intapp developing and deploying AI solutions for enterprise clients. Collaborating across teams and driving adoption of AI technologies in complex environments.
AI Engineering intern contributing to Generative AI product development at Erste Digital. Collaborating within an international team and gaining hands-on experience with advanced technologies.
AI Engineer designing and developing AI platforms for Contour Software, focusing on building GenAI systems and advanced LLM orchestration layers. Responsibilities include architecture, system integration, and AI adoption.