Hybrid Senior Software Engineer – AI/ML Infra

Posted 2 days ago

About the role

  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions using AWS/GCP as a backup or for specialized use cases
  • Mentor junior engineers and data scientists on platform best practices, infrastructure design, and ML operations
  • Lead comprehensive code reviews focusing on scalability, reliability, security, and maintainability
  • Design and deliver technical onboarding programs for new team members joining the ML platform team
  • Work closely with data scientists to understand requirements and optimize workflows for model development and deployment
  • Collaborate with product engineering teams to integrate ML capabilities into customer-facing applications
  • Support research teams with infrastructure for experimenting with cutting-edge LLM techniques and architectures

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • 5+ years of software engineering experience with a focus on infrastructure, platform engineering, or MLOps
  • 2+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 1+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
  • Hands-on experience with inference optimization using vLLM, TensorRT-LLM, Triton Inference Server, or similar
  • Advanced experience with Azure DevOps, GitHub Actions, Jenkins, or similar CI/CD platforms
  • Proficiency with Terraform, ARM templates, Pulumi, or CloudFormation
  • Deep understanding of Docker, container optimization, and multi-stage builds
  • Experience with Prometheus, Grafana, ELK stack, Azure Monitor, and distributed tracing
  • Knowledge of both SQL and NoSQL databases, data warehousing, and vector databases

Benefits

  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being.
  • Financial benefits including market-competitive compensation; a 401(k) savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance.
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance.
  • Workplace flexibility, including our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year.

Job title

Senior Software Engineer – AI/ML Infra

Job type

Experience level

Senior

Salary

$105,000 - $300,000 per year

Degree requirement

Bachelor's Degree

Location requirements
