Hybrid Senior Site Reliability Engineer – Named Accounts

Posted 4 weeks ago

About the role

  • Embed on-site with a named strategic customer, becoming an extension of their team
  • Act as the primary technical liaison between Lambda and the customer organization
  • Navigate ambiguous requirements to identify root problems and define clear technical solutions
  • Drive alignment across internal Lambda teams and customer stakeholders
  • Scope, sequence, and build full-stack solutions that deliver measurable business value
  • Design and implement infrastructure optimizations for AI/ML workloads at scale
  • Debug complex distributed systems issues across the infrastructure stack
  • Ship iteratively and learn fast, adjusting approach based on customer feedback and results
  • Identify reusable patterns from customer engagements that can scale across Lambda's customer base
  • Surface field intelligence that influences Lambda's product roadmap
  • Document and share learnings to elevate the capabilities of the broader team
  • Represent Lambda with executive presence in high-stakes customer interactions

Requirements

  • 6+ years of experience in an SRE, software engineering, or similar role, with deep knowledge of running Linux clusters and systems
  • Strong programming skills in Go and Python; experience with GitOps (e.g., ArgoCD), Helm, and Kubernetes operators
  • Proven experience operating Kubernetes clusters in production environments (on-prem, EKS, GKE, or similar)
  • Hands-on experience with AI/ML workload management tools (Volcano, Kubeflow, or similar)
  • Ability to work independently with limited direction or as part of a team
  • Familiarity with observability tools such as Prometheus, Grafana, and Fluent Bit, as well as CI/CD pipelines
  • Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar
  • Excellent communication skills with the ability to translate technical complexity for diverse audiences
  • Executive presence and ability to represent Lambda in customer-facing situations
  • Comfort operating in ambiguous environments with competing priorities
  • Strong bias for action and shipping iteratively

Benefits

  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401(k) plan with 2% company match (USA employees)
  • Flexible Paid Time Off Plan that we all actually use

Job title

Senior Site Reliability Engineer – Named Accounts

Job type

Experience level

Senior

Salary

$240,000 - $425,000 per year

Degree requirement

Bachelor's Degree

Location requirements
