Hybrid Senior Software Engineer, LLM Inferencing, AI Gateway

Posted 27 minutes ago

About the role

  • Senior Software Engineer who designs and builds the GPU-based GenAI platform for Wells Fargo, leads technical initiatives, and collaborates with teams to resolve AI challenges.

Responsibilities

  • Lead complex Generative AI initiatives and deliverables within technical domain environments
  • Contribute to large-scale planning of strategies
  • Design, code, test, debug, and document for projects and programs associated with technology domain, including upgrades and deployments
  • Review moderately complex technical challenges that require an in-depth evaluation of technologies and procedures
  • Resolve moderately complex issues and lead a team to meet the needs of existing or potential new clients, leveraging a solid understanding of the function, policies, procedures, and compliance requirements
  • Collaborate and consult with peers, colleagues, and mid-level managers to resolve technical challenges and achieve goals
  • Lead projects, act as an escalation point, and provide guidance and direction to less experienced staff
  • Engineer GPU clusters and node pools; configure NVLink/NVSwitch, NVIDIA GPU Operator, MIG profiles, container runtime, and kernel/driver baselines for high‑throughput LLM/SLM workloads

Requirements

  • 4+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 1+ years of experience with GPU Inference including NVIDIA CUDA, cuDNN, NVLink/NVSwitch, MIG, NIXL, GPU profiling, and performance tuning on H100/H200 architectures
  • 1+ years of experience with GPU orchestration platforms, such as RunAI (collections, queues, quotas, preemption, fair-share scheduling), OpenShift AI (RHOAI), and cluster administration on OCP or GKE
  • 1+ years of experience with LLM/SLM serving frameworks, including vLLM, Triton, TensorRT‑LLM/MII, KV‑cache optimization strategies, and FP8/INT4 quantization techniques (AWQ/GPTQ)
  • 1+ years of experience working with LLM API gateways, including OAuth2/mTLS authentication, rate‑limiting and quota management, OpenAPI/SDK integration, SLAs, and versioning/deprecation practices
  • 2+ years of experience in Generative AI engineering, including LLM/SLM operations, fine‑tuning, evaluation pipelines, and developing model‑specific performance optimization recipes
  • 4+ years of experience in Python, including scripting, automation, and model/inference‑related development

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Job title

Senior Software Engineer, LLM Inferencing, AI Gateway

Job type

Experience level

Senior

Salary

$100,000 - $196,000 per year

Degree requirement

Bachelor's Degree

Location requirements
