Lead the architecture, development, and deployment of scalable machine learning systems, focusing on real-time inference for LLMs serving multiple concurrent users.
Optimize inference pipelines using high-performance frameworks and platforms such as vLLM, Groq, ONNX Runtime, Triton Inference Server, and TensorRT to minimize latency and cost.
Design and implement agentic AI systems using frameworks and patterns such as LangChain, AutoGPT, and ReAct for autonomous task orchestration.
Fine-tune, integrate, and deploy foundation models such as GPT, LLaMA, Claude, Mistral, and Falcon into intelligent applications.
Develop and maintain robust MLOps workflows to manage the full model lifecycle, including training, deployment, monitoring, and versioning.
Collaborate with DevOps teams to implement scalable serving infrastructure leveraging containerization (Docker), orchestration (Kubernetes), and cloud platforms (AWS, GCP, Azure).
Implement retrieval-augmented generation (RAG) pipelines integrating vector databases like FAISS, Pinecone, or Weaviate.
Build observability systems for LLMs to track prompt performance, latency, and user feedback.
Work cross-functionally with research, product, and operations teams to deliver production-grade AI systems handling real-world traffic patterns.
Stay current on emerging AI trends and hardware acceleration techniques, and contribute to open-source or research initiatives where possible.
Requirements
Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
6–7 years of experience in machine learning engineering, applied AI, or MLOps roles.
Strong proficiency in Python and ML frameworks such as PyTorch, TensorFlow, and Hugging Face Transformers.
Deep knowledge of NLP, transformer-based architectures, and generative AI models.
Hands-on experience with scalable LLM inference optimization using tools like vLLM, Groq, Triton Inference Server, TensorRT, or ONNX Runtime.
Proven ability to serve AI models to concurrent users with low latency and high throughput.
Experience deploying ML systems on cloud platforms (AWS, GCP, Azure).
Expertise in containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
Familiarity with vector search technologies (FAISS, Pinecone, Weaviate) and RAG implementations.