Seeking a Senior/Lead Platform Engineer to architect and implement scalable data and ML platforms on AWS and Databricks while leading DevSecOps practices.
Responsibilities
Architect and implement end-to-end data and ML platforms: data lakes, warehouses, streaming and batch pipelines, model training/deployment infrastructure, on AWS + Databricks.
Lead DevSecOps and DataOps practices: infrastructure as code (IaC), CI/CD pipelines for data & ML workflows, secure multi-account/multi-region cloud operations.
Integrate AWS services (e.g., S3, Redshift, Kinesis, Lambda, EKS/ECS) with the Databricks runtime, Delta Lake, Unity Catalog, etc., to build scalable, performant pipelines.
Build and operate ML infrastructure: training clusters, model versioning, MLOps toolchain (e.g., MLflow), model monitoring and observability, automatic retraining workflows.
Establish data governance, lineage, quality, and observability standards across data pipelines and ML workflows.
Mentor engineering teams, define architectural best practices and guide implementation of high-scale data/ML systems.
Optimize system performance, cost and scalability; diagnose and resolve large-scale production issues.
Continuously evaluate new tools and technologies in cloud, data platform, DevSecOps, and ML infrastructure, and apply them to drive innovation.
Requirements
7+ years of experience in data platform architecture, cloud/ML infrastructure engineering or related roles.
Deep technical expertise in **Databricks and AWS**: demonstrated ability to design, integrate and operate solutions spanning both platforms.
Strong hands-on implementation skills: you will not just design but build, deploy and operate the platform.
Proven track record of building and operating scalable ML/AI platforms in production (model training & deployment).
Expertise in Apache Spark, Delta Lake, and modern data pipeline frameworks (batch + streaming).
Strong background in infrastructure as code (Terraform, CloudFormation), CI/CD for data/ML, and DevSecOps practices.
Proficiency in Python and SQL; familiarity with Scala or equivalent is a plus.
Experience with data governance, data lineage, observability and MLOps frameworks (e.g., MLflow, Airflow, dbt).
Bonus: Experience in fintech, regulated industries or high-security environments.
Platform Engineer at 3E working on cloud-based and on-premises infrastructure. Collaborating with teams to support infrastructure projects and ensure security compliance.
Platform Engineer focusing on AWS services and infrastructure modernization for a cloud-based POS provider. Responsibilities include design, deployment, and mentoring in engineering best practices.
Lead Platform Engineer enhancing Humana's advanced healthcare solutions. Overseeing enterprise platform services and driving modernization initiatives across teams and systems.
Senior Platform Engineer contributing to scalable and resilient healthcare technology and AI solutions at Humana. Focused on cloud infrastructure modernization and automation best practices for operational excellence.
Network Automation Platform Support Engineer focused on supporting and maintaining automation and data platforms at Fiserv. Involves collaboration with engineering teams for improved processes and solutions.
Senior AI Platform Engineer designing and implementing AI infrastructure at a leading financial services company. Utilizing big data platforms and mentoring engineers in AI best practices.
Senior AI Product Platform Engineer at Kulu, an AI startup building onboarding agents. Responsible for product platform ownership and release-quality systems.
Intern assisting in modernization initiatives for agentic AI workflows and data platforms. Supporting the development and maintenance of data pipelines and prototyping AI use cases.
Senior Research and Development Engineer for transformer mechanical design at Hitachi Energy. Leading software development for innovative projects and collaborating within a global team.
Platform Engineer leading lifecycle management of MOM and AMHS systems across Kubernetes clusters in the semiconductor industry. Collaborating with internal teams to ensure operational reliability in manufacturing.