AI/ML Data Engineer designing and building scalable data pipelines for personalized experiences at College Board. Collaborating cross-functionally to drive impactful student engagement solutions.
Responsibilities
Design, build, and own batch and streaming ETL (e.g., Kinesis/Kafka → Spark/Glue → Step Functions/Airflow) for training, evaluation, and inference use cases (a minimal orchestration sketch follows this list)
Stand up and maintain offline/online feature stores and embedding pipelines (e.g., S3/Parquet/Iceberg + vector index) with reproducible backfills
Implement data contracts & validation (e.g., Great Expectations/Deequ), schema evolution, and metadata/lineage capture (e.g., OpenLineage/DataHub/Amundsen); a minimal contract check is sketched after this list
Optimize lakehouse/warehouse layouts and partitioning (e.g., Redshift/Athena/Iceberg) for scalable ML and analytics
Productionize training and evaluation datasets with versioning (e.g., DVC/LakeFS) and experiment tracking (e.g., MLflow); see the tracking sketch after this list
Build RAG foundations: document ingestion, chunking, embeddings, retrieval indexing, and quality evaluation (precision@k, faithfulness, latency, and cost); a self-contained sketch follows this list
Collaborate with data scientists to ship models to serving (e.g., SageMaker/EKS/ECS), automate feature backfills, and capture inference data for continuous improvement
Define SLOs and instrument observability across data and model services (freshness, drift/skew, lineage, cost, and performance)
Embed security & privacy by design (PII minimization/redaction, secrets management, access controls), aligning with College Board standards and FERPA
Build CI/CD for data and models with automated testing, quality gates, and safe rollouts (shadow/canary)
Maintain docs-as-code for pipelines, contracts, and runbooks; create internal guides and deliver tech talks
Mentor peers through design reviews, pair/mob sessions, and post-incident learning.
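
By way of illustration, a minimal skeleton of the kind of batch ETL orchestration described in the first bullet, written against the Airflow 2.x API (the schedule argument assumes Airflow 2.4+). The DAG name, paths, and task bodies are hypothetical placeholders, not an actual College Board pipeline.

# Hypothetical batch ETL DAG: extract -> transform -> load
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # e.g., locate raw events that Kinesis Firehose landed in S3
    return "s3://example-bucket/raw/events/"  # hypothetical path

def transform(**context):
    # e.g., trigger a Spark/Glue job that cleans and featurizes the raw events
    pass

def load(**context):
    # e.g., publish curated Parquet/Iceberg partitions for training jobs
    pass

with DAG(
    dag_id="training_data_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task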
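A minimal sketch of the data-contract check referenced above, using the classic (pre-1.0) great_expectations pandas API; the file, columns, and score range are illustrative assumptions.

# Hypothetical contract check; fails the pipeline run on violation.
import pandas as pd
import great_expectations as ge

df = ge.from_pandas(pd.read_parquet("curated_scores.parquet"))  # illustrative file

df.expect_column_values_to_not_be_null("student_id")
df.expect_column_values_to_be_between("scaled_score", min_value=200, max_value=800)
df.expect_column_values_to_be_unique("event_id")

result = df.validate()
if not result.success:
    raise ValueError("data contract violated; aborting downstream tasks")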
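A small sketch of pairing dataset versioning with experiment tracking in MLflow; the experiment name, dataset tag, and metric values are invented for illustration.

# Hypothetical run: record which dataset/feature versions produced which metrics.
import mlflow

mlflow.set_experiment("engagement-model-training")  # illustrative experiment

with mlflow.start_run():
    mlflow.log_param("dataset_version", "v2024.06.01")   # e.g., a DVC/LakeFS tag
    mlflow.log_param("feature_set", "engagement_v3")     # hypothetical feature set
    # ... train and evaluate the model here ...
    mlflow.log_metric("val_auc", 0.87)                   # placeholder metric
    mlflow.log_artifact("training_manifest.json")        # manifest of exact inputs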
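A self-contained sketch of the RAG plumbing named above: fixed-size chunking, embeddings, cosine-similarity retrieval, and precision@k. The embed() function is a stub standing in for a real embedding model (e.g., a Bedrock or SentenceTransformers call); the chunk size, dimensions, and relevance labels are illustrative.

# Chunk -> embed -> retrieve -> evaluate, with a stubbed embedding model.
import numpy as np

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    # Stub: deterministic pseudo-embeddings so the sketch runs end to end.
    seeds = [np.random.default_rng(abs(hash(t)) % 2**32) for t in texts]
    vecs = np.stack([rng.standard_normal(384) for rng in seeds])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit vectors

def retrieve(query: str, index: np.ndarray, k: int = 5) -> list[int]:
    scores = index @ embed([query])[0]          # cosine similarity on unit vectors
    return list(np.argsort(scores)[::-1][:k])   # ids of the top-k chunks

def precision_at_k(retrieved: list[int], relevant: set[int]) -> float:
    return sum(i in relevant for i in retrieved) / len(retrieved)

docs = chunk("Long source document text goes here... " * 100)
index = embed(docs)
hits = retrieve("how do score reports work?", index, k=5)
print(precision_at_k(hits, relevant={0, 3, 7}))  # relevance labels are hypothetical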
Requirements
4+ years in data engineering (or 3+ with substantial ML productionization)
Strong Python and distributed compute (Spark/Glue/Dask) skills
Proven experience shipping ML data systems (training/eval datasets, feature or embedding pipelines, artifact/version management, experiment tracking)
MLOps / LLMOps: orchestration (Airflow/Step Functions), containerization (Docker), and deployment (SageMaker/EKS/ECS); CI/CD for data & models
Expert SQL and data modeling for lakehouse/warehouse (Redshift/Athena/Iceberg), with performance tuning for large datasets (a partitioning sketch follows this list)
Data quality & contracts (Great Expectations/Deequ), lineage/metadata (OpenLineage/DataHub/Amundsen), and drift/skew monitoring (a PSI sketch follows this list)
Cloud experience, preferably with AWS services such as S3, Glue, Lambda, Athena, Bedrock, OpenSearch, API Gateway, DynamoDB, SageMaker, Step Functions, Redshift, and Kinesis
Experience with BI tools such as Tableau, QuickSight, or Looker for real-time analytics and dashboards
Security and privacy mindset; ability to design compliant pipelines handling sensitive student data
Ability to judiciously evaluate the feasibility, fairness, and effectiveness of AI solutions, and to articulate the considerations and concerns around deploying models in specific business applications
Excellent communication, collaboration, and documentation habits.
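
To make the lakehouse-layout requirement concrete, a small pyarrow sketch of a partitioned Parquet layout; the table, columns, and partition keys are hypothetical. Partitioning on low-cardinality query predicates lets engines like Athena or Spark prune files rather than scan the whole table.

# Hypothetical partitioned write: produces school_year=.../region=.../part-*.parquet
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "school_year": [2023, 2023, 2024, 2024],
    "region": ["NE", "SW", "NE", "SW"],
    "score": [1210, 1390, 1180, 1450],
})

pq.write_to_dataset(
    pa.Table.from_pandas(df),
    root_path="warehouse/scores",              # local stand-in for an S3 prefix
    partition_cols=["school_year", "region"],  # prune on these predicates
)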
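And for the drift/skew monitoring requirement, a minimal population stability index (PSI) check between a training and a serving distribution; the ten quantile bins and the 0.2 alert threshold are common rules of thumb, not a mandated standard.

# PSI: compares a serving (actual) feature distribution to its training
# (expected) baseline; larger values mean more drift.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range serving values
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)          # avoid log(0) in empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Illustrative usage with synthetic score distributions:
train = np.random.default_rng(0).normal(500, 100, 10_000)
live = np.random.default_rng(1).normal(540, 110, 10_000)
if psi(train, live) > 0.2:
    print("drift alert: investigate feature skew before retraining")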
Data Architect designing end-to-end Snowflake data solutions and collaborating with technical stakeholders at Emerson. Supporting the realization of the Data and Digitalization Strategy.
Manager of Data Engineering leading data assets and infrastructure initiatives at CLA. Collaborating with teams to enforce data quality standards and drive integration efforts.
Data Engineer building modern Data Lake architecture on AWS and implementing scalable ETL/ELT pipelines. Collaborating across teams for analytics and reporting on gaming platforms.
Chief Data Engineer leading Scania’s Commercial Data Engineering team for growing sustainable transport solutions. Focused on data products and pipelines for BI, analytics, and AI.
Entry-Level Data Engineer at GM, focusing on building large-scale data platforms in cloud environments. Collaborating with data engineers and scientists while migrating systems to cloud solutions.
Data Engineer designing and building scalable ETL/ELT pipelines for enterprise-grade analytics solutions. Collaborating with product teams to deliver high-quality, secure, and discoverable data.
Data Engineer responsible for data integrations with AWS technology stack for Adobe's Digital Experience. Collaborating with multiple teams to conceptualize solutions and improve data ecosystem.
People Data Architect designing and managing people data analytics for Gen, delivering actionable insights for HR. Collaborating across teams to enhance data-driven decision-making.
Data Engineer role focused on shaping future connectivity for customers at Vodafone. Involves solving complex challenges in a diverse and inclusive environment.