About the role

  • As a Senior Data Engineer at Alight, you will leverage big data and cloud expertise to build data pipelines, ensuring reliability, governance, and operational excellence across data platforms.

Responsibilities

  • Design, build, and maintain high‑volume ETL/ELT pipelines across Hadoop (HDFS, Hive, Spark, Kafka) and AWS (Glue, EMR, Lambda, Step Functions, Redshift)
  • Develop distributed data processing solutions using PySpark, Spark SQL, and scalable serverless cloud patterns
  • Implement reusable data ingestion frameworks for batch (Sqoop, Hive, Spark) and streaming (Kafka, Kinesis)
  • Optimize data workflows using partitioning, bucketing, compression, and efficient file formats (Parquet/ORC)
  • Design hybrid data lake architectures using S3 and HDFS, ensuring consistent governance (Atlas, Ranger, Lake Formation)
  • Understand reporting requirements, perform data profiling, and create designs accordingly
  • Create data flow diagrams and perform data modelling
  • Orchestrate jobs using Airflow, Control‑M, Step Functions, or event-driven triggers
  • Understand auto-scaling, capacity planning, and performance tuning on EMR and Spark clusters
  • Ensure data is protected and compliant with regulatory standards
  • Work closely with business stakeholders to enable high‑quality datasets
  • Provide technical leadership in architecture decisions, code reviews, and best‑practice adoption, and mentor peers and junior engineers on the team
  • Improve reliability, scalability, and performance through automation, autoscaling, and capacity planning
  • Own deployment, incident response, and post-incident reviews for production environments, troubleshooting Spark performance issues, job failures, and cluster bottlenecks
  • Apply security best practices (IAM, KMS, security groups, WAF, parameter/secret management)
  • Optimize cost and usage of AWS resources and recommend architecture improvements
  • Collaborate closely with developers, QA, and product teams to streamline release processes

Requirements

  • 5–8 years of strong experience with the Hadoop ecosystem (HDFS, Hive, Spark, YARN, Kafka)
  • Strong hands-on expertise in Scala, PySpark, Spark optimization techniques, HiveQL, and distributed computing
  • Solid working experience with SQL in Hive and Impala
  • Good understanding of AWS data stack (S3, Glue, EMR, Lambda, Kinesis, Redshift, Step Functions)
  • Proficiency in at least one scripting/programming language (e.g., Python, shell scripting)
  • Strong experience with CI/CD, GitHub, and Git
  • Expertise in ETL, data warehousing, and cloud concepts
  • Good understanding of data modelling (star/snowflake), partitioning strategies, and schema evolution
  • Expertise in data profiling and decision making
  • Ability to design and create data flow diagrams and perform data modelling
  • Hands-on experience with Airflow, Control‑M, or other orchestrators
  • Well versed in cloud security and compliance
  • Good understanding of AWS networking (VPC, subnets, routing, SGs, NACLs)
  • Familiarity with serverless patterns and containerization (Docker, ECS/EKS)
  • Experience with monitoring/logging tools and incident management practices

Benefits

  • A variety of health coverage options
  • Wellbeing and support programs
  • Retirement
  • Vacation and sick leave
  • Maternity, paternity & adoption leave
  • Continuing education and training
  • Several voluntary benefit options

Job title

Senior Data Engineer

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
