Data Engineer designing, developing, and maintaining data products by liaising with stakeholders. Requires strong skills in Python, SQL, and big data technologies.
Responsibilities
Work with stakeholders to understand data requirements, then design, develop, and maintain complex ETL processes.
Create data integration and data diagram documentation.
Lead data validation, UAT, and regression testing for new data assets.
Create and maintain data models, including schema design and optimization.
Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
Requirements
Strong knowledge of Python and PySpark
Expected to be able to write PySpark scripts for developing data workflows.
Strong knowledge of SQL, Hadoop, Hive, Azure, Databricks, and Greenplum
Expected to write SQL to query metadata and tables from different data management systems, such as Oracle, Hive, Databricks, and Greenplum.
Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
Expected to use Hue to run Hive SQL queries and to schedule Apache Oozie jobs that automate data workflows.
Good working experience communicating with stakeholders and collaborating effectively with the business team on data testing.
Expected to have strong problem-solving and troubleshooting skills.
Expected to establish comprehensive data quality test cases and procedures, and to implement automated data validation processes.
Degree in Data Science, Statistics, Computer Science, or another related field, or an equivalent combination of education and experience.
5-7 years of experience as a Data Engineer.
Proficiency in languages and tools commonly used in data engineering, such as Python, PySpark, and SQL.
Experience with the Azure cloud computing platform, such as developing ETL processes using Azure Data Factory and performing big data processing and analytics with Azure Databricks.
Strong communication, problem-solving, and analytical skills, with the ability to manage time and multitask with attention to detail and accuracy.
Data Engineer/Analyst maintaining and improving data infrastructure for Braiins. Collaborating with technical and business teams to ensure reliable data flows and insights.
Medior Data Engineer handling Azure migrations for a major urban mobility client. Focused on data pipeline development and ensuring platform reliability with cutting-edge technologies.
Developing ML and computer vision solutions for cutting-edge autonomous vehicle dataset pipeline at Mobileye. Collaborating across teams for data curation and advanced perception algorithms.
Data Migration Lead in a hybrid role managing data migration for a major transformation programme in the media sector. Collaborating with various teams to ensure data integrity and successful migration.
Consultant ML & DataOps at Smile integrating data science projects for major clients. Designing MLOps solutions and enhancing data governance in a collaborative environment.
Data Engineer developing and maintaining data pipelines for Coolbet’s analytical services. Working within an Agile framework to ensure data reliability and efficiency.
API Data Engineer developing innovative data-driven solutions and advancing data architecture for AI Control Tower. Building and integrating APIs and data pipelines to support organizational needs.
Journeyman Data Architect supporting Leidos' enterprise data and analytics program for the Department of War. Collaborating on solutions for data architecture, cloud environments, and governance.
Senior Software Engineer developing backend services and data infrastructure for integrated products at Booz Allen. Collaborating with a small elite team to deliver reliable and scalable services.
AWS Streaming Data Engineer developing software and systems in a fast, agile environment. Utilizing experience with real-time data ingestion and processing systems across distributed environments.