Data Engineer working on an open-source Data Lakehouse and data pipeline development. The role involves data integration and ensuring data quality in a hybrid work environment.
Responsibilities
Development and maintenance of a fully open-source Data Lakehouse.
Design and development of data pipelines for scalable, reliable data workflows that transform large volumes of both structured and unstructured data.
Data integration from various sources, including databases, APIs, data streaming services and cloud data platforms.
Optimisation of queries and workflows for increased performance and enhanced efficiency.
Writing modular, testable and production-grade code.
Ensuring data quality through monitoring, validation and data quality checks, maintaining accuracy and consistency across the data platform.
Development of test programs.
Comprehensive documentation of processes to ensure seamless data pipeline management and troubleshooting.
Assistance with deployment and configuration of the system.
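As a rough illustration of the data-quality responsibility above (monitoring, validation and quality checks), here is a minimal sketch in plain Python; the row schema, rule names and thresholds are invented for the example, not taken from the posting.

```python
# Minimal data-quality validation sketch (hypothetical schema and rules).
# Each check returns the failing rows, so failures can be logged or
# routed to a quarantine table instead of silently entering the platform.

def check_not_null(rows, column):
    """Rows where a required column is missing or None."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows, column):
    """Rows whose key column value has already been seen."""
    seen, dupes = set(), []
    for r in rows:
        if r[column] in seen:
            dupes.append(r)
        seen.add(r[column])
    return dupes

def check_range(rows, column, lo, hi):
    """Rows whose numeric value falls outside [lo, hi]."""
    return [
        r for r in rows
        if r.get(column) is not None and not (lo <= r[column] <= hi)
    ]

def run_checks(rows):
    """Run all checks; return a mapping of check name -> failing rows."""
    return {
        "id_not_null": check_not_null(rows, "id"),
        "id_unique": check_unique(rows, "id"),
        "amount_in_range": check_range(rows, "amount", 0, 1_000_000),
    }

rows = [
    {"id": 1, "amount": 100},
    {"id": 1, "amount": -5},    # duplicate id, out-of-range amount
    {"id": None, "amount": 50}, # missing key
]
report = run_checks(rows)
```

In production these rule functions would typically be replaced by a declarative framework (e.g. dbt tests or Great Expectations suites), but the shape of the checks is the same.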
Requirements
Master's degree in IT and minimum 9 years of relevant professional experience (or Bachelor's degree in IT and minimum 13 years of experience).
Extensive hands-on experience as a Data Engineer or Data Architect with modern cloud-based open-source data platform solutions and data analytics tools.
Excellent knowledge of data warehouse and/or data lakehouse design & architecture.
Experience with open-source, code-based data transformation tools such as dbt, Spark and Trino.
Previous experience with open-source orchestration tools such as Airflow, Dagster or Luigi.
Experience with SQL and Python.
Experience with AI-powered assistants like Amazon Q that can streamline data engineering processes.
Good knowledge of relational database systems.
Good knowledge of event streaming platforms and message brokers like Kafka and RabbitMQ.
Extensive experience in creating end-to-end data pipelines following the ELT approach.
Understanding of the principles behind open table formats like Apache Iceberg or Delta Lake.
Proficiency with Kubernetes and Docker/Podman.
Good knowledge of data modelling tools.
Good knowledge of online analytical data processing (OLAP) and data mining tools.
Advanced English (C1) communication skills (written and spoken).
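To illustrate the ELT approach named in the requirements (load raw data first, transform inside the engine), here is a minimal sketch using SQLite as a stand-in for a lakehouse query engine such as Trino; the table and column names are hypothetical.

```python
import sqlite3

# ELT sketch: Extract raw records, Load them untransformed, then
# Transform with SQL inside the engine. SQLite stands in for a
# lakehouse query engine; the schema is invented for the example.

# Extract: raw events arrive as untyped strings with inconsistent casing.
raw_events = [
    ("2024-01-01", "eur", "12.50"),
    ("2024-01-01", "EUR", "7.50"),
    ("2024-01-02", "usd", "3.00"),
]

# Load: land the raw data as-is, without cleaning.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (day TEXT, currency TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", raw_events)

# Transform: clean types and casing, then aggregate into a model table.
conn.execute("""
    CREATE TABLE daily_revenue AS
    SELECT day,
           UPPER(currency) AS currency,
           SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_events
    GROUP BY day, UPPER(currency)
""")

result = conn.execute(
    "SELECT day, currency, revenue FROM daily_revenue ORDER BY day"
).fetchall()
```

In a real lakehouse the transform step would usually live in a tool like dbt or Spark rather than inline SQL, but the ordering of the steps (load raw, then transform in place) is the defining trait of ELT as opposed to ETL.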
Data Engineer creating data pipelines in Databricks for a fast-growing digital banking platform. Responsible for ensuring data quality and optimising processes to support decision-making.
Data Engineer building scalable data pipelines and collaborating with teams at Ekimetrics. Involved in data quality, governance, and maintaining data integrity.
Senior Data Engineer developing data solutions and scalable systems at SimplePractice. Collaborating with teams to enhance analytics and decision-making for health and wellness clinicians.
Senior Data Engineer responsible for designing and implementing cloud-native data platforms for LPL Financial. Collaborating with stakeholders to enhance party reference data services and solutions.
Data Engineer in charge of designing and building data integration pipelines with Informatica and AWS technologies. Work collaboratively to deliver high-quality solutions in an agile environment.
Senior Software Engineer specializing in data engineering and infrastructure for cloud-native solutions at Cloudera. Leading technical direction and mentoring engineers in a high-impact role.
Senior Data Engineer at Sonatype responsible for building data pipelines and BI solutions. Collaborating with teams to design infrastructures empowering analytics and decision-making.
Lead Data Engineer responsible for driving data initiatives at Lennar, one of the nation's leading homebuilders. Manage projects, ensure scalability, and collaborate with stakeholders to meet organizational goals.
Financial Data Engineer Intern assisting with model integration and process automation at Transamerica. Focused on data engineering tasks with collaboration across IT and Finance teams.