Data Engineer building and operating data pipelines for Qloo's platform. Collaborating with teams on the processes that keep Qloo's data accurate and accessible.
Responsibilities
Design, develop, and maintain batch data pipelines using Python, Spark (EMR), and AWS Glue, loading data from S3, RDS, and external sources into Hive/Athena tables.
Model datasets in our S3/Hive data lake to support analytics (Hex), API use cases, Elasticsearch indexes, and ML models.
Implement and operate workflows in Airflow (MWAA), including dependency management, scheduling, retries, and alerting via Slack.
Build robust data quality and validation checks (schema validation, freshness/volume checks, anomaly detection) and ensure issues are surfaced quickly through monitoring and alerts; a small illustrative check appears in the sketch after this list.
Optimize jobs for cost and performance (partitioning, file formats, join strategies, proper use of EMR/Glue resources).
Collaborate closely with data scientists, ML engineers, and application engineers to understand data requirements and design schemas and pipelines that serve multiple use cases.
Contribute to internal tooling and shared libraries that make working with our data platform faster, safer, and more consistent.
Document pipelines, datasets, and best practices so the broader team can easily understand and work with our data.
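To give a concrete flavor of the batch and data-quality work described above, here is a minimal PySpark sketch of a volume/freshness check followed by a partitioned Parquet write. The bucket, paths, column names, and thresholds are hypothetical placeholders, not references to Qloo's actual platform or conventions.

# Minimal sketch of a batch quality check plus partitioned write (hypothetical paths and thresholds).
from datetime import datetime, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_quality_check").getOrCreate()

# Read raw events from the lake (hypothetical S3 location).
events = spark.read.parquet("s3://example-data-lake/raw/events/")

# Volume check: fail the job if the load is suspiciously small.
row_count = events.count()
if row_count < 1_000:
    raise ValueError(f"Volume check failed: only {row_count} rows loaded")

# Freshness check: the newest event should be no older than 24 hours.
max_ts = events.agg(F.max("event_ts").alias("max_ts")).collect()[0]["max_ts"]
if max_ts is None or max_ts < datetime.utcnow() - timedelta(hours=24):
    raise ValueError(f"Freshness check failed: latest event_ts is {max_ts}")

# Write validated data back to the lake, partitioned by date for cheaper Athena scans.
(events
    .withColumn("event_date", F.to_date("event_ts"))
    .write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-data-lake/curated/events/"))

In practice the checks, thresholds, and write targets would come from shared validation tooling rather than being hard-coded per job.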
Requirements
Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
Experience with Python and distributed data processing using Spark (PySpark) on EMR or a similar environment.
Hands-on experience with core AWS data services, ideally including:
• S3 (data lake, partitioning, lifecycle management)
• AWS Glue (jobs, crawlers, catalogs)
• EMR or other managed Spark platforms
• Athena/Hive and SQL for querying large datasets
• Relational databases such as RDS (PostgreSQL/MySQL or similar)
Experience building and operating workflows in Airflow (MWAA experience is a plus); a minimal DAG sketch follows this list.
Strong SQL skills and familiarity with data modeling concepts for analytics and APIs.
Solid understanding of data quality practices (testing, validation frameworks, monitoring/observability).
Comfortable working in a collaborative environment, managing multiple projects, and owning systems end-to-end.
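For a sense of the Airflow work involved, here is a minimal DAG sketch with a daily schedule, retries, and a Slack failure alert posted through an incoming webhook. It assumes a recent Airflow 2.x environment; the DAG id, task, and webhook URL are hypothetical and stand in for whatever conventions the team actually uses.

# Minimal Airflow DAG sketch: daily schedule, retries, Slack alert on failure (hypothetical names).
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/EXAMPLE"  # placeholder webhook


def notify_slack_on_failure(context):
    """Post a short failure message to Slack when any task in the DAG fails."""
    task = context["task_instance"]
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": f"Pipeline failure: {task.dag_id}.{task.task_id} on {context['ds']}"},
        timeout=10,
    )


def load_events():
    """Placeholder for the actual load step (e.g. submitting a Spark job)."""
    print("loading events...")


with DAG(
    dag_id="example_daily_events",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
        "on_failure_callback": notify_slack_on_failure,
    },
) as dag:
    PythonOperator(task_id="load_events", python_callable=load_events)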
Benefits
Competitive salary and benefits package, including health insurance, retirement plan, and paid time off.
The opportunity to shape a modern cloud-based data platform that powers real products and ML experiences.
A collaborative, low-ego work environment where your ideas are valued and your contributions are visible.
Flexible work arrangements (remote and hybrid options) and a healthy respect for work-life balance.