About the role

  • Update documentation for existing data pipelines to establish clear operational standards and support knowledge transfer
  • Refactor legacy code to improve scalability, performance and maintainability
  • Optimise aggregation layers and transformation logic for reporting workloads
  • Reduce pipeline runtimes through strategic improvements in query optimisation, partitioning and resource allocation
  • Review and enhance data models supporting analytics, reporting and machine learning use cases
  • Design and implement efficient data structures that balance query performance with storage costs
  • Collaborate with analytics and data science teams to ensure data models meet downstream requirements
  • Implement comprehensive data quality checks and validation frameworks within pipelines
  • Develop unit tests for data transformations to ensure correctness and prevent regressions
  • Monitor data integrity throughout the pipeline lifecycle
  • Establish and maintain data quality metrics and alerting mechanisms
  • Troubleshoot pipeline failures and perform root cause analysis
  • Design and deploy preventative measures to minimise future incidents
  • Maintain pipeline uptime targets of 99%
  • Ensure SLA adherence across critical data workflows
  • Maintain data security, governance and compliance standards across all data assets
  • Implement access controls and data lineage tracking
  • Ensure adherence to regulatory requirements and internal data policies

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 2-4 years of data engineering experience
  • Proven track record of building and maintaining production data pipelines at scale
  • Strong problem-solving skills with a focus on systematic root cause analysis
  • Excellent communication skills for technical documentation and cross-functional collaboration
Core Stack

  • Data Warehousing: Experience with cloud data warehouses (Redshift, BigQuery) or lakehouse architectures
  • AWS Cloud Services: Working knowledge of S3, IAM, EC2 and related services
  • Programming: Strong Python skills with a focus on PySpark; SQL and Scala for ETL

DevOps & Engineering Practices

  • Containerisation (Docker, Kubernetes)
  • CI/CD pipelines and version control (Git, GitHub Actions)

Preferred Qualifications

  • Experience optimising aggregation layers for high-volume reporting workloads
  • Familiarity with Delta Lake and lakehouse design patterns
  • Background in distributed computing and Spark performance tuning
  • Infrastructure as code and automated deployment practices

Job title

Data Engineer

Job type

Experience level

Junior / Mid level

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
