Data Engineer designing, implementing, and optimizing data pipelines for DeepLight AI. Collaborating closely with a multidisciplinary team to analyze large-scale data.
Responsibilities
Design, build, and optimize scalable data solutions, primarily utilizing the Lakehouse architecture to unify data warehousing and data lake capabilities.
Advise stakeholders on the strategic choice between Data Warehouse, Data Lake, and Lakehouse architectures based on specific business needs, cost, and latency requirements.
Design, develop, and maintain scalable and reliable data pipelines to ingest, transform, and load diverse datasets from various sources, including structured and unstructured data, streaming data, and real-time feeds.
Implement standards and tooling to ensure ACID properties, schema evolution, and high data quality within the Lakehouse environment.
Implement robust data governance frameworks (security, privacy, integrity, compliance, auditing).
Continuously optimize data storage, compute resources, and query performance across the data platform to reduce costs and improve latency for both BI and ML workloads.
Develop and maintain CI/CD pipelines to automate the entire machine learning lifecycle, from data validation and model training to deployment and infrastructure provisioning.
Deploy, manage, and scale machine learning models into production environments, utilizing MLOps principles for reliable and repeatable operations.
Establish and manage monitoring systems to track model performance metrics, detect data drift (shifts in input data distributions), and flag model decay (degradation in prediction accuracy).
Ensure rigorous version control and tracking for all components: code, datasets, and trained model artifacts (using tools like MLflow or similar).
Create comprehensive documentation, including technical specifications, data flow diagrams, and operational procedures, to facilitate understanding, collaboration, and knowledge sharing.
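The drift-monitoring responsibility above can be sketched with a minimal Population Stability Index (PSI) check; the bucket proportions and thresholds here are illustrative, not part of the role's actual tooling:

```python
import math

def population_stability_index(expected, actual):
    """Quantify distribution shift between baseline ('expected') and live
    ('actual') bucket proportions; each list should sum to 1."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
baseline = [0.5, 0.5]   # training-time bucket proportions (hypothetical)
live = [0.7, 0.3]       # production bucket proportions (hypothetical)
psi = population_stability_index(baseline, live)
```

In practice this kind of check runs on a schedule against production feature tables, with alerts wired to the thresholds above.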
Requirements
Proven practical experience in designing, building, and optimizing solutions using Data Lakehouse architectures (e.g., Databricks, Delta Lake).
Strong hands-on experience managing data ingestion, schema enforcement, and ACID guarantees, using big data technologies/frameworks such as Spark and Kafka.
Expertise in data modeling, ETL/ELT processes, and data warehousing concepts. Proficiency in SQL and scripting languages (e.g., Python, Scala).
Demonstrated experience implementing MLOps pipelines for production systems, including automation, governance, and monitoring of ML models throughout the entire lifecycle.
Experience with CI/CD tools, containerization/orchestration technologies (e.g., Docker, Kubernetes), model serving frameworks (e.g., TensorFlow Serving, SageMaker), and experiment tracking (e.g., MLflow).
Experience with production monitoring tools to detect data drift or model decay.
Strong hands-on experience with major cloud platforms (e.g., AWS, Azure, GCP) and familiarity with DevOps practices.
Excellent analytical, problem-solving, and communication skills, with the ability to translate complex technical concepts into clear and actionable insights.
Proven ability to work effectively in a fast-paced, collaborative environment, with a passion for innovation and continuous learning.
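The schema-enforcement experience listed above can be illustrated with a toy validator; the field names are hypothetical, and real Lakehouse tables (e.g., Delta Lake) apply this kind of check at write time, rejecting or quarantining non-conforming records:

```python
# Hypothetical event schema for illustration only.
EXPECTED_SCHEMA = {"user_id": int, "event_ts": str, "amount": float}

def enforce_schema(record: dict, schema: dict = EXPECTED_SCHEMA) -> bool:
    """Return True only if every schema field is present with the right type."""
    return all(
        field in record and isinstance(record[field], ftype)
        for field, ftype in schema.items()
    )
```

A record with a missing or mistyped field fails the check, which is the behavior schema enforcement guarantees before data lands in a governed table.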
Benefits
Competitive salary and performance bonuses
Comprehensive health insurance
Professional development and certification support
Opportunity to work on cutting-edge AI projects
Flexible working arrangements
Career advancement opportunities in a rapidly growing AI company