Data Engineer II building pilot datasets and production-grade data platforms at GeoComply. Collaborating with product teams to deliver data-driven features for geolocation compliance.
Responsibilities
Build Pilot Datasets: Rapidly design and develop experimental data models and datasets to support pilot product features and validate hypotheses.
Bridge Business & Data: Collaborate closely with product managers to translate functional requirements into initial data schemas and logic.
Ad-Hoc to Self-Serve: Build initial logic for ad-hoc data requests, then evolve those one-off solutions into standardized, self-serve tools for the product team.
Productionize Pipelines: Take successful pilot datasets and transform them into robust, production-grade data pipelines. Refactor "pilot code" to follow best practices, ensuring high performance and data quality in the production environment.
Foundation Development: Build and maintain the internal libraries, orchestration services (Airflow), and Databricks jobs required to run these datasets at scale (a minimal DAG sketch follows this list).
Project Management: Manage the lifecycle of data products from "epic-size" concepts through to delivery and maintenance.
Stakeholder Communication: Effectively communicate technical constraints and data insights to stakeholders during the transition from pilot to production.
Product Ideation: Actively contribute ideas on how data can drive new product benefits during weekly team calls.
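To make the pilot-to-production responsibilities concrete, here is a minimal sketch of what a productionized dataset job might look like as an Airflow DAG. Everything in it is illustrative: the DAG id, task names, and callables are hypothetical, and the internal ingestion and dataset libraries the role would actually use are assumed rather than shown.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_pilot_dataset(**context):
    """Pull the raw events backing the pilot dataset (placeholder body)."""
    # In production this would call an internal ingestion library.
    ...


def build_dataset(**context):
    """Apply the dataset logic promoted from the ad-hoc pilot work (placeholder body)."""
    ...


def check_data_quality(**context):
    """Fail the run if row counts or null rates look wrong (placeholder body)."""
    ...


with DAG(
    dag_id="geo_pilot_dataset",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_pilot_dataset)
    transform = PythonOperator(task_id="transform", python_callable=build_dataset)
    quality = PythonOperator(task_id="quality_check", python_callable=check_data_quality)

    extract >> transform >> quality
```

The point of the sketch is the shape rather than the details: pilot logic lives in testable callables, scheduling and retries are declared once, and a quality gate runs before anything downstream consumes the data.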
Requirements
Four (4) years of relevant experience, with a focus on bridging database technology with product requirements.
You possess strong product-thinking skills: the ability to analyze requirements, identify customer pain points, and build data solutions that contribute directly to the product vision.
Strong skills in Databricks, Spark/PySpark, and SparkSQL for manipulating large datasets during both the discovery and production phases (see the PySpark sketch after this list).
Extensive experience with MySQL (for operational data) and NoSQL databases, with an understanding of how to model data for analytics versus application workloads.
Proven ability to own epic-size projects, specifically managing the timeline from "proof of concept" to "delivered feature".
Strong interpersonal skills, adept at explaining complex data concepts to non-technical product stakeholders.
Familiarity with Git, Linux environments, and the Prometheus/Loki stack for monitoring data health.
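As an illustration of the Databricks/Spark skills listed above, here is a brief PySpark sketch. The table names (raw.geolocation_events, analytics.daily_account_activity) and columns are hypothetical; the sketch simply shows the same daily aggregation expressed in both the DataFrame API and SparkSQL.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pilot-dataset-sketch").getOrCreate()

# Hypothetical source table; on Databricks this would typically be a
# catalog table rather than a local file.
events = spark.read.table("raw.geolocation_events")

# Aggregate per account per day using the DataFrame API.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("account_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.countDistinct("device_id").alias("distinct_devices"),
    )
)

# The same logic expressed in SparkSQL.
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT account_id,
           to_date(event_ts)         AS event_date,
           COUNT(*)                  AS event_count,
           COUNT(DISTINCT device_id) AS distinct_devices
    FROM events
    GROUP BY account_id, to_date(event_ts)
""")

# Persist the result for downstream, self-serve consumption (hypothetical target).
daily.write.mode("overwrite").saveAsTable("analytics.daily_account_activity")
```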