Senior Data Engineer building scalable data pipelines at Monstro, an innovative fintech company. Shape the future of data architecture in a data-intensive environment.
Responsibilities
Build and own scalable pipelines that parse and normalize unstructured sources for retrieval, knowledge graphs, and agents.
Design and implement novel methods for processing thousands of types of unstructured documents accurately and consistently.
Process semi-structured sources into consistent, validated schemas.
Transform structured datasets for analytics, features, and retrieval workloads.
Create, version, and maintain multiple collections in a vector database.
Manage embeddings, metadata, and lifecycle, and tune chunking and filters for relevance and latency.
Design and implement robust multi-modal document processing systems that handle heterogeneous file formats (PDFs, images, HTML, XML).
Own ingestion from APIs, file drops, partner feeds, and scheduled jobs with monitoring, retries, and alerting.
Implement data quality checks for schema, ranges, and nulls, and document lineage and SLAs.
Stand up and harden object, relational, document, and vector stores with the right indexing and partitioning.
Build reusable libraries and services for parsing, enrichment, and embedding generation.
Handle sensitive financial and personal data with access controls, auditing, and retention policies.
Partner with product and engineering to ship features that depend on reliable data.
Document standards, coach teammates, and contribute to future hiring.
Requirements
Minimum 2 years in a dedicated Data Engineering role at an AI-native startup, or 4+ years in traditional Data Engineering, with roughly 8 or more years of experience in tech overall.
Proven ownership of end-to-end pipelines (ingestion → transformation → serving), including scalable sourcing processes, ETL pipelines, and data-serving services.
Experience owning and operating infrastructure in production environments.
Strong Python and SQL.
Hands-on document parsing and ETL across PDFs, HTML, JSON, and XML.
Experience operating vector databases such as pgvector, Pinecone, or Weaviate, with multiple collections.
Experience building and scheduling ingestion via APIs, web downloads, and cron or an orchestrator, plus cloud storage and queues.
Understanding of embeddings, chunking strategies, metadata design, and retrieval evaluation.
Solid data modeling, schema design, indexing, and performance tuning across storage types.
History of implementing data quality checks, observability, and access controls for sensitive data.
Track record of delivering high-consistency systems for mission-critical data pipelines.
Ownership mindset, clear written communication, and effective collaboration with product and engineering.