Senior Data Engineer building scalable data pipelines at Monstro, an innovative fintech company. Shape the future of data architecture in a data-intensive environment.
Responsibilities
Build and own scalable pipelines that parse and normalize unstructured sources for retrieval, knowledge graphs, and agents.
Design and implement novel methods for processing thousands of types of unstructured documents with accuracy and consistency.
Process semi-structured sources into consistent, validated schemas.
Transform structured datasets for analytics, features, and retrieval workloads.
Create, version, and maintain multiple collections in a vector database.
Manage embeddings, metadata, and lifecycle, and tune chunking and filters for relevance and latency.
Design and implement robust multi-modal document processing systems that handle heterogeneous file formats (PDFs, images, HTML, XML).
Own ingestion from APIs, file drops, partner feeds, and scheduled jobs with monitoring, retries, and alerting.
Implement data quality checks for schema, ranges, and nulls, and document lineage and SLAs.
Stand up and harden object, relational, document, and vector stores with the right indexing and partitioning.
Build reusable libraries and services for parsing, enrichment, and embedding generation.
Handle sensitive financial and personal data with access controls, auditing, and retention policies.
Partner with product and engineering to ship features that depend on reliable data.
Document standards, coach teammates, and contribute to future hiring.
Requirements
Minimum 2 years in a dedicated Data Engineering role at an AI-native startup, or 4+ years of experience in traditional Data Engineering, with roughly 8+ years of experience in tech overall.
Proven ownership of end-to-end pipelines (ingestion → transformation → serving), including scalable sourcing processes, ETL pipelines, and serving services.
Experience owning and operating infrastructure in production environments.
Strong Python and SQL.
Hands-on document parsing and ETL across PDFs, HTML, JSON, and XML.
Experience operating vector databases such as pgvector, Pinecone, or Weaviate, with multiple collections.
Experience building and scheduling ingestion via APIs, web downloads, and cron or an orchestrator, plus cloud storage and queues.
Understanding of embeddings, chunking strategies, metadata design, and retrieval evaluation.
Solid data modeling, schema design, indexing, and performance tuning across storage types.
History of implementing data quality checks, observability, and access controls for sensitive data.
Track record of delivering high-consistency systems for mission-critical data pipelines.
Ownership mindset, clear written communication, and effective collaboration with product and engineering.