Founding Staff Data Engineer building and leading the data engineering team for an AI-driven art valuation platform. Establishing the architecture and standards for data systems and pipelines.
Responsibilities
Build the data team from scratch: define the hiring roadmap, recruit and onboard your first 2–3 data engineers, and establish the team’s culture, standards, and ways of working
Own the entire data platform architecture from day one, making the critical decisions on storage layers, processing frameworks, orchestration, and data modeling patterns
Define technical standards and best practices for data quality, testing, documentation, lineage, and governance
Lead system design for complex problems involving large-scale ingestion, entity resolution, LLM-powered data extraction, and real-time analytics
Evaluate and adopt new technologies that improve data velocity, quality, reliability, or capabilities
Establish data governance frameworks including versioning, reproducibility, validation, and compliance
Design and operate scalable data ingestion and web scraping systems, including best practices around retries, proxies, rate limiting, and anti-bot strategies
Build batch and real-time pipelines to normalize, enrich, deduplicate, and version data across structured and unstructured sources
Architect systems to support LLM- and ML-based document parsing, OCR, entity extraction, and classification at scale
Own the data storage and processing stack, including PostgreSQL, data lakes, data warehouses, and vector databases
Operationalize AI/ML workflows by preparing clean training and inference datasets with robust lineage, validation, and error handling
Design and maintain data models that serve backend APIs, valuation services, analytics dashboards, and public indices
Contribute to infrastructure tooling, including CI/CD, IaC (Terraform), data observability, and cost management
Co-own the data platform vision with the Head of Engineering: collaborate daily on architecture, technical roadmap, and engineering standards
Partner with backend engineers to define API contracts, data serving patterns, and integration points between pipelines and application services
Collaborate with product and domain experts to translate business requirements into reliable, well-modeled datasets
Work with company leadership (Head of Engineering, CPO, President) on data strategy, hiring, and long-term platform vision
Communicate technical decisions clearly to non-technical stakeholders
Requirements
B.S. in Computer Science or equivalent
7+ years of data engineering experience, including at least 2 years in a technical lead, staff, or principal role at a high-growth startup or product company
Expert in Python and SQL, with deep understanding of performance, data modeling, and processing patterns
Strong database expertise (PostgreSQL or similar) including query optimization, schema design, indexing, and partitioning strategies
Deep experience with pipeline orchestration tools like Airflow, Dagster, Prefect, or Temporal
Hands-on experience designing and maintaining web scraping systems at scale, including retries, proxies, and anti-bot strategies
Production experience integrating structured and unstructured sources, with a track record of resolving messy, real-world data challenges
Hands-on experience with LLM/AI integration in data workflows: you’ve built pipelines using OpenAI, Anthropic, or open-source models for document understanding, NLP, entity extraction, or classification
Deep knowledge of data architecture patterns including ETL vs. ELT, data lakes vs. warehouses, batch vs. streaming, and schema evolution
Production experience with AWS (or GCP/Azure) including compute, storage, networking, and managed data services
Strong DevOps fundamentals: Docker, Terraform, CI/CD, and data observability/monitoring
Benefits
A Welcoming Team
Generous paid time off, including vacation, sick days, and holidays