Responsibilities
Architect and implement robust, scalable systems to handle data ingestion while maintaining high performance and quality
Build and optimize the academic research paper pipeline: efficiently deduplicate hundreds of millions of research papers and calculate embeddings
Make Elicit the most complete and up-to-date database of scholarly sources
Expand the datasets Elicit works over (court documents, SEC filings, spreadsheets, presentations, audio, video, etc.) and ingest less-structured documents
Define and build secure, reliable, fast, and auditable private data connectors for customers
Preprocess and prepare data to make it useful to models; work with ML engineers and evaluation experts to find, gather, version, and apply datasets for training
Lead data pipeline optimization and enhancement projects and contribute to CI/CD, monitoring, and documentation
Collaborate with cross-functional teams and spend regular in-person time with teammates (approx. 1 week every 6)
Requirements
5+ years of experience as a data engineer: owning make-or-break decisions about how to ingest, manage, and use data
Strong proficiency in Python (5+ years experience)
You have created and owned a data platform at rapidly growing startups: gathering needs from colleagues, planning an architecture, deploying the infrastructure, and implementing the tooling
Experience architecting and optimizing large data pipelines, ideally with Spark
Strong SQL skills, including understanding of aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering approaches
Experience with columnar data storage formats like Parquet
Strong opinions, weakly held, about approaches to data quality management
Creative and user-centric problem-solving
You should be excited to play a key role in shipping new features to users—not just building out a data platform!
Nice to have: experience developing deduplication processes for large datasets
Nice to have: hands-on experience with full-text extraction and processing from various document formats (PDF, HTML, XML, etc.)
Nice to have: familiarity with machine learning concepts and their application in search technologies
Nice to have: experience with distributed computing frameworks beyond Spark (e.g., Dask, Ray)
Nice to have: experience in science or academia, including familiarity with academic publications
Nice to have: hands-on experience with Airflow, DBT, or Hadoop
Nice to have: experience with data lake, data warehouse, or lakehouse paradigms
Benefits
Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
Fully covered health, dental, vision, and life insurance for you, generous coverage for the rest of your family
Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
401(k) with a 6% employer match
A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
$1,000 quarterly AI Experimentation & Learning budget
A team administrative assistant who can help you with personal and work tasks
Above-market equity and employee-friendly equity terms (10-year exercise period)