Data Engineer responsible for data infrastructure and pipelines to support drug discovery efforts. Collaborating with scientists and engineers to facilitate data-driven insights in an innovative biotech startup.
Responsibilities
Design and implement data pipelines that harmonize, validate, and version scientific data for downstream use in modeling and analysis
Develop tools and schemas for integrating heterogeneous data types (chemical, image-based, genomic, etc.)
Build and maintain scalable data storage systems and APIs to make experimental and model-derived data accessible to scientists and machine learning teams
Collaborate with ML Scientists to prepare and curate datasets for training and evaluating predictive models
Partner with Software Engineers to surface clean, well-structured data to end users through our internal and customer-facing platforms
Establish and enforce best practices for data governance, reproducibility, and lineage tracking
Requirements
4+ years of experience as a Data Engineer, ML Platform Engineer, or similar role
Proficiency building and maintaining data pipelines and ETL processes in Python (e.g. using orchestration tools such as Dagster, Airflow, or Prefect)
Experience with cloud-based storage and compute (AWS S3, ECS, etc., or equivalent)
Outstanding written and oral communication skills
Interest in diving deep into the science of drug discovery and the business of a growing startup
Nice to have: Experience managing and working with scientific data, particularly in chemistry
Benefits
Competitive salary and equity-based compensation
Comprehensive healthcare benefits (including dental and vision)
Opportunity to grow along with a rapidly scaling company
Data Management professional at Kyndryl involved in creating innovative data solutions and ensuring the seamless operation of complex data systems. Collaborating with teams to transform requirements into scalable database solutions.
Software Engineer designing and developing scalable data processing applications on cloud infrastructure for Thomson Reuters. Collaborating with Data Analysts on AI-enabled solutions for data management and insight generation.
Manager of Data Platform overseeing AWS cloud infrastructure and Snowflake data warehouses for Thomson Reuters. Leading the design and implementation of data processing applications in a hybrid role located in Bengaluru.
Senior Data Engineer designing scalable data pipelines and solutions for Enterprise Data Lake at Thomson Reuters. Collaborating across teams to ensure efficient data ingestion and accessibility.
Senior Data Engineer at Technis developing scalable data pipelines and solutions for innovative connected spaces products. Collaborating within a cross-functional team to deliver high-quality, data-driven outcomes.
Data Architect designing and implementing data architectures supporting analytics and ML for federal clients. Collaborating with teams to translate mission needs into robust data solutions.
IT Data Engineer developing data pipelines and integrations for Scanfil Group's global IT organization. Collaborating across teams to enhance data solutions and reporting capabilities.
Data Engineer developing Azure data solutions at PwC New Zealand. Responsibilities include data quality monitoring, pipeline development, and collaboration with stakeholders in a supportive environment.
Senior Data Engineer designing and implementing the Enterprise Data Platform at Stellix. Focusing on analytics and insights with a growth path to Principal Data Engineer or Data Architect.