Data Curation Developer at GSK preparing high-quality data assets for R&D analysis through collaboration and technical expertise. Handle diverse datasets and ensure compliance with privacy and analysis standards in a hybrid work environment.
Responsibilities
Lead the development of business requirements for data curation through collaboration with R&D business and data platform teams
Maintain strong connections with analytical groups and R&D Data Platform teams to ensure seamless data integration and usage
Deliver pre-packaged, curated datasets aligned to business requirements for analytics
Document data specifications that describe the processing steps required to generate analysis-ready datasets
Integrate diverse datasets (e.g., clinical trials, real-world data, omics) into a unified format for consistent analysis
Ensure all datasets meet analysis-ready and privacy requirements by performing necessary data curation activities
Provide coaching and peer review to ensure that the team’s work reflects industry best practices for data curation activities
Ensure that datasets are processed to meet the conditions set out in the approved data re-use request
Write clean, readable code
Ensure that deliverables are appropriately quality controlled and documented, and can be handed over to the R&D Tech team for production pipeline implementation
Requirements
BSc/MSc/PhD (or equivalent) in Computer Science, Mathematics, Statistics, or related subject
Proven experience handling various modalities of scientific clinical data, such as clinical trial data (including biomarkers), real-world data (RWD), and omics
Experience with Python, Databricks, Delta Lake, PySpark, Pandas, and other data engineering frameworks
Proven ability to handle and process large structured, semi-structured, and unstructured datasets efficiently
Strong communication skills and the expertise to translate business needs into technical data requirements and processes
Ability to quantify and provide insight into the business impact and value created by data curation activities
Experience with at least one industry data standard, such as CDISC (ODM: CDASH, SDTM, ADaM), HL7 FHIR, or OMOP (CDM)