Data management

The ORCHESTRA project has developed a comprehensive approach to data management with two goals: facilitating the sharing of cohort data across international borders and maintaining rigorous standards for personal data protection.

What are the methods and tools developed by ORCHESTRA?

ORCHESTRA's data management strategy incorporates:

Data Management Plan

ORCHESTRA stores and regularly updates detailed descriptions of all the cohorts involved in the project. These descriptions are integrated into the ORCHESTRA Data Portal, a web interface for accessing the ORCHESTRA cohorts' data and adding information on new cohorts.

Harmonisation and standardisation

ORCHESTRA uses international standard terminologies to uniformly describe over 3700 variables from different types of studies, including general population, fragile population, and healthcare workers.

The purpose is to ensure that data from different studies, collected through Case Report Forms (CRFs), can be compared and analysed together.
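As an illustration of the idea, harmonisation can be sketched as a mapping from each cohort's local variable names to shared concept identifiers. The cohort names, variable names and `CONCEPT:` identifiers below are hypothetical placeholders, not ORCHESTRA's actual terminology or codes.

```python
# Hypothetical mapping from each cohort's local variable names to shared
# concept identifiers. None of these names come from ORCHESTRA itself.
VARIABLE_MAP = {
    "cohort_a": {"sex": "CONCEPT:sex", "age_yrs": "CONCEPT:age"},
    "cohort_b": {"gender": "CONCEPT:sex", "age": "CONCEPT:age"},
}


def harmonise(cohort: str, record: dict) -> dict:
    """Rename the mapped variables of one record; drop unmapped ones."""
    mapping = VARIABLE_MAP[cohort]
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Two records collected with different forms become directly comparable.
record_a = harmonise("cohort_a", {"sex": "F", "age_yrs": 42})
record_b = harmonise("cohort_b", {"gender": "F", "age": 42, "local_note": "n/a"})
```

Once both records use the same keys, they can be pooled or compared without cohort-specific handling.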

ORCHESTRA shares the methods to describe clinical concepts on a publicly accessible platform, encouraging the widespread adoption of a common language in COVID-19 research.

Data Protection

GDPR compliance

The data management processes adhere to GDPR standards, ensuring that personal data is handled in accordance with strict privacy regulations.

Pseudonymisation

The ORCHESTRA Pseudonymisation Tool (OPT) is central to the pseudonymisation of patient data. It generates unique codes that identify patients and samples across the whole project, creates labels for sample shipment and management, and handles additional patient and sample information. The tool has been implemented across multiple sites, registering thousands of patients and samples.
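The source does not describe how OPT derives its codes. The following is only a minimal sketch of one common pseudonymisation technique, a keyed hash (HMAC), assuming a site-held secret. The names `SITE_SECRET`, `pseudonym` and `sample_label`, and the code and label formats, are hypothetical and do not reflect OPT's real interface.

```python
import hashlib
import hmac

# Hypothetical secret that would be held only by the pseudonymising site.
SITE_SECRET = b"replace-with-a-site-specific-secret"


def pseudonym(local_id: str, prefix: str, secret: bytes = SITE_SECRET) -> str:
    """Derive a stable code from a local identifier using a keyed hash."""
    digest = hmac.new(secret, local_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep 12 hex characters for readability (an illustrative choice).
    return f"{prefix}-{digest[:12].upper()}"


def sample_label(patient_code: str, sample_number: int) -> str:
    """Build a shipment label that ties a sample to its patient code."""
    return f"{patient_code}-S{sample_number:03d}"


patient_code = pseudonym("hospital-record-0001", prefix="PAT")
label = sample_label(patient_code, 1)
```

The same local identifier always yields the same code, so records can be linked across the project, while the original identifier cannot be recovered without the secret.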

Workflow

ORCHESTRA defined and designed a data workflow for each study. Each workflow covers an analysis of the cohorts (data providers), data pre-processing, National Hub requirements, and the applicable type of analysis (centralised or federated).

An example of a workflow applicable to cohorts that share their data.

ORCHESTRA platform design and implementation

The National Hub (NH) is the core component of the infrastructure.

The NH is intended to centralise both cohort and biosample data at a national level and to support the storage, sharing and analysis of pseudonymised data, as well as of retrospective and prospective data ingested from the National Data Providers.

These hubs are critical for safely storing large amounts of data and have strong security and privacy safeguards in place.

For centralised data collection, the Italian NH deployed a dedicated instance of the Research Electronic Data Capture (REDCap) system in its infrastructure.

Federated analysis is used in cases where data cannot be moved to the National Hubs because of their large size (e.g. genomics) or because of legal regulations. In this scenario, analyses are performed directly where the data are stored, and raw data are never exchanged.
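The federated principle can be illustrated with a pooled mean: each site computes a local summary, and only those summaries leave the site. This is a simplified sketch under that assumption. The `SiteSummary` type and function names are hypothetical, and ORCHESTRA's actual federated tooling is not described here.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate shared by a site in place of its raw data."""
    count: int
    total: float


def local_summary(values: list[float]) -> SiteSummary:
    """Runs at the site where the data are stored."""
    return SiteSummary(count=len(values), total=sum(values))


def pooled_mean(summaries: list[SiteSummary]) -> float:
    """Runs centrally and sees only counts and totals."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


site_a = local_summary([50.0, 60.0, 70.0])
site_b = local_summary([40.0])
result = pooled_mean([site_a, site_b])  # 220.0 / 4 = 55.0
```

A real deployment would also need safeguards such as minimum group sizes, since small aggregates can still reveal information about individuals.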

Data Portal

The Data Portal is a web-based, top-level component of the ORCHESTRA architecture that acts as a centralised point of access for metadata and aggregated results.

It adheres to the FAIR data principles and serves as a central point for updating cohort information and publications. It also redirects to other relevant services, such as requests for data access and federated analysis, supporting future collaborations.

ORCHESTRA has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016167.

Images by flatart at freepik.com