As genomic research rapidly advances, so does the volume of data being produced. From understanding genetic diseases to developing precision medicine treatments, genomic data offers unprecedented potential for improving healthcare. However, to fully harness its power, we must overcome significant barriers to data sharing. In a recent publication in Nature Reviews Genetics, Zornitza Stark and colleagues call for a concerted effort to scale up genomic data sharing. They present a roadmap to enhance the accessibility, usability, and security of genomic data worldwide, so that its benefits are shared equitably across populations.
The Importance of Genomic Data Sharing
Genomic data sharing is crucial for accelerating scientific discovery and enhancing clinical outcomes. When researchers and clinicians have access to large-scale genomic datasets, they can identify genetic mutations linked to diseases more accurately, develop targeted therapies, and diagnose conditions more quickly. Moreover, sharing genomic data across borders helps scientists understand the genetic diversity of different populations, enabling more inclusive research and reducing health disparities.
The challenge, however, lies in making these vast and varied datasets accessible while ensuring privacy, security, and proper governance. The current state of genomic data sharing is fragmented, with varying degrees of accessibility depending on the country, institution, or specific dataset. Without a coordinated global effort, we risk underutilizing the wealth of data generated by ongoing research.
Current Models of Genomic Data Sharing
The paper highlights three main models of genomic data sharing:
Aggregated Databases: These centralized platforms combine data from multiple sources into a single, harmonized database. One prominent example is the Genome Aggregation Database (gnomAD), which aggregates exome and genome sequencing data from over 800,000 individuals worldwide. Aggregated databases like gnomAD enable large-scale variant interpretation and improve the clinical assessment of genetic variants.
Federated Data Systems: In a federated system, individual institutions or countries retain control of their datasets but allow researchers to access and analyze the data remotely. This method provides a balance between data accessibility and privacy. Tools like the Matchmaker Exchange enable federated searches, allowing researchers to identify disease-causing genes by querying across multiple international platforms without directly sharing the data.
Cloud-Based Environments: Trusted research environments (TREs) and secure data environments (SDEs) are cloud-based platforms where genomic data is stored and analyzed. These environments democratize access to large datasets by lowering technical barriers and ensuring compliance with data protection regulations. The UK Biobank and Genomics England are examples of initiatives using cloud-based environments to facilitate genomic research.
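To make the federated model more concrete, here is a minimal Python sketch of the general idea: the same query is sent to every site, each site evaluates it against its own records, and only an aggregate answer leaves the site. It is an illustration under stated assumptions, not the Matchmaker Exchange API or any real platform's protocol. The `Site` class, the `federated_count` function, the gene labels, and the small-count threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One data holder in a hypothetical federation; its records stay local."""
    name: str
    # Each record is a (participant_id, gene_with_candidate_variant) pair.
    records: list = field(default_factory=list)
    min_count: int = 2  # assumed disclosure threshold, chosen for illustration

    def count_carriers(self, gene: str) -> int:
        """Answer the query locally and return only an aggregate count."""
        n = len({pid for pid, g in self.records if g == gene})
        # Suppress small counts so a single participant cannot be singled out.
        return n if n >= self.min_count else 0


def federated_count(sites, gene):
    """Send the same query to every site and collect the per-site answers."""
    return {site.name: site.count_carriers(gene) for site in sites}


# Made-up records; "GENE_A" and "GENE_B" are placeholder labels.
sites = [
    Site("site_a", [("a1", "GENE_A"), ("a2", "GENE_A"), ("a3", "GENE_B")]),
    Site("site_b", [("b1", "GENE_A"), ("b2", "GENE_B"), ("b3", "GENE_B")]),
]
result = federated_count(sites, "GENE_A")
print(result)                # {'site_a': 2, 'site_b': 0}
print(sum(result.values()))  # 2
```

The caller never sees participant-level rows, only counts, and `site_b` reports 0 because its single carrier falls below the assumed threshold. Real federated systems add authentication, access approval, and audit logging on top of this basic pattern.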
Challenges in Scaling Up Genomic Data Sharing
Despite the success of these models, several obstacles remain that prevent the routine cross-analysis of large genomic datasets. The authors identify four key challenges that need to be addressed to scale up data sharing:
1. Consent, Engagement, and Trust
Sharing genomic data involves sensitive personal information, so it is critical to respect the consent and expectations of participants. Different countries and institutions have varying approaches to consent, often limiting data use to specific purposes or geographic regions. For data sharing to expand, participant communities must be involved in decisions about how their data is used and by whom. Building trust with communities, particularly historically marginalized groups, is essential for expanding the scope of genomic research.
2. Technical Barriers and Standardization
Genomic data exists in a variety of formats, making it difficult to combine or analyze datasets from different sources. Standards developed by organizations like the Global Alliance for Genomics and Health (GA4GH) are helping to harmonize data formats and analytical methods, but inconsistencies remain. Similarly, common data models such as the Observational Medical Outcomes Partnership (OMOP) model give health data a shared structure for analysis, but they still face interoperability challenges because contributing sites describe the same concepts with different vocabularies.
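The vocabulary problem can be illustrated with a short Python sketch. It assumes two hypothetical sites that record the same diagnosis under different local terms, and a hand-written mapping to a shared concept label. The site names, terms, and concept labels are invented for illustration and are not real OMOP concept identifiers.

```python
# Hypothetical local-term-to-shared-concept mappings, one per site.
LOCAL_TO_SHARED = {
    "hospital_a": {"T2DM": "type_2_diabetes", "HTN": "hypertension"},
    "hospital_b": {"diabetes mellitus type II": "type_2_diabetes"},
}


def harmonize(source, records):
    """Translate (participant_id, local_term) pairs into shared concepts.

    Returns two lists: records that could be mapped, and records whose
    local term has no entry in the mapping for this source.
    """
    mapping = LOCAL_TO_SHARED[source]
    mapped, unmapped = [], []
    for participant_id, term in records:
        concept = mapping.get(term)
        if concept is None:
            unmapped.append((participant_id, term))
        else:
            mapped.append((participant_id, concept))
    return mapped, unmapped


a_mapped, a_unmapped = harmonize("hospital_a", [("p1", "T2DM"), ("p2", "HTN")])
b_mapped, b_unmapped = harmonize(
    "hospital_b",
    [("q1", "diabetes mellitus type II"), ("q2", "high blood pressure")],
)
print(a_mapped)    # [('p1', 'type_2_diabetes'), ('p2', 'hypertension')]
print(b_mapped)    # [('q1', 'type_2_diabetes')]
print(b_unmapped)  # [('q2', 'high blood pressure')]
```

Once mapped, the diabetes records from both sites can be analyzed together. The unmapped "high blood pressure" record shows the remaining gap: a shared structure alone does not help until every local term has been assigned to a common concept.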
3. Policy and Governance
National policies and institutional interests often restrict genomic data sharing, particularly when intellectual property or national security concerns are involved. Implementing global policies that mandate or incentivize data sharing can help overcome these barriers. For instance, the NHS Genomic Medicine Service in the UK has created a framework for sharing clinical genomic data with researchers while ensuring privacy and data security.
4. Equity, Diversity, and Inclusion
To ensure that the benefits of genomic research are distributed equitably, it is essential to include diverse populations in genomic studies. Many genomic datasets are skewed towards individuals of European ancestry, limiting the applicability of research findings to other populations. Efforts like the All of Us Research Program in the USA and the Genomics England Diverse Data Initiative aim to address this issue by recruiting participants from underrepresented groups. However, more must be done to ensure that genomic research benefits everyone, not just those in well-represented populations.
Accelerating Discovery Through Cross-Cohort Analysis
One of the most exciting possibilities for the future of genomic research is cross-cohort analysis: the ability to analyze data from multiple large cohorts together. This approach can accelerate the discovery of gene-disease associations and provide more robust insights into complex conditions. The paper highlights successful examples of cross-cohort analysis, such as a study that drew on data from both the UK Biobank and the All of Us program to analyze lipid levels across different populations and identify significant genetic variants.
However, cross-cohort analysis requires the development of new policies, tools, and standards to ensure data privacy and interoperability. The authors emphasize that while the potential for discovery is enormous, the costs and complexities associated with cross-cohort analysis increase with the number of datasets involved.
A Call to Action: 12 Steps to Scale Up Genomic Data Sharing
To address these challenges and unlock the full potential of genomic data, the authors outline 12 actions that stakeholders—researchers, policymakers, funders, and health system leaders—can take to systematically scale up genomic data sharing:
1. Engage participants in data governance and ensure their expectations are met.
2. Advocate for cross-cohort analysis to accelerate discovery.
3. Promote best practices for data sharing among cohort programs.
4. Expand data sharing initiatives from local to global levels.
5. Start small pilot programs to demonstrate the benefits of data sharing.
6. Share tools and standards for data sharing and encourage others to adopt them.
7. Promote common infrastructure across research portfolios.
8. Create incentives for cohorts to share their data.
9. Simplify processes to lower barriers to data access.
10. Advocate for the inclusion of diverse populations in genomic studies.
11. Build bridges between research and healthcare.
12. Support the integration of genomic data into clinical care for precision medicine.
The authors conclude by stressing the importance of acting now to expand genomic data sharing. The global community has already made great strides in producing and analyzing genomic data, but the real challenge lies in harmonizing and sharing this data for maximum impact. By implementing the actions outlined in this roadmap, we can ensure that genomic research fulfills its promise of transforming healthcare for all populations.
References:
Stark, Z., Glazer, D., Hofmann, O. et al. A call to action to scale up research and clinical genomic data sharing. Nat Rev Genet (2024). https://doi.org/10.1038/s41576-024-00776-0
-Written by Sohni Tagore