Advances in DNA sequencing have created large databases of whole-genome sequence (WGS) and multi-omics data, opening new opportunities to explore how the genome helps regulate human health. Translating these massive and diverse datasets into a deeper understanding of disease risk and therapeutic response requires a new approach to genomic analysis and data management. Research projects are moving away from siloed datasets and investigators working in isolation. Instead, they are building large pooled datasets that span multiple studies and institutions, all harmonized and integrated with multi-omic and phenotypic data.
The recent paper "Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology," published last October in Nature Genetics, highlights the need for a collaborative approach when embracing data diversity at scale. The Analysis Commons is a cloud-based solution developed on the DNAnexus Platform. It provides a collaborative environment where independent investigators can co-develop and validate tools, which are then made available to the wider scientific community. The Analysis Commons framework addresses the challenges that multi-center WGS analysis projects face in translating massive multi-omics data into actionable insights.
Multi-center WGS analysis projects include the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program, the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, and the Centers for Common Disease Genomics (CCDG). Their challenges are steep. They need better data-sharing mechanisms, data harmonization, integrated multi-omics analyses, annotation, and computational flexibility. A cloud-based solution was chosen because it provides extensive computational resources and permission-based access control for researchers worldwide. Physically shipping large datasets to hundreds of researchers is simply not practical.
The Analysis Commons also overcomes another major hurdle. It can combine data across studies and institutions, and it can share legacy data among participating investigators from multiple institutions. Doing so requires mechanisms that give authorized investigators access to sensitive data while maintaining robust security protocols. Two methods are implemented to address data security:
- Individual studies secure institutional approval to share data with a consortium through a single “consortium agreement”.
- The National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP) system is used to control access and to coordinate authorization and data sharing across approved collaborators.
Data security is not a nice-to-have when it comes to sensitive data; it is essential. The Analysis Commons supports secure data storage, management, and analysis. The platform provides a comprehensive security and privacy framework, with compliance covering ISO 27001, HIPAA, CLIA, CAP, and GCP. Its features include:
- two-factor authentication
- end-to-end encryption
- need-based network access control
- 24/7 security monitoring and updates
- audit and access logging
An example set of Analysis Commons pipeline apps is available on the DNAnexus Platform. To access the pipeline, users can log in to DNAnexus and create a new project. To get started, they can then copy the public Analysis Commons toolset and demonstration files from the Analysis Commons Project into their own project. The Analysis Commons toolset is also available on GitHub.
The Analysis Commons provides a strong framework that enables experts to convene, collaborate, and support one another in integrating and translating massive quantities of WGS and phenotypic data. DNAnexus is proud to support the Analysis Commons and researchers around the world who are working to accelerate the promise of precision medicine.