Analysis Commons: A Collaborative Approach to Multi-Omics Discovery

Advances in DNA sequencing have created large databases of whole-genome sequence (WGS) and multi-omics data, opening new opportunities to explore the role the genome plays in regulating human health. Translating these massive and diverse datasets into a deeper understanding of disease risk and therapeutic response requires a new approach to genomic analysis and data management. Research projects are transitioning from siloed datasets and investigators working in isolation to large pooled datasets that span multiple studies and institutions, all harmonized and integrated with multi-omic and phenotypic data.

The recent paper "Analysis Commons, a team approach to discovery in a big-data environment for genetic epidemiology," published last October in Nature Genetics, highlights the need for a collaborative approach when embracing data diversity at scale. The Analysis Commons, a cloud-based solution developed on the DNAnexus Platform, provides a collaborative environment where investigators can co-develop and validate tools, which are then made available to the greater scientific community. The Analysis Commons framework addresses the challenges that multi-center WGS analysis projects face, enabling the translation of massive multi-omics data into actionable insights.

The challenges of multi-center WGS analysis projects, such as the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program, the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, and the Centers for Common Disease Genomics (CCDG), are steep. They include the need for better data-sharing mechanisms, data harmonization, integrated multi-omics analyses, annotation, and computational flexibility. A cloud-based solution was developed because it can provide extensive computational resources and permission-based access control for researchers globally. Physically shipping large datasets to hundreds of researchers is simply not practical.

Another major hurdle the Analysis Commons overcame is the need not only to combine data across studies and institutions but also to share legacy data among participating investigators from multiple institutions. This requires mechanisms that give authorized investigators access to sensitive data while maintaining robust security protocols. Two methods are implemented to address data security:

  1. Individual studies secure institutional approval to share data with a consortium through a single “consortium agreement”.
  2. The National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP) system is leveraged to control access and to coordinate authorization and data sharing across approved collaborators.

Data security is not a nice-to-have when it comes to sensitive data; it is essential. The Analysis Commons allows for secure storage, management, and analysis of datasets. Platform features such as two-factor authentication, end-to-end encryption, need-based network access control, 24/7 security monitoring and updates, and audit and access logging provide the industry’s most comprehensive security and privacy framework (including ISO 27001, HIPAA, CLIA, CAP, and GCP).

An example set of Analysis Commons pipeline apps is available on the DNAnexus Platform. To access the pipeline, users can log in to DNAnexus and create a new project. Users can then copy the public Analysis Commons toolset and demonstration files, located in the Analysis Commons Project, into their own project to get started. The Analysis Commons toolset is also available on GitHub.

The Analysis Commons provides a strong framework that enables experts to convene, collaborate, and support each other in the integration and translation of massive quantities of WGS and phenotypic data. DNAnexus is proud to support the Analysis Commons and researchers around the world focused on accelerating the promise of precision medicine.


PMWC 2018: Leveraging Multi-Omic Datasets in Discovery & Clinical Trials

The Precision Medicine World Conference kicks off next week at the Computer History Museum in Mountain View, California. The program covers innovative technologies, thriving initiatives, and clinical case studies that enable the translation of precision medicine into direct improvements in health care.

Please join us for a lively panel discussion on scalable infrastructure and platforms that integrate next-generation sequencing (NGS) and other data (e.g., phenotypic) to power discovery in Pharma and the clinic.

Title: Scalable NGS Infrastructure/Platforms
Talk Details: Track 1 – Monday, January 22 at 10:30am
Moderator: Brady Davis, Chief Strategy Officer, DNAnexus

Panel Speakers:

  • AstraZeneca/MedImmune – David Fenstermacher, VP BioInformatics
  • Sutter Health – Greg Tranah, Director, CPMC Research Institute, Adjunct Professor Dept. of Epidemiology & Biostatistics, UCSF
  • Carol Franc Buck Breast Cancer Center at UCSF – Laura Esserman, Director
  • City of Hope – Sorena Nadaf, SVP & CIO


Health care providers increasingly require multi-omic datasets, including phenotypic data informed by genomic data. Such data need to be obtained in an economically sustainable way and made available on an agile, user-friendly platform so that they can inform clinical care and lead to health improvements.

Pharmaceutical companies (“Pharmas”) are interested in obtaining datasets containing phenotypic/clinical and genomic information generated from patient cohorts of specific disease areas. Such datasets can help Pharma researchers identify drug targets or find biomarkers, validate hypotheses related to the interaction of genomics with disease or with specific therapies, and identify candidate populations for future clinical trials. Payers are also interested in the outcomes related to new discoveries and therapies in order to reimburse for these treatments.

This session will focus on how healthcare provider organizations, Pharmas, and Payers are working toward solving these complex and challenging problems from both a technical and a business model perspective.


Rady Children’s Quest to Find That Needle in a Haystack

Rady Children’s Institute for Genomic Medicine (RCIGM), located in San Diego, has announced a pioneering effort to deliver life-changing genetic diagnoses for children suffering from rare diseases. Led by its president and CEO, Dr. Stephen Kingsmore, RCIGM is developing an end-to-end clinical whole-genome data analysis solution, built on the DNAnexus Platform, for children’s hospitals nationally.

The impact of diagnosis by WGS is often life-changing. The team routinely tests critically ill children for over 5,000 diseases, of which more than 500 have highly effective treatments. For example, if the test reveals a mutation in a gene involved in digestion that prevents the child from processing a particular nutrient and leads to the buildup of a poisonous byproduct, a simple change in diet can limit the effects of the disease. The sooner this condition can be diagnosed, the less damage the child will suffer. In these cases, minutes literally matter.

Dr. Kingsmore’s vision is to ensure genome-powered diagnosis is accessible to every child who needs it. Building a world-class pipeline at a single hospital isn’t enough. RCIGM needed a solution that could scale and be deployed at institutions around the world. DNAnexus provides the technology and expertise that allows RCIGM to grow an innovative pediatric-focused genomics network, distribute its clinical tools and collaborate with colleagues in a secure and compliant environment.

This work was done as part of RCIGM’s collaboration with the Newborn Sequencing In Genomic medicine and public HealTh (NSIGHT) program. NSIGHT addresses how genomic sequencing can replicate or augment known screening results for newborn disorders, what knowledge sequencing can provide for conditions not currently screened, and what additional clinical information could be learned from sequencing relevant to the clinical care of newborns. The NSIGHT program is funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Human Genome Research Institute (NHGRI), components of the National Institutes of Health.

DNAnexus provides a flexible platform that connects Edico Genome’s ultra-fast variant calling algorithms with Fabric Genomics’ interpretation software, and integrates seamlessly with Rady Children’s custom data interpretation portal. Users monitor jobs, organize and share data, and compare patients’ data to a diagnostic resource within the network. At DNAnexus, we are proud to support Dr. Kingsmore and RCIGM’s endeavor to prevent, diagnose, and treat childhood diseases through genomics research.