Supporting an Innovative Open-Access Cancer Genomics Pilot

In January 2015, President Obama unveiled the Precision Medicine Initiative, an audacious research effort to revolutionize how we practice medicine and ultimately improve human health. Nearly one year later, US Vice President Joe Biden announced the $1 billion Moonshot to Cure Cancer, which aims to translate advances in genomics into treatments. To bring these initiatives to fruition, we all need to work together, coordinate across silos, and increase access to information.

Open access for sharing genomic data is not a new idea. The Human Genome Project and the 1000 Genomes Project showed how broadly sharing the data generated by genomic research can maximize its utility. At DNAnexus, we believe in fostering a culture of openness in genomic research to enable medical breakthroughs. There are many troves of genomic data, but the mechanisms for combining them are far from ideal.

Most data sharing in cancer genomics research has been centralized through rich but controlled-access databases such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), both of which properly approved researchers can easily access on the DNAnexus Platform. These access restrictions serve the worthy goal of protecting the privacy of the individuals who donate their samples and data to science, since access to genomic data could hypothetically lead to their re-identification. But arguably, by limiting access to these datasets, we are slowing progress and limiting how quickly its benefits reach patients.

In this spirit, a research team at the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine and the Texas Cancer Research Biobank (TCRB) established an open-access (OA) pilot for freely sharing cancer genomic data. For the first time, genomic sequencing data from seven human cancer cases with matched normal samples are freely available to anyone. Users of the data are simply asked not to attempt to re-identify the participants.

Beyond the dataset itself, the pilot project’s most notable contribution is the process it developed for participant education and consent. Can cancer patients, with all the physical and psychological challenges they endure and usually without an extensive background in biology or genetic privacy, give truly informed consent to the benefits and risks of open-access data sharing? The rigorous protocol applied in this pilot indicates that many of them have both the capacity and the desire to do so.

Controlled-access research datasets will remain a reality, and DNAnexus will maintain its recognized leadership in cloud security and data protection for both research and clinical applications. But the open-access TCRB cases, and we hope others like them to come, give the research community an opportunity to experiment freely with “real” cancer genomics data rather than artificial simulations. Researchers can use them to refine methods that also improve the analysis of controlled-access cases, ultimately advancing cancer research.

DNAnexus is grateful to the anonymous patients and our HGSC colleagues for providing this opportunity. We’re proud to participate in this and other innovative cancer data sharing initiatives, such as the exciting public/private partnership with ITOMIC led by the University of Washington’s Tony Blau, and projects unfolding within the Global Alliance for Genomics and Health. A copy of the open-access TCRB data, its conditions of use, and HGSC’s Mercury informatics pipeline are available now to DNAnexus Platform users.

CHARGE-ing Ahead After ASHG

After last week’s ASHG frenzy, we could use a week off! But we’re so inspired by the positive response to the DNAnexus cloud computing platform that we are back in the office and digging right in.

Our big story at the conference was our collaboration with Baylor’s Human Genome Sequencing Center (HGSC) and Amazon Web Services, which enabled the largest genomic analysis project ever to take place in the cloud. Together, we demonstrated the cloud’s effectiveness for massive-scale data analysis for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. We did this by porting HGSC’s Mercury variant-calling pipeline onto our platform to analyze more than 14,000 exomes and genomes. The project is a great example of how DNAnexus can be used, and the Baylor scientists have also chosen to make the pipeline available to all DNAnexus users at no extra cost.

Our ASHG workshop session focused on this HGSC case study. Speakers Jeff Reid of Baylor College of Medicine and our own Andreas Sundquist and Andrew Carroll shared technical details about the project. We want to thank all of the scientists who packed the workshop room and gave us valuable feedback on their own cloud computing needs.

Separately, Jeff Reid spoke about the Mercury pipeline and DNAnexus in a program session called “Mo’ Data, Mo’ Problems.” Jeff’s talk was well received, blowing up the #ASHG2013 Twitter feed last Friday beginning at 9:48am EST. It also sparked great discussion about the need for the scientific community to embrace a centralized environment that lets researchers collaborate on biological questions rather than build siloed computational infrastructure. During Q&A, one scientist asked Jeff whether a pipeline for RNA-seq was in the works. He said that an RNA-seq counterpart to the Mercury pipeline is currently being developed and will be ported to DNAnexus for public use.

We also want to thank everyone who made our experience at ASHG so rewarding, including all the scientists who stopped by our booth. Many of them were drawn in by our new Genomics Cloud Computing Infographic, which visualizes the details of the Baylor HGSC case study. We had great conversations with our visitors and came away with useful insight into how our platform-as-a-service can support other needs across the genomics industry.

If you missed ASHG or have more questions about CHARGE and how the Mercury pipeline can help you, check out this use case or read related news reports from FierceBiotechIT or GenomeWeb.

Run the Mercury Variant-Calling Pipeline on Your Own Data

Mercury, designed by the Human Genome Sequencing Center at Baylor College of Medicine (HGSC), is the core variant-calling pipeline for the CHARGE consortium. The Mercury pipeline is a semi-automated, modular set of tools for analyzing NGS data in clinically focused studies. HGSC designed the pipeline to identify mutations in genomic data, setting the stage for determining whether these mutations cause serious disease.

Thanks to HGSC’s work with us, the Mercury pipeline is now freely available to any DNAnexus user. The pipeline is located in the applets folder of the HGSC_Mercury project. You can find the project, along with everything you need to run the applet, in the ‘Featured Projects’ section of your home page. Log in to DNAnexus or create an account today to get started immediately.

Inside the Mercury Project

  • Support for both whole-genome and exome samples
  • All required annotation and reference data
  • A pre-configured workflow (just drag and drop your inputs)

The Mercury pipeline produces a set of annotated variants from your sample. You’ll also see the biologically relevant information for these variants from the Baylor College of Medicine database, added by Baylor’s Cassandra annotation tool. You can easily visualize the read mappings and variant calls in our integrated genome browser.