CHARGE-ing Ahead After ASHG

After last week’s ASHG frenzy, we could use a week off! But we’re so inspired by the positive response to the DNAnexus cloud computing platform that we are back in the office and digging right in.

Our big story at the conference was about our collaboration with Baylor’s Human Genome Sequencing Center (HGSC) and Amazon Web Services, enabling the largest genomic analysis project ever to take place in the cloud. Working together, we proved the cloud’s efficacy for massive-scale data analysis for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium by porting HGSC’s Mercury variant-calling pipeline onto our platform to interrogate more than 14,000 exomes and genomes. Not only is it a great example of how DNAnexus can be used, but the Baylor scientists also opted to make the pipeline available to all DNAnexus users at no extra cost.

Our ASHG workshop session focused on this HGSC case study, with speakers Jeff Reid from Baylor College of Medicine and our own Andreas Sundquist and Andrew Carroll sharing some technical details about the project. We want to thank all of the scientists who packed the workshop room and offered us valuable feedback on their own cloud computing needs.

Separately, Jeff Reid spoke about the Mercury pipeline and DNAnexus in a program session called “Mo’ Data, Mo’ Problems.” Jeff’s talk was well received (blowing up the #ASHG2013 Twitter feed last Friday beginning at 9:48am EST) and sparked great discussion around the need for the scientific community to embrace a centralized environment, so that researchers can collaborate on biological questions rather than build siloed computational infrastructure. During Q&A, one scientist asked Jeff if a pipeline for RNA-seq was in the works, and he said that an RNA-seq counterpart to the Mercury pipeline is currently being developed with the aim of porting it to DNAnexus for public use.

We also want to thank everyone who made our experience at ASHG so rewarding, including all the scientists who stopped by our booth — many of whom were drawn in by our new Genomics Cloud Computing Infographic visualizing the details of the Baylor HGSC case study. We had great conversations with our visitors and came away with useful intel about how our platform-as-a-service can support other genomics industry needs.

If you missed ASHG or have more questions about CHARGE and how the Mercury pipeline can help you, check out this use case or read related news reports from FierceBiotechIT or GenomeWeb.

Run the Mercury Variant-Calling Pipeline on Your Own Data

Mercury, designed by the Human Genome Sequencing Center at Baylor College of Medicine (HGSC), is used as the core variant-calling pipeline for the CHARGE consortium. The Mercury pipeline is a semi-automated and modular set of tools for the analysis of NGS data in clinically focused studies. HGSC designed the pipeline to identify mutations from genomic data, setting the stage for determining the significance of these mutations as a cause of serious disease.

Thanks to HGSC’s work with us, the Mercury pipeline is now freely available to any DNAnexus user. The Mercury pipeline is located in the applets folder of the HGSC_Mercury project. You can find the project, along with everything you need to run the applet, under the ‘Featured Projects’ section on your home page. Log in to DNAnexus or create an account today to get started immediately.

Inside the Mercury Project

  • Both whole genome and exome samples
  • All annotation and reference data required
  • Pre-configured workflow (just drag & drop your inputs)

Results from the Mercury pipeline consist of a set of annotated variants from your sample. You’ll also see the biologically significant data that applies to those variants, drawn from the Baylor College of Medicine database using their Cassandra annotation tool. You can easily visualize the mappings and variant calls within our integrated genome browser.

See You at ASHG — Lunch Is on Us!

We are looking forward to catching up with you at ASHG. You can stop by booth #915 to check out the latest and get a demo of DNAnexus, or join our lunch workshop on Wednesday to hear about a massive-scale genomic study that performed its analysis in our cloud environment.

At the workshop, you’ll hear from Jeffrey Reid, assistant professor at the Baylor College of Medicine’s Human Genome Sequencing Center (HGSC), who will talk about his experience using our computing solution for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, a group of more than 300 scientists. Dr. Reid will discuss the technical and practical challenges related to the analysis of over 14,000 genomes or exomes, such as meeting peak compute demand, enabling secure access for multiple researchers and sharing tools and results across the entire consortium.

At the project’s peak, HGSC was able to spin up over 20,000 AWS cores on demand to run the analysis pipeline on the CHARGE data. During this period, HGSC was running one of the largest genomics analysis clusters in the world!

We’ll also be hosting a CHARGE Q&A Hour at our booth (#915) on Thursday, October 24th, from 11am to noon. Stop by if you’d like to learn more about the CHARGE project; folks from the HGSC and DNAnexus will be on hand to answer any questions.

Workshop details

• Wednesday, October 23rd, 12:30 – 2:00 pm
• Room 209, Level 2, Convention Center
• Boxed lunch & refreshments will be provided

Agenda

Realizing the Full Potential of Mega-Scale Cohort Analysis in the Cloud

Introduction
Andreas Sundquist, PhD, CTO & Co-founder, DNAnexus

The Nuts & Bolts of Enabling Ultra Large-Scale Genomic Analysis in the Cloud
Andrew Carroll, PhD, Scientist, DNAnexus

Vacation Slides from Bespin: A Guided Tour Through Large-Scale Genomic Analysis in the Cloud
Jeffrey Reid, PhD, Research Assistant Professor, BCM Human Genome Sequencing Center