Bring on the New Reference Genome!

Like many of our fellow genomics scientists, we are eager to see GRCh38, the much-anticipated new human reference genome. Produced by the Genome Reference Consortium (the Wellcome Trust Sanger Institute, The Genome Institute at Washington University, the National Center for Biotechnology Information, and the European Bioinformatics Institute), the new reference is expected to be a significant upgrade.

The release has been delayed a bit by some processing issues, but you can keep an eye out for the new reference via NCBI. Why all the fuss about the latest version? For one thing, it adds modeled centromeres and novel sequences. Beyond that, it includes updates to individual bases and fixes for tiling path or assembly errors. (A great overview of GRCh38 is available in this presentation given by NCBI’s Deanna Church at a Cold Spring Harbor workshop this month.) The reference also takes advantage of data from the 1000 Genomes Project to correct SNPs and indels and to capture decoy sequence.

Scientists across the community are anticipating a computational frenzy once the new reference is released. With all these updates, it’s only natural that researchers with human data sets will want to dust off their sequence data and realign it to the new reference to see what they missed. At the annual meeting of the American Society of Human Genetics in October, Jeff Reid from the Human Genome Sequencing Center at Baylor College of Medicine said he was “terrified” at the thought of how much simultaneous demand for computational resources this reference release alone could create.

That would indeed be scary for researchers with access only to limited on-premises compute infrastructure. But this is the perfect type of project for elastic cloud computing! No need to stress local resources with a massive burst of intensive demand when you can easily run your reanalysis in the cloud using a platform such as DNAnexus. Our scientific and engineering teams are on standby; just think of us as an extension of your lab offering additional computational resources in a secure and clinically compliant environment.

So bring on GRCh38 — we are ready for it!

Run the Mercury Variant-Calling Pipeline on Your Own Data

Mercury, designed by the Human Genome Sequencing Center at Baylor College of Medicine (HGSC), is used as the core variant-calling pipeline for the CHARGE consortium. The Mercury pipeline is a semi-automated, modular set of tools for analyzing next-generation sequencing (NGS) data in clinically focused studies. HGSC designed the pipeline to identify mutations in genomic data, setting the stage for determining the significance of these mutations as causes of serious disease.

Thanks to HGSC’s work with us, the Mercury pipeline is now freely available to any DNAnexus user. The pipeline is located in the applets folder of the HGSC_Mercury project. You can find the project, along with everything you need to run the applet, in the ‘Featured Projects’ section of your home page. Log in to DNAnexus or create an account today to get started immediately.

Inside the Mercury Project

  • Both whole-genome and exome samples
  • All required annotation and reference data
  • Pre-configured workflow (just drag & drop your inputs)

The Mercury pipeline produces a set of annotated variants from your sample. Using HGSC’s Cassandra annotation tool, it also reports the biologically significant information from the Baylor College of Medicine database that applies to each variant. You can easily visualize the read mappings and variant calls in our integrated genome browser.

News: DNAnexus Offers Service for Clinical Testing Labs

This week we’re pleased to announce the launch of our specialized platform-as-a-service (PaaS) for the next-generation sequencing-based diagnostics market.

What does that mean? With this service, we are focusing specifically on clinical enterprises that want to eliminate the common costs and challenges associated with building clinically compliant pipelines for next-generation sequencing data analysis.

As demand for genomic data in the clinic ramps up, diagnostics companies that have traditionally used on-premises solutions are facing increasing challenges in scaling their resources while meeting HIPAA and other regulatory requirements.

Clinical testing labs that are reconsidering their on-premises data centers need more than just a cloud provider; they also need a partner with expertise in genomics and bioinformatics to help build reliable, robust pipelines. That’s where we come in: with this PaaS, DNAnexus offers all of our usual rock-solid cloud computing along with access to our team of expert scientists and engineers who know this field inside and out.

Through our PaaS, clinical testing labs will be able to scale up their enterprise infrastructures for analyzing and managing DNA data. We provide a configurable, API-based platform that allows users to move their analysis pipelines into the cloud, where they can use their own algorithms as well as industry-recognized tools and resources to create customized workflows in a secure and compliant environment.

One of our early clinical adopters, cancer researcher Boris Bastian from the University of California, San Francisco School of Medicine, said in a statement: “Working with the DNAnexus team has been invaluable for us as we deploy our data analysis pipeline to the cloud and work toward a production-grade clinical test. The DNAnexus platform is well-suited for rapid pipeline development and enterprise-readiness. We have relied heavily on their expertise in cloud-based solutions and benefited from their experience in managing data in a clinically appropriate manner.”

As part of this offering, we have ensured that our platform meets HIPAA, CLIA, and many other regulatory requirements for users working with sensitive medical information.