2014: The Year of the Cloud


The Chinese New Year is almost upon us, and the Year of the Horse has us thinking about what 2014 will bring to the world of DNA sequencing. We believe this will turn out to be the year of cloud computing. Here are a few of the trends we’re watching:

Availability of large-scale genome studies. At one time, the 1000 Genomes Project operated on a scale all its own. Today, many organizations are participating in large-scale sequencing projects that study thousands or even millions of people. As that data makes its way into the public realm, demand for computational resources will soar. Accessing, querying, and manipulating these data sets will be a real challenge for IT teams, who will face bursts of unusually high demand on top of their regular workload. That’s precisely the kind of environment where cloud computing makes the most sense: on-demand compute resources that scale elastically let IT teams meet peak infrastructure needs without spending money to permanently scale up internal resources.

The new human reference genome. The Genome Reference Consortium has released build 38 of the human genome (known as GRCh38), a major improvement over the previous build, GRCh37. Once the reference has been fully annotated, scientists around the world will want to dust off their existing human data sets and realign them to the updated reference to see if there are any new insights to be had. That will mean a short-term, high-intensity spike in demand for computational resources as these massive alignments are processed; in other words, the perfect occasion to try out cloud computing. It’s a cost-effective way to add extensive computational resources without a long-term commitment to on-premises infrastructure.

Sequencing costs keep falling. The massive genomic studies underway have all been enabled by the rapidly falling cost of DNA sequencing, a trend that promises to continue thanks to Illumina’s recent announcement and to startups still working to commercialize innovative sequencing methods. As sequencing a genome becomes ever more affordable, demand for the resources to process and analyze that data will grow faster and faster. IT teams that rely only on internal infrastructure will face an uphill battle keeping up with this demand, so we expect growing interest in how cloud computing can relieve the pressure on those teams to keep adding servers and storage.

Growing number of analysis apps. The ecosystem of tools for performing specific steps or types of DNA analysis is expanding rapidly. As scientists and bioinformaticians increasingly need to build pipelines that combine several of these tools, the ease of doing so in a cloud environment will make that option even more appealing.

Here at DNAnexus, we’re excited about what’s to come in 2014. We have a number of collaborations underway with academic and commercial R&D organizations, and we look forward to sharing details about them with our blog readers in the months ahead. Here’s to the Year of the Cloud and a great, productive year for the biomedical community!

Bring on the New Reference Genome!

Like many of our fellow genomics scientists, we are eager to see the much-anticipated new human reference genome, GRCh38. Produced by the Genome Reference Consortium (the Wellcome Trust Sanger Institute, the Genome Institute at Washington University, the National Center for Biotechnology Information, and the European Bioinformatics Institute), the new reference is expected to be a significant upgrade.

The release has been delayed a bit due to some processing issues, but you can keep an eye out for the new reference via NCBI. Why all the fuss about the latest version? For one thing, it adds modeled centromeres and novel sequence. Beyond that, it includes updates to individual bases and fixes for tiling-path and assembly errors. (A great overview of GRCh38 is available in this presentation given by NCBI’s Deanna Church at a Cold Spring Harbor workshop this month.) The reference also takes advantage of data from the 1000 Genomes Project to correct SNPs and indels and to capture decoy sequence.

Scientists across the community are anticipating a computational frenzy once the new reference is released. With all these updates, it’s only natural that researchers with human data sets will want to dust off their sequence data and realign it to the new reference to see what they missed. At the annual meeting of the American Society of Human Genetics in October, Jeff Reid from the Human Genome Sequencing Center at Baylor College of Medicine said he was “terrified” at the thought of how much simultaneous demand for computational resources this reference release alone would create.

That would indeed be scary for researchers with access only to limited on-premises compute infrastructure. But this is the perfect type of project for elastic cloud computing! No need to stress local resources with a massive burst of intensive demand when you can easily run your reanalysis in the cloud using a platform such as DNAnexus. Our scientific and engineering teams are on standby; just think of us as an extension of your lab offering additional computational resources in a secure and clinically compliant environment.

So bring on GRCh38 — we are ready for it!