On DNA Day, We’re Thinking About (What Else?) Data

Today is DNA Day! This year it’s an especially big deal as we’re honoring the 60th anniversary of Watson and Crick’s famous discovery of the double-helix structure of DNA as well as the 10th anniversary of the completion of the Human Genome Project.

Back when Watson and Crick were poring over Rosalind Franklin’s X-ray diffraction image of DNA, they never could have imagined the data that would ultimately be generated by scientists reading the sequence of those DNA molecules. Indeed, even at the start of the HGP, nearly 40 years later, the data requirements for processing genome sequence would have been staggering to consider.

Check out this handy guide from the National Human Genome Research Institute presenting statistics from the earliest HGP days to today. In 1990, GenBank contained about 49 megabases of sequence; today, that has soared to some 150 terabases. The computational power needed to tackle this amount of genomic data didn’t even exist when the HGP got underway. Consider what kind of computer you were using in 1990: for us, that brings back fond memories of the Apple IIe, mainframes, and the earliest days of the Internet (brought to us by Prodigy).

A couple of decades later, we have a far better appreciation for the elastic compute needs for genomic studies. Not only do scientists’ data needs spike and dip depending on where they are in a given experiment, but we all know that the amount of genome data being produced globally will continue to skyrocket. That’s why cloud computing has become such a popular option for sequence data analysis, storage, and management — it’s a simple way for researchers who don’t have massive in-house compute resources to go about their science without having to spend time thinking about IT.

So on DNA Day, we honor those pioneers who launched their unprecedented studies with a leap of faith: that the compute power they needed would somehow materialize in the nick of time. Fortunately for all of us, that gamble paid off!

Dispelling the Myths of the Cloud

What comes to mind when you hear the word “cloud”? Does the Amazon cloud immediately pop into your head?

Despite the cloud’s widespread recognition in the media, many are still uncertain about the benefits of cloud computing. In a recent national survey, 95% of respondents who claimed they have never used the cloud actually have. In fact, they do so unwittingly nearly every day via online banking and shopping, social networking, email, and so on.

Some scientists still seem skeptical of the cloud’s place in next-generation sequencing. If you lean toward skepticism, please read my article, “Dispelling the Myths of the Cloud for the Skeptical Scientist,” on BitesizeBio.com.

It provides an overview of how the cloud can be useful to scientists in a multitude of ways: virtually unlimited scalability, instant access to storage and compute resources, and no up-front commitment to expensive hardware. Another advantage is data security, a core competency of the cloud that internal infrastructure is hard pressed to match.

The “cloud” may be generating a lot of buzz in the NGS community, but is it worthy of all the hype? It appears that all signs point to yes.

Happy Holidays to All!

As 2012 winds down, we wanted to take a moment to wish everyone a happy and healthy holiday season. Before we get swept up in the frenzy of a new year, we’ll take this opportunity to look back at what 2012 brought us.

The year kicked off with all sorts of excitement at the JP Morgan Healthcare Conference, during which Life Tech’s Ion Torrent announced that it would unveil a platform that could sequence a full human genome in a single day, for the bargain price of $1,000. While we haven’t heard of anyone achieving that $1,000 milestone just yet, being so tantalizingly close to the target spurred a whole new conversation this year around its informatics implications. What will it take to get the genome analysis and interpretation step down to $1,000? We haven’t heard a concrete plan yet, but we’re pleased to see that the discussion is focused on analysis, since that will be critical in getting genomics into the clinic.

Here at DNAnexus, 2012 was a big year thanks to ever-increasing activity from our users. In the past year, our users processed and analyzed more data than they had in all the time since our launch in early 2010 (that is, more than in 2010 and 2011 combined). It’s great to see that people are coming back again and again, and that they’re bringing more data when they do. Our users have also been busy publishing this year, as we noted recently in a round-up of papers using DNAnexus.

Overall, we’ve seen a surge of interest in cloud-based solutions for genomic data; we think that’s partly due to the increasing sequencing capacity as sequencing platforms become more affordable, and partly to people growing more comfortable with the cloud’s usefulness and security. At conferences we attended throughout the year — from AGBT and ABRF this spring to Bio-IT World and ESHG in the summer to Beyond the Genome and ASHG this fall, among others — we saw more vendors joining the space, increasing interest from scientists, and more posters and papers from researchers who have used cloud computing for their genomic projects.

Add it all up, and it’s clear that we’re at the beginning of a golden age for genomic analysis and interpretation. We are thrilled to be part of it, and even more excited that in 2013 we’ll be launching a new platform catering to bioinformaticians and computational scientists. In the meantime, enjoy the holidays!