Navigating the Exome with DNAnexus

With a growing number of targeted exome capture solutions being integrated into next-generation sequencing workflows, targeted exome analysis has become the go-to, cost-effective approach for obtaining sequence coverage of the protein-coding regions of the genome. As a result, researchers are starting projects with larger populations and larger, more complicated datasets. One reason targeted exome capture is gaining steam is that whole-genome sequencing is still cost-prohibitive for most researchers.

To support this important methodology, we have added Exome Analysis to our repertoire of analysis tools. With the DNAnexus Exome Analysis method, we’ve simplified a critical step in the processing and analysis of these datasets. With this analysis, you can quickly determine whether regions of interest in your exome sequence data have been sequenced with sufficient coverage to allow for further analyses.

For each exon, DNAnexus reports the number and fraction of bases covered by sequence reads, along with the average coverage within the exon. Exons that overlap genes in a gene annotation track are labeled with the gene name, which makes it easy to search afterward for exons from a gene of interest.
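To make those three per-exon numbers concrete, here is a minimal Python sketch of how they could be computed from read alignments. It is an illustration only, not the DNAnexus implementation: the function name `summarize_exon`, the `ExonCoverage` record, and the half-open coordinate convention are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ExonCoverage:
    """Hypothetical per-exon summary record (not a DNAnexus data type)."""
    exon: str
    length: int
    bases_covered: int
    fraction_covered: float
    mean_depth: float


def summarize_exon(name, start, end, reads):
    """Summarize coverage of one exon.

    Assumes half-open coordinates: the exon spans [start, end) and each
    read is a (read_start, read_end) pair on the same chromosome.
    """
    length = end - start
    if length <= 0:
        raise ValueError("exon must have positive length")

    # Per-base depth across the exon, starting at zero everywhere.
    depth = [0] * length
    for read_start, read_end in reads:
        # Clip the read to the exon; reads outside it give an empty range.
        lo = max(start, read_start)
        hi = min(end, read_end)
        for pos in range(lo, hi):
            depth[pos - start] += 1

    covered = sum(1 for d in depth if d > 0)
    return ExonCoverage(
        exon=name,
        length=length,
        bases_covered=covered,
        fraction_covered=covered / length,
        mean_depth=sum(depth) / length,
    )


if __name__ == "__main__":
    # Toy data: a 10-base exon and three reads, one of which misses it.
    print(summarize_exon("exon1", 100, 110, [(95, 105), (102, 108), (200, 210)]))
```

In this toy case, 8 of the 10 bases are covered by at least one read, so an exon like this might be flagged for a closer look before downstream analysis, depending on the coverage threshold you choose.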

Next-gen sequencing and the cloud – revolutionary, or hype?

It’s been an exciting time for DNAnexus since launching our company at the recent Bio-IT World Conference & Expo in Boston. We’ve spoken with many of you about your experience using DNAnexus and received great feedback, much of which is already finding its way into future releases.

One thing that struck us at Bio-IT World was the pervasiveness of “cloud”: talks filled with discussions about experimenting with cloud, vendor exhibits praising the magic of the cloud, an entire pre-conference workshop dedicated to cloud computing, and a keynote presentation by Deepak Singh describing the awesomeness of Amazon Web Services. It seems the NIH is also aware of this trend, as the NHGRI recently held a workshop in DC to bring together researchers and thought leaders to discuss the impact cloud will have on genome informatics. One year ago, “cloud” wasn’t on the tip of everyone’s tongue the way it is today. So is the excitement over cloud mostly hype?

There is certainly skepticism out there, and plenty of negative experiences. Vivien Marx wrote a great story in BioInform (full disclosure: we were interviewed for the article) that highlights the ongoing debate over cloud computing and gives examples of real problems people in the field have experienced. The challenges of using the cloud are of course not unique to computational biology and have been discussed for years, for example in this excellent report from the UC Berkeley RAD lab. The term “cloud” conjures up concerns about data transfer issues, security and control, platform lock-in, difficulty managing amorphous compute resources, the reliability of those resources, and over-crowding.

To address this skepticism, let’s first agree on what we mean by “cloud,” because the term is used by some to describe anything that runs in your web browser, while to others it’s just a fashionable marketing tool for IT infrastructure. Our definition of cloud is an elastic and scalable infrastructure for compute, storage, and networking. Elastic means that we can grow or shrink our use of those resources at any time. Scalable means there’s always room to grow your infrastructure. These two traits of cloud computing are incredibly powerful. Do you have 100 jobs to run? Launch 100 compute nodes and run them all in parallel. You pay the same as you would to run them serially, but finish in 1/100th the time. Need to store 10 terabytes of data for a 6-month project? No problem, it’s available; just pay for 60 TB-months of storage. And when the day comes that you need to run 10,000 compute nodes or store 10 petabytes of data, you don’t have to worry about building out a datacenter – the cloud will scale to those levels!
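The arithmetic behind those examples can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a pricing model: it assumes billing is purely per node-hour and per TB-month, that nodes are released as soon as they go idle, and that every job takes the same amount of time. The helper names `run_plan` and `storage_tb_months`, and the 2-hour job length, are made up for the example.

```python
import math


def run_plan(jobs, hours_per_job, nodes):
    """Return (wall_clock_hours, billed_node_hours) for equal-length jobs.

    Assumption: you pay only for node-hours actually used, so the bill
    depends on the total work, not on how many nodes share it.
    """
    waves = math.ceil(jobs / nodes)      # rounds of jobs run back to back
    wall_clock_hours = waves * hours_per_job
    billed_node_hours = jobs * hours_per_job
    return wall_clock_hours, billed_node_hours


def storage_tb_months(terabytes, months):
    """Storage is metered as capacity multiplied by duration."""
    return terabytes * months


if __name__ == "__main__":
    serial = run_plan(jobs=100, hours_per_job=2.0, nodes=1)
    parallel = run_plan(jobs=100, hours_per_job=2.0, nodes=100)
    print("serial   (wall-clock h, node-hours):", serial)
    print("parallel (wall-clock h, node-hours):", parallel)
    print("storage TB-months:", storage_tb_months(10, 6))
```

Under these assumptions the billed node-hours are identical for the serial and parallel plans, while the wall-clock time drops by a factor of 100. Real providers may add minimum billing increments or data transfer charges that this sketch ignores.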

But as others have said, the cloud is not a utopia. It doesn’t magically support sequence analysis. It can be difficult to use, and your old applications generally won’t run in the cloud. But that’s because the cloud is not a solution in and of itself. It’s an infrastructure, or an engine that you can use to power your applications. And even if the cloud is like a super-charged V12 engine, it won’t take you anywhere by itself. To harness that energy you need to build a vehicle around the engine: the chassis, transmission, wheels, and brakes, plus a steering wheel and console that present a user-friendly interface to the driver. Once you’ve built the car around the engine, suddenly it’s easy to use and hugely enabling.

DNAnexus’ use of the cloud mirrors this: we’ve built a web-based platform on top of the cloud to harness its power. All the sequence analysis and data management tools are available to you through your web browser, and we transparently manage all the cloud resources. Moving data around the cloud, figuring out where and how to store it reliably, launching compute nodes and coordinating their work – all this happens below the surface. We present an intuitive interface that removes the challenges of using the cloud while passing through its benefits, above all tremendous scalability on demand. Would it be possible to build this without the cloud? Yes, but we wouldn’t be able to amortize the infrastructure costs over the thousands of people working with similar data, and we wouldn’t be able to charge you a low per-sample cost.

So to answer the question: revolutionary or hype? It’s both. There’s a lot of hype, and as a result there’s understandably skepticism and disappointment. But once you go beyond that and look at the technology it enables, it’s truly revolutionary. DNAnexus’ goal is not to promote the hype. Our goal is to solve the next-gen sequencing data bottleneck. And we happen to use the cloud as a key component of our platform to solve it. As sequencing growth continues to outpace Moore’s law, you can be sure that your need for compute infrastructure will grow tremendously. We’re here to make that growth as painless and cost-effective as possible.

Take a look for yourself. Sign up for a free account today and tell us what you think. Is the cloud hype? Or is it an innovative approach to next-gen DNA sequence analysis and data management?

Introducing DNAnexus

We are proud to welcome you to DNAnexus, a new paradigm in DNA sequence analysis. As the race to reduce the cost of sequencing continues, it’s become clear that the next bottleneck standing in the way of cheap, ubiquitous sequencing will not be the sequencing throughput or reagent cost—it will be the analysis and management of that data. This is our mission: to eliminate the sequence analysis bottleneck.

To achieve this goal, we’ve built the foundations of a compute platform that will change how people approach next-gen sequence analysis. We believe the power of next-gen sequencing should be accessible to everyone. A bioinformatics PhD should not be required. You should not have to purchase a computer cluster or disk arrays, or hire IT staff to configure them. You should not have to install or configure software, nor should you have to manage complicated data transfers or manipulations. We’re going to change that by leveraging cloud computing and Web 2.0 technologies to solve your sequence analysis problems, so you can instead focus on the science.

As next-gen sequencing evolves from a highly specialized field of work to a more general-purpose tool used in the life sciences, we need to free people from the burden of managing Gigabases and data pipelines. We need to elevate researchers to a higher level of abstraction, to think in terms of genes, interesting loci, and what the experiments tell us about the biology. Storing your sequence data in the cloud and doing all your management and analysis there is the first step in this process.

Who are we? We’re scientists and users of next-gen sequencing technologies, frustrated with the state of sequence analysis, just like you. Founded out of Stanford University, we’ve picked up some amazing people from places like MIT, UCSC, Berkeley, and UCSD, and we’re building a top-notch team intent on growing a powerful sequence analysis ecosystem around

If this resonates with you, join us! Sign up for a free account. (Or apply for a job!) Play with it, upload data, examine the data we’ve pre-loaded. Tell us what we should add, and what needs fixing. Help us help you, because if we create something truly valuable for you, then we are succeeding in our mission. We’re releasing the first version of today, featuring functional genomics analyses, and direct-upload capability from sequencing machines. But rest assured, we will be adding new features rapidly, and improving your experience.