DNA Day: Celebrating the Decades-long Unraveling of DNA

DNA Day 2014

April 25th marks DNA Day. It’s on the calendar today to commemorate the 1953 publication of the structure of DNA by Francis Crick, James Watson, and others — but really, it’s a time to think about this special molecule and its role in what we do.

Just a decade after the completion of the Human Genome Project, it is truly amazing that it’s possible not only to sequence a human genome or exome, but to do so thousands of times and compare the results for a better understanding of human genetic variation. Studies comparing a few thousand exomes are becoming routine, and we are even seeing studies with 10,000-plus exomes. (Check out our Baylor CHARGE project as an example.)

These massive research and clinical efforts will be necessary to truly parse the meaning of our DNA. We are proud to be part of the community working on this challenge by providing cloud-based computational resources that let scientists run enormous analyses without crushing their local infrastructure. We also host great tools in our platform, so researchers have the option of porting in their own favorite pipelines or using our tools to create plug-and-play workflows in the cloud. And our insistence on enterprise-grade security means that you don’t have to worry about keeping your data safe; we take care of that for you. That’s cause for everyone to be optimistic about the future of these studies on DNA Day.

In case you were wondering, our data consumption isn’t limited to As, Cs, Ts, and Gs. This week our team had fun looking back at the impressive results of writing contests associated with past DNA Days. Check out last year’s winners of the essay contest sponsored by the American Society of Human Genetics or by the European Society of Human Genetics (both groups are expected to announce this year’s winners today). And if you only have a coffee break to do some link-surfing, don’t miss these winning haikus from previous DNA Day poetry contests. Here’s our favorite:

A spiral staircase
Each step makes you what you are
But not who you are

One Simple Solution for Ten Simple Rules

Like many in the systems biology space, we have been longtime fans of Philip Bourne’s Ten Simple Rules articles since the first one, “Ten Simple Rules for Getting Published,” appeared in PLoS Computational Biology in October 2005.

The latest installment is especially near and dear to us at DNAnexus: “Ten Simple Rules for Reproducible Computational Research,” written by Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig (and edited by Bourne, of course). The authors begin with the premise that the community increasingly needs standards around reproducibility, noting that rising paper retractions, clinical trial failures, and papers omitting necessary experimental details have all been getting more attention lately.

“This has led to discussions on how individual researchers, institutions, funding bodies, and journals can establish routines that increase transparency and reproducibility,” Sandve et al. write. “In order to foster such aspects, it has been suggested that the scientific community needs to develop a ‘culture of reproducibility’ for computational science, and to require it for published claims.”

The rules begin with the lessons you learned when you got your first lab notebook — “Rule 1: For Every Result, Keep Track of How It Was Produced” — and progress to more complex mandates — “Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds.”
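
To make Rule 6 concrete, here is a minimal sketch (in Python, with made-up data, not code from the article) of what noting a random seed buys you: a bootstrap analysis that anyone can rerun and get identical results.

```python
import random

def bootstrap_mean(data, n_resamples, seed):
    # A dedicated generator, seeded explicitly, so the recorded seed
    # fully determines every random draw in this analysis.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original data.
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return sum(means) / n_resamples

# Because the seed (42) is noted alongside the result, rerunning this
# line on any machine reproduces the estimate exactly.
estimate = bootstrap_mean([1, 2, 3, 4, 5], n_resamples=1000, seed=42)
```

For real datasets you would use a library generator such as NumPy’s, but the principle is the same: write the seed down next to the result.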

What really stood out for us was that all of these guidelines are addressed by best practices in cloud computing. For example, when we built our new platform, we implemented strict procedures to ensure the auditability of data: the system automatically tracks what you did to get a result, enforces version control, archives the exact analytical process you used, and stores the raw data underlying each analysis. Using a cloud-based pipeline also offers true reproducibility, because you can always rerun the same analysis with the specific version of your pipeline, or make your pipeline publicly accessible so that anyone can rerun it.
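
As a rough illustration of what such automatic tracking involves (a hypothetical sketch, not DNAnexus platform code; all names and fields here are invented), a provenance record ties a result to the pipeline that produced it, the pipeline’s exact version, its parameters, and a digest of the raw input:

```python
import datetime
import hashlib
import json

def provenance_record(pipeline, version, params, input_bytes):
    # Capture everything needed to reproduce a result: what ran,
    # which version, with which parameters, on which input.
    return {
        "pipeline": pipeline,
        "version": version,
        "parameters": params,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = provenance_record(
    pipeline="variant-calling",      # illustrative name
    version="2.1.0",                 # exact pipeline version, not "latest"
    params={"min_quality": 30, "seed": 42},
    input_bytes=b"ACGTACGT",         # stand-in for real sequence data
)
print(json.dumps(record, indent=2))
```

Storing a record like this beside every result is what makes “rerun the exact same analysis” possible months later.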

Be sure to check out all 10 rules, and feel free to take a tour of the DNAnexus platform to see how it can help you achieve reproducibility in your own computational research.

At Beyond the Genome Conference, Lessons on Data Analysis and Clinical Studies

A few of us from DNAnexus had the privilege of attending Beyond the Genome 2012, a conference organized by BioMed Central and held at Harvard Medical School. The meeting, now in its third year, continued its trend of attracting top-notch speakers, including keynotes from Baylor’s Richard Gibbs and Stuart Schreiber of the Broad Institute.

From the first speaker, Gabor Marth of Boston College, it became clear that one of the major hurdles now facing scientists is not DNA sequencing, as was true in years past, but processing the data. As a result, many groups are writing their own algorithms to perform the same functions, which is widely recognized as an unproductive use of the community’s resources. Scientists encouraged each other to stop reinventing the wheel, and to ensure that bioinformatics tools can be used and reported on easily by biologists. That message resonated with us, as we have long championed the concept of a central data resource where excellent algorithms are accessible to anybody. It’s gratifying to see the same principle gaining acceptance throughout academia as easy-to-use, reproducible data analysis becomes the real challenge in the sequencing process.

We also saw a string of fantastic talks on clinical sequencing. Sharon Plon from Baylor gave a very insightful “lessons learned” talk about their first year of clinical exome sequencing. The biggest pain point in the process was not sequencing, data analysis, insurance reimbursement, or finding patients in need; it was figuring out what to report to patients and how to do it. This underscores the need to bring genetic counselors, ethicists, and doctors into the conversation early to guide what until recently has been a purely research-based endeavor. Dr. Plon and Joris Veltman of Radboud University presented several amazing case studies in which sequencing identified the cause of disease and allowed patients to take steps to improve their lives, as well as informing families about the risk of recurrence. We look forward to hearing many more success stories.

Of course, cancer studies were a noteworthy trend at the conference. We heard research on cancer genome evolution, epigenetic modification, sifting causative mutations from neutral ones, and the general effects of three-dimensional genome organization. But it was clear that integrating the information generated by all these techniques will be a big challenge. To get even deeper insights into human cancers, we’ll need to bring together the computational tools we’ve already built, and also bring together people from different scientific, medical, and social disciplines to apply that information intelligently. The good news is that this is already starting to happen, and we at DNAnexus are excited to be in a position to help as this approach gains traction.