Like many in the systems biology space, we have been longtime fans of Philip Bourne’s Ten Simple Rules articles since the first one was published in PLoS Computational Biology back in 2005. (“Ten Simple Rules for Getting Published,” October 2005.)
The latest installment is especially near and dear to us at DNAnexus: “Ten Simple Rules for Reproducible Computational Research,” written by Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig. (And edited by Bourne, of course.) The writers begin with the premise that there is a growing need in the community for standards around reproducibility in research, noting that negative trends in paper retractions, clinical trial failures, and papers omitting necessary experimental details have been getting more attention lately.
“This has led to discussions on how individual researchers, institutions, funding bodies, and journals can establish routines that increase transparency and reproducibility,” Sandve et al. write. “In order to foster such aspects, it has been suggested that the scientific community needs to develop a ‘culture of reproducibility’ for computational science, and to require it for published claims.”
The rules begin with the lessons you learned when you got your first lab notebook — “Rule 1: For Every Result, Keep Track of How It Was Produced” — and progress to more complex mandates — “Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds.”
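To make Rule 6 concrete, here is a minimal Python sketch of what noting a random seed can look like in practice. It is our own illustration and does not come from the paper: the function name `run_analysis`, the toy "analysis" (a mean of random draws), and the seed value are all assumptions chosen for the example.

```python
import json
import random


def run_analysis(seed, n_draws=5):
    """Toy randomized analysis (hypothetical): returns the result together with its seed."""
    # A private generator keeps the run independent of global random state.
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_draws)]
    # Store the seed and parameters next to the result so the run can be repeated.
    return {"seed": seed, "n_draws": n_draws, "mean": sum(draws) / n_draws}


first = run_analysis(seed=20131024)   # arbitrary seed chosen for illustration
rerun = run_analysis(seed=first["seed"], n_draws=first["n_draws"])

# Re-running with the recorded seed reproduces the result exactly.
assert first == rerun
print(json.dumps(first, sort_keys=True))
```

Because the seed travels with the result, anyone holding the saved record can repeat the run without guessing how the randomness was initialized.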
What really stood out for us was that best practices in cloud computing address all of these guidelines. For example, when we built our new platform, we implemented strict procedures to ensure auditability of data: the system automatically tracks what you did to get a result, ensures version control, serves as an archive of the exact analytical process you used, and stores the raw data underlying analyses. Using a cloud-based pipeline also offers true reproducibility, because you can always rerun the same analysis (using the specific version of your pipeline) or make your pipeline publicly accessible so that anyone else can rerun it.
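As a rough illustration of the kind of record that makes an analysis auditable, the Python sketch below ties a result to a fingerprint of the raw data, the pipeline version, and the parameters used. This is a generic example and not the DNAnexus API: the function names `provenance_record` and `record_id`, the field names, and the sample values are all hypothetical.

```python
import hashlib
import json


def provenance_record(raw_data: bytes, pipeline_version: str, params: dict) -> dict:
    """Describe how a result was produced (hypothetical record layout)."""
    return {
        # A content hash identifies the exact raw data that went in.
        "input_sha256": hashlib.sha256(raw_data).hexdigest(),
        "pipeline_version": pipeline_version,
        "params": params,
    }


def record_id(record: dict) -> str:
    """Derive a stable identifier from the record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


reads = b"ACGTACGTTAGC"  # stand-in for real raw data
original = provenance_record(reads, "1.2.0", {"min_quality": 30})
repeat = provenance_record(reads, "1.2.0", {"min_quality": 30})
upgraded = provenance_record(reads, "1.3.0", {"min_quality": 30})

# Same data, version, and parameters give the same identifier.
assert record_id(original) == record_id(repeat)
# Changing the pipeline version gives a different identifier.
assert record_id(original) != record_id(upgraded)
```

Two runs that share an identifier used the same data, pipeline version, and parameters, so a mismatch points directly at what changed between them.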
Be sure to check out all 10 rules, and feel free to take a tour of the DNAnexus platform to see how it can help you achieve reproducibility in your own computational research.