Case Study: Trio Analysis with Sentieon Rapid DNAseq on DNAnexus

Editor’s Note: This blog post is written by Don Freed, Bioinformatics Scientist at Sentieon. Email him at don.freed@sentieon.com. 

Introduction

At Sentieon we work hard to create the most efficient, accurate and robust tools for variant calling. Thanks to our partnership with DNAnexus, we are sharing the benefits of this hard work with you. Through April 7th, we are offering license-free access to Sentieon pipelines on DNAnexus. Request access today to see how Sentieon DNAseq can produce results identical to GATK's at a fraction of the cost. In addition, Sentieon’s variant calling is deterministic; given identical input data, Sentieon will always call the same set of variants. Utilizing Sentieon’s tools on the DNAnexus Platform, clinicians and researchers can perform accurate and cost-effective analysis of petabyte-scale datasets with ease, seamlessly running analyses of an arbitrary number of samples simultaneously in the cloud.

Many of our customers use Sentieon tools to call variants from human samples. The typical human genome contains some 4.5 million variants relative to the human reference genome. While almost all of these variants are inherited, every individual has approximately 50 de novo variants, which occur uniquely in their genome. De novo variants are some of the most interesting genetic variants to study: they frequently cause rare sporadic diseases such as KBG syndrome, and have been implicated in complex disorders such as autism.

In this post, we’ll demonstrate the power of running Sentieon tools on DNAnexus by performing alignment with BWA, duplicate removal, base-quality score recalibration, indel realignment, haplotype-based variant calling and joint genotyping of a 30x whole-genome trio. Using these data, it is possible to identify de novo variants, determine the parental origin of some interesting inherited mutations, and examine the carrier status of the child for rare recessive mutations. With the Rapid DNAseq app on DNAnexus, processing an entire trio takes about an hour. Whether you have a cohort of three or 3,000, the power of the DNAnexus Platform and the scalability of the cloud let you process a cohort of any size quickly.

Running analyses on DNAnexus

For this trio analysis, we used data from the Illumina Platinum Genomes dataset for individuals NA12878, NA12891, and NA12892 downsampled to 30x. The original FASTQ files can be found at the European Nucleotide Archive. To process the data, we used the Sentieon Rapid DNAseq app on DNAnexus. We called variants in gVCF mode and input the gVCF files into the Sentieon GVCFtyper, resulting in a single multi-sample VCF file for the entire trio. We easily accomplished this by using the DNAnexus workflow shown below.

In total the analysis took just 73 minutes.

We performed the same analysis with the original 50x dataset in one hour and 46 minutes. Runtimes scale approximately linearly with the input coverage.
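
As a rough check on that scaling claim, the two reported runtimes can be compared against the ratio of coverages. The short Python sketch below uses only the figures quoted in this post (73 minutes at 30x; one hour and 46 minutes at 50x); the variable names are ours and purely illustrative.

```python
# Runtimes reported in this post, in minutes.
runtime_30x = 73
runtime_50x = 60 + 46  # one hour and 46 minutes

# How much more data was processed, and how much longer it took.
coverage_ratio = 50 / 30
runtime_ratio = runtime_50x / runtime_30x

print(f"coverage ratio: {coverage_ratio:.2f}")  # 1.67
print(f"runtime ratio:  {runtime_ratio:.2f}")   # 1.45
```

The runtime grows somewhat more slowly than the coverage (about 1.45x the time for 1.67x the data). One plausible reading, which is our assumption rather than a measured breakdown, is that part of each run is a fixed cost that does not depend on coverage.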

We identified 2,458 de novo mutations in NA12878, well above the expected 50. This excess has previously been attributed to somatic mutations in the primary cells or to mutations introduced during immortalization and subsequent passage of the sequenced cells. We can also see that NA12878 is heterozygous for both rs2472297 and rs6968865, which have been associated with increased coffee consumption.
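
To illustrate the basic idea behind calling a variant de novo from a jointly genotyped trio, here is a minimal Python sketch. It is not the filtering we ran for this post; it simply flags sites where the child carries an allele that neither parent carries. Genotypes are written as tuples of allele indices (0 for the reference allele, 1 for the alternate), and the function name and site labels are hypothetical. A real analysis would typically also filter on read depth and genotype quality.

```python
def is_de_novo_candidate(child, mother, father):
    """Return True if the child carries an allele seen in neither parent.

    Each genotype is a tuple of allele indices, e.g. (0, 1) for a
    heterozygous call (0 = reference allele, 1 = alternate allele).
    """
    parental_alleles = set(mother) | set(father)
    return any(allele not in parental_alleles for allele in child)


# Hypothetical example sites: (child, mother, father) genotypes.
trio_sites = {
    "site_A": ((0, 1), (0, 0), (0, 0)),  # child het, both parents hom-ref
    "site_B": ((0, 1), (0, 1), (0, 0)),  # alternate allele inherited from mother
    "site_C": ((1, 1), (0, 1), (0, 1)),  # hom-alt child of two het parents
}

candidates = [
    name
    for name, (child, mother, father) in trio_sites.items()
    if is_de_novo_candidate(child, mother, father)
]
print(candidates)  # ['site_A']
```

Only site_A is flagged, because the child's alternate allele is absent from both parents. Sites B and C are consistent with ordinary inheritance.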

Utilizing the DNAnexus cloud-based platform and Sentieon tools, our Rapid DNAseq and joint genotyping runtimes easily scale to thousands of samples. You can view everything we ran in this public project: Rapid trio genotyping.

Register here for a free trial of the Rapid DNAseq tool.

DNAnexus Not Impacted by Cloudflare Information Leak (“Cloudbleed”)

A serious bug within the code running on Cloudflare edge servers may have leaked sensitive data from a large number of websites over many months. First, and most importantly, the DNAnexus Platform has not been impacted by this incident and no DNAnexus user data has been leaked.

Cloudflare provides Content Distribution Network (CDN) services, which enable providers of web content to enhance user experience by caching web content on edge servers geographically proximate to the web client. As part of a shared service, each edge server presents web content from multiple Cloudflare customers.

The bug led to a condition whereby the edge servers were returning content entirely unrelated to the requested web content, and that leaked content contained unencrypted private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data. Search engines subsequently crawled and cached this leaked content, enabling it to be searched. For example, a web request to a ride sharing service could have resulted in leaked content being returned from a dating service.

DNAnexus uses the Cloudflare CDN service only to accelerate serving of public web content, such as website images, help text, and HTML/CSS. DNAnexus does not serve any credentials, tokens, or user data via the CDN. DNAnexus users are therefore not impacted by this bug, and no DNAnexus user information has been leaked.

DNAnexus users do not need to change their DNAnexus password, unless they use similar passwords for other websites that were affected. We strongly recommend that users always choose a unique password for their DNAnexus account and that they configure their account to use two-factor authentication as described in the DNAnexus wiki documentation.

If you have any questions about your account, please contact our customer support team at support@dnanexus.com.

Innovation Fueled by Collaboration and Regulatory Science

In mid-2015 the Food and Drug Administration’s (FDA) Office of Health Informatics awarded DNAnexus a research and development contract to build precisionFDA, an online, cloud-based platform for sharing genomic information. Since its launch, more than 2,000 members of the next-generation sequencing (NGS) community have contributed to this resource by sharing and comparing biomedical data, software tools, and testing methodologies.

The FDA is responsible for ensuring that new medical treatments and tests meet a high standard for safety and efficacy, while working to get advances to market as quickly as possible. Following the announcement of President Obama’s Precision Medicine Initiative, the genomics community saw an increase in the use of NGS-based technologies in diagnostics, yet no standardized way to evaluate the accuracy of those tests. If new diagnostics were to be developed based on the broad applications of NGS, the approaches needed to be understood, and proven reliable, before they could be applied in clinical contexts.

The FDA took a forward-thinking approach to the regulation of genomic-based technologies and sponsored the development of the precisionFDA platform to, in the words of FDA leaders, “foster innovation and develop regulatory science around NGS tests,” and accelerate the implementation of precision medicine. Instead of government regulators establishing and imposing a set of performance standards for NGS tests with the typical top-down approach, precisionFDA seeks to empower the genomics community to develop regulatory science through a collaborative and secure online platform.

The collaborative nature of precisionFDA lets researchers perform analyses on the same datasets, compare approaches, figure out what is successful, and determine where refinements can be made. The platform provides a flexible environment for test developers to leverage the findings from these collaborations to evaluate the accuracy and reproducibility of NGS analysis workflows, and share those results with the FDA and the rest of the community. The power of this approach is that the FDA remains at the epicenter of ongoing discussions, enabling the community to continue innovating, while keeping a pulse on the rapidly evolving genomics research space.

Robert Califf, former FDA Commissioner, penned an op-ed piece on his way out of office: How The FDA Will Help Lead the Next Medical Revolution. Califf believes that with precisionFDA, the agency can simultaneously meet the goals of protecting patients and advancing genomic medicine. Regulatory oversight is often seen as a hindrance to innovation in healthcare, but the former commissioner believes that with this novel approach to regulation, the FDA will play a big role in realizing the potential of basing an individual’s treatment plan on their unique characteristics and genetic profile.

PrecisionFDA was founded upon the principles of collaboration and creating networks of stakeholders from industry, academia, and government. This platform is a successful example of how innovative regulation can spur progress by giving the key community stakeholders the ability to work together to define regulatory science.

In recent years, improvements in NGS technology have enhanced our ability to interrogate the human genome with high specificity and bring those insights together with clinical patient data, which has pushed us closer to delivering on the promise of precision medicine. To keep pace with these technological advancements, it is crucial to harness the network effect of scientific collaboration. By empowering community members with regulatory input, innovation can be stimulated instead of suppressed, and these innovations in turn will improve the quality of genomic tests and lead to advancements in health outcomes for patients.

George Asimenos, VP at DNAnexus, will be presenting on precisionFDA at the Molecular Med Tri-Conference in San Francisco as part of the Best Practice in Personalized and Translational Medicine short course. Hear the presentation on Monday, February 20th, from 8am-11am.

Learn more and get involved at precision.fda.gov.