Case Study: Trio Analysis with Sentieon Rapid DNAseq on DNAnexus

Editor’s Note: This blog post is written by Don Freed, Bioinformatics Scientist at Sentieon. Email him at don.freed@sentieon.com. 

Introduction

At Sentieon, we work hard to create the most efficient, accurate, and robust tools for variant calling. Thanks to our partnership with DNAnexus, we are sharing the benefits of this work with you. Through April 7th, we are offering license-free access to Sentieon pipelines on DNAnexus. Request access today to see how Sentieon DNAseq produces results identical to GATK’s at a fraction of the cost. In addition, Sentieon’s variant calling is deterministic: given identical input data, Sentieon will always call the same set of variants. Using Sentieon’s tools on the DNAnexus Platform, clinicians and researchers can perform accurate and cost-effective analysis of petabyte-scale datasets, running analyses of an arbitrary number of samples simultaneously in the cloud.
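One simple way to check the determinism claim for yourself is to run the same inputs twice and compare the variant records. The sketch below is our own illustration, not a Sentieon or DNAnexus tool: it hashes only the non-header lines of each VCF, on the assumption that header lines may carry run-specific metadata (such as command lines) that can differ between otherwise identical runs. The file names in the usage comment are hypothetical.

```python
import gzip
import hashlib


def variant_digest(path: str) -> str:
    """SHA-256 over a VCF's data lines only, skipping '#' header lines."""
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.sha256()
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def same_variants(path_a: str, path_b: str) -> bool:
    """True if two VCFs contain identical variant records in the same order."""
    return variant_digest(path_a) == variant_digest(path_b)


# Usage (hypothetical outputs of two runs on identical input):
# print(same_variants("run1/trio.vcf.gz", "run2/trio.vcf.gz"))
```

Hashing rather than holding both files in memory keeps the comparison cheap even for whole-genome VCFs; a mismatch tells you the runs differ but not where, so a line-by-line diff would be the natural follow-up.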

Many of our customers use Sentieon tools to call variants from human samples. The typical human genome contains some 4.5 million variants relative to the human reference genome. While almost all of these variants are inherited, every individual carries approximately 50 de novo variants that arose in that individual rather than being inherited from either parent. De novo variants are some of the most interesting genetic variants to study: they frequently cause rare sporadic diseases such as KBG syndrome and have been implicated in complex disorders such as autism.

In this post, we’ll demonstrate the power of running Sentieon tools on DNAnexus by performing alignment with BWA, duplicate removal, indel realignment, base quality score recalibration, haplotype-based variant calling, and joint genotyping of a 30x whole-genome trio. Using these data, it is possible to identify de novo variants in the child, determine the parental origin of some interesting inherited mutations, and examine the child’s carrier status for rare recessive mutations. With the Rapid DNAseq app on DNAnexus, processing an entire trio takes just over an hour. Whether you have a cohort of three or 3,000, the DNAnexus Platform and the scalability of the cloud let you process cohorts of any size quickly.
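The trio design is what makes these questions answerable. At a biallelic site where the child is heterozygous, the parents’ genotypes constrain which parent could have transmitted the alternate allele. The sketch below is a simplified illustration of that logic, not the method used in this analysis; the function names are ours, and it ignores genotype quality, read depth, and multiallelic sites, all of which a real de novo or parental-origin analysis has to handle.

```python
from typing import Optional, Tuple

# A diploid genotype as two allele indices from a VCF GT field (0 = reference).
Genotype = Tuple[int, int]


def parse_gt(gt: str) -> Optional[Genotype]:
    """Parse a diploid GT string such as '0/1' or '1|0'; return None for no-calls."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return (int(alleles[0]), int(alleles[1]))


def has_ref(gt: Genotype) -> bool:
    return 0 in gt


def has_alt(gt: Genotype) -> bool:
    return any(a != 0 for a in gt)


def classify_trio_site(child_gt: str, father_gt: str, mother_gt: str) -> str:
    """Infer the origin of a heterozygous child's ALT allele at a biallelic site."""
    child, father, mother = parse_gt(child_gt), parse_gt(father_gt), parse_gt(mother_gt)
    if child is None or father is None or mother is None:
        return "no-call"
    if sorted(child) != [0, 1]:
        return "not heterozygous"
    # Paternal: father can pass ALT while mother passes REF (and vice versa).
    paternal_possible = has_alt(father) and has_ref(mother)
    maternal_possible = has_alt(mother) and has_ref(father)
    if paternal_possible and maternal_possible:
        return "uninformative"  # e.g. both parents heterozygous
    if paternal_possible:
        return "paternal"
    if maternal_possible:
        return "maternal"
    if not has_alt(father) and not has_alt(mother):
        return "de novo candidate"  # neither parent carries the ALT allele
    return "mendelian error"  # e.g. both parents homozygous ALT


print(classify_trio_site("0/1", "0/1", "0/0"))  # paternal
```

Note that a mother who is homozygous for the alternate allele resolves the origin even when the father is heterozygous, because she cannot have transmitted the reference allele; checking what each parent *could* transmit, rather than just who carries the ALT allele, handles that case.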

Running analyses on DNAnexus

For this trio analysis, we used data from the Illumina Platinum Genomes dataset for individuals NA12878 (the child), NA12891 (the father), and NA12892 (the mother), downsampled to 30x. The original FASTQ files can be found at the European Nucleotide Archive. To process the data, we used the Sentieon Rapid DNAseq app on DNAnexus. We called variants in GVCF mode and passed the resulting GVCF files to Sentieon GVCFtyper, producing a single multi-sample VCF file for the entire trio. We accomplished this using the DNAnexus workflow shown below.

In total, the analysis took just 73 minutes.

We performed the same analysis on the original 50x dataset in one hour and 46 minutes. Runtimes scale approximately linearly with input coverage.
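Runtime is not strictly proportional to coverage: 73 minutes at 30x is about 2.4 minutes per 1x, while 106 minutes at 50x is about 2.1. A straight line through the two reported points suggests a coverage-independent component of roughly 23.5 minutes. With only two data points this is a back-of-the-envelope estimate, and `predict_minutes` is a hypothetical helper, not a published runtime model.

```python
# Runtimes reported above: (mean coverage in x, wall-clock minutes).
OBSERVATIONS = [(30, 73), (50, 106)]  # 1 h 46 min = 106 min


def fit_line(points):
    """Slope and intercept of the straight line through exactly two (x, y) points."""
    (x1, y1), (x2, y2) = points
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


SLOPE, INTERCEPT = fit_line(OBSERVATIONS)


def predict_minutes(coverage: float) -> float:
    """Hypothetical interpolation; only meaningful between roughly 30x and 50x."""
    return INTERCEPT + SLOPE * coverage


for cov, minutes in OBSERVATIONS:
    print(f"{cov}x: {minutes / cov:.2f} min per 1x")  # 2.43 at 30x, 2.12 at 50x
print(f"{SLOPE:.2f} min per additional 1x, {INTERCEPT:.1f} min fixed")  # 1.65, 23.5
print(f"40x estimate: {predict_minutes(40):.1f} min")  # 89.5
```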

We identified 2,458 de novo mutations in NA12878, well above the expected 50. This excess has previously been attributed to somatic mutations in the primary cells, or to mutations introduced during immortalization and subsequent passage of the sequenced cell line. We can also see that NA12878 is heterozygous for both rs2472297 and rs6968865, which have been associated with increased coffee consumption.
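A minimal sketch of how candidates like these could be pulled from the joint VCF, assuming a multi-sample VCF whose sample columns are named NA12878, NA12891, and NA12892, and whose ID column was annotated with dbSNP rsIDs. The helper names and file path are ours. The filter is deliberately naive: it keeps PASS sites where the child carries an alternate allele and both parents are called homozygous reference, without the genotype-quality, depth, or allele-balance filters a real de novo analysis needs.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Sample column names as they are assumed to appear in the joint VCF header.
CHILD, FATHER, MOTHER = "NA12878", "NA12891", "NA12892"


def records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield each VCF data line as a dict keyed by the #CHROM header columns."""
    header: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        yield dict(zip(header, fields))


def genotype(record: Dict[str, str], sample: str) -> str:
    """Return one sample's GT subfield, located via the FORMAT column."""
    keys = record["FORMAT"].split(":")
    values = record[sample].split(":")
    i = keys.index("GT")
    return values[i] if i < len(values) else "."


def alt_allele_count(gt: str) -> Optional[int]:
    """Number of non-reference alleles in a GT string; None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def scan_trio(lines: Iterable[str], rsids: Set[str]) -> Tuple[List[Tuple[str, str, str, str]], Dict[str, str]]:
    """Collect naive de novo candidates and the child's genotype at selected rsIDs."""
    de_novo: List[Tuple[str, str, str, str]] = []
    child_gts: Dict[str, str] = {}
    for rec in records(lines):
        child_gt = genotype(rec, CHILD)
        for rsid in rec["ID"].split(";"):
            if rsid in rsids:
                child_gts[rsid] = child_gt
        if rec["FILTER"] not in ("PASS", "."):
            continue
        child = alt_allele_count(child_gt)
        father = alt_allele_count(genotype(rec, FATHER))
        mother = alt_allele_count(genotype(rec, MOTHER))
        # Child carries an ALT allele that neither parent was called with.
        if child and father == 0 and mother == 0:
            de_novo.append((rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"]))
    return de_novo, child_gts


# Usage (hypothetical path):
#   import gzip
#   with gzip.open("trio.vcf.gz", "rt") as fh:
#       candidates, gts = scan_trio(fh, {"rs2472297", "rs6968865"})
```

Reading the file as a stream of lines keeps memory flat for a whole-genome VCF; a dedicated VCF library would be the more robust choice for production use.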

Using the DNAnexus cloud-based platform and Sentieon tools, our Rapid DNAseq and joint genotyping pipelines easily scale to thousands of samples. You can view everything we ran in this public project: Rapid trio genotyping.
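The reason cohort size has little effect on wall-clock time is the shape of the workflow: per-sample GVCF calling is independent across samples and can fan out across as many cloud workers as are available, and only the single GVCFtyper joint-genotyping step has to wait for all of them. The sketch below models that scatter-gather structure locally. `call_gvcf` and `joint_genotype` are hypothetical placeholders for the platform jobs, not DNAnexus or Sentieon APIs, and the timing helper assumes identical per-sample runtimes and a fixed joint-genotyping time (in practice joint genotyping grows with cohort size).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def call_gvcf(sample: str) -> str:
    """Hypothetical stand-in for one per-sample job (alignment through GVCF calling).

    On a cloud platform each call would be an independent job; here it only
    returns the name of the GVCF such a job would produce.
    """
    return f"{sample}.g.vcf.gz"


def joint_genotype(gvcfs: List[str], output: str) -> str:
    """Hypothetical stand-in for the single joint-genotyping (GVCFtyper) step."""
    if not gvcfs:
        raise ValueError("joint genotyping needs at least one GVCF")
    return output


def run_cohort(samples: List[str], output: str, max_parallel: int = 64) -> Tuple[List[str], str]:
    # Scatter: samples are independent, so their GVCF jobs can run concurrently.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        gvcfs = list(pool.map(call_gvcf, samples))  # map preserves input order
    # Gather: joint genotyping starts only once every GVCF exists.
    return gvcfs, joint_genotype(gvcfs, output)


def estimated_wall_minutes(per_sample: float, joint: float, n_samples: int, max_parallel: int) -> float:
    """Wall-clock estimate assuming equal per-sample runtimes and a fixed joint step."""
    waves = math.ceil(n_samples / max_parallel)  # batches of concurrent per-sample jobs
    return waves * per_sample + joint


gvcfs, vcf = run_cohort(["NA12878", "NA12891", "NA12892"], "trio.vcf.gz")
```

In this toy model, growing a cohort adds per-sample "waves" only once the sample count exceeds the available parallel capacity, which is why the per-sample phase of a 3,000-sample run need not take much longer than that of a trio.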

Register here for a free trial of the Rapid DNAseq tool.

Innovation Fueled by Collaboration and Regulatory Science

In mid-2015, the Food and Drug Administration’s (FDA) Office of Health Informatics awarded DNAnexus a research and development contract to build precisionFDA, an online, cloud-based platform for sharing genomic information. Since its launch, more than 2,000 members of the next-generation sequencing (NGS) community have contributed to this resource by sharing and comparing biomedical data, software tools, and testing methodologies.

The FDA is responsible for ensuring that new medical treatments and tests meet high standards of safety and efficacy, while working to get advances to market as quickly as possible. Following the announcement of President Obama’s Precision Medicine Initiative, the genomics community saw an increase in the use of NGS-based technologies in diagnostics, yet there was no standardized way to evaluate the accuracy of those tests. If new diagnostics were to be developed from the broad applications of NGS, the underlying approaches needed to be understood and proven reliable before they could be applied in clinical contexts.

The FDA took a forward-thinking approach to the regulation of genomic-based technologies and sponsored the development of the precisionFDA platform to, in the words of FDA leaders, “foster innovation and develop regulatory science around NGS tests,” and accelerate the implementation of precision medicine. Instead of government regulators establishing and imposing a set of performance standards for NGS tests with the typical top-down approach, precisionFDA seeks to empower the genomics community to develop regulatory science through a collaborative and secure online platform.

The collaborative nature of precisionFDA lets researchers perform analyses on the same datasets, compare approaches, figure out what is successful, and determine where refinements can be made. The platform provides a flexible environment for test developers to leverage the findings from these collaborations to evaluate the accuracy and reproducibility of NGS analysis workflows, and share those results with the FDA and the rest of the community. The power of this approach is that the FDA remains at the epicenter of ongoing discussions, enabling the community to continue innovating, while keeping a pulse on the rapidly evolving genomics research space.

Robert Califf, former FDA Commissioner, penned an op-ed piece on his way out of office: How The FDA Will Help Lead the Next Medical Revolution. Califf believes that with precisionFDA, the agency can simultaneously meet the goals of protecting patients and advancing genomic medicine. Regulatory oversight is often seen as a hindrance to innovation in healthcare, but the former commissioner believes that with this novel approach to regulation, the FDA will play a big role in realizing the potential of basing an individual’s treatment plan on their unique characteristics and genetic profile.

PrecisionFDA was founded upon the principles of collaboration and creating networks of stakeholders from industry, academia, and government. This platform is a successful example of how innovative regulation can spur progress by giving the key community stakeholders the ability to work together to define regulatory science.

In recent years, improvements in NGS technology have enhanced our ability to interrogate the human genome with high specificity and bring those insights together with clinical patient data, which has pushed us closer to delivering on the promise of precision medicine. To keep pace with these technological advancements, it is crucial to harness the network effect of scientific collaboration. By giving community members a voice in regulation, innovation can be stimulated instead of suppressed, and these innovations in turn will improve the quality of genomic tests and lead to better health outcomes for patients.

George Asimenos, VP at DNAnexus, will be presenting on precisionFDA at the Molecular Med Tri-Conference in San Francisco as part of the Best Practice in Personalized and Translational Medicine short course. Hear the presentation on Monday, February 20th, from 8am to 11am.

Learn more and get involved at precision.fda.gov.

New Org Admin Tools Available on DNAnexus

We are excited to announce the release of the Org Admin Interface, a new suite of tools to help manage groups of users and shared resources on the DNAnexus Platform.

What is an “Org”?

An Org, or Organization, is a DNAnexus entity that is used to manage a group of users. At a high level, Orgs can be used to associate users, projects, and other resources with one another in a way that models real-world collaborations. Orgs simplify management of data access, sharing, and billing.

Org Admins are users who are authorized to manage Org membership, configure access and projects associated with the Org, and oversee billing.

How do I use the Org Admin tools?

If you are the admin of an Org, you will be able to access the interface from the header of the DNAnexus Platform. From there, you’ll be able to navigate to all the Orgs you manage.

To learn more about Orgs on DNAnexus and how to use the Org Admin tools, please see the following guides:

Introduction to Orgs

Using the Org Admin tools

How do I create an Org?

If you would like to create an Org for your team, please contact support@dnanexus.com. One of our scientists will be happy to work with you to set up an Org structure appropriate for your team.

I am a member of an Org. What tools are available for me?

Members can be granted varying levels of access, from basic access to shared projects and apps to the ability to create new projects billed to the Org. We have provided a guide for interacting with Orgs as a member here.

As always, if you have any questions or feedback about the Org Admin tools, please do not hesitate to contact us at support@dnanexus.com.