ACMG: A Look at Applying Genomic Data to Clinical Reports

The annual American College of Medical Genetics and Genomics (ACMG) conference meets this week (March 21-25, 2017) in Phoenix, Arizona, providing an outstanding forum to learn how genetics and genomics are being integrated into medical and clinical practice. Eric Venner, from the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine, will present the following poster (Abstract Number 368) on Friday, March 24, 10:30 AM-12:00 PM: Generating Clinical Reports from Genomic Data on the Cloud-based Neptune Platform.

To meet the demand for timely and cost-efficient clinical reporting, HGSC developed Neptune, an automated analytical platform to sign out and deliver clinical reports. The process starts when a clinical site uploads a test requisition to the HIPAA-compliant environment on DNAnexus. Next, de-identified samples are analyzed with HGSC’s variant calling pipeline, Mercury, which feeds into the reporting pipeline, Neptune. Variants of putative clinical relevance are identified for manual review and possible addition to a VIP database of clinically relevant variation. The VIP database currently holds 20,872 SNPs and 3,946 indels, as well as a curated set of copy number variants.
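The triage step described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `Variant` record, the `triage` function, and the example coordinates are all hypothetical and are not taken from Neptune, Mercury, or the VIP database. It assumes a variant is identified by chromosome, position, reference allele, and alternate allele, and that called variants already present in a curated set can be separated from those that still need manual review.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical variant key; the real VIP schema is not described in the post."""
    chrom: str
    pos: int
    ref: str
    alt: str


def triage(called: Iterable[Variant], curated: Set[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Split called variants into already-curated ones and ones needing manual review."""
    known: List[Variant] = []
    needs_review: List[Variant] = []
    for variant in called:
        if variant in curated:
            known.append(variant)
        else:
            needs_review.append(variant)
    return known, needs_review


# Made-up coordinates, for illustration only.
curated_db = {Variant("1", 1000, "G", "A")}
sample_calls = [Variant("1", 1000, "G", "A"), Variant("2", 2000, "C", "T")]

known, needs_review = triage(sample_calls, curated_db)
print("already curated:", known)
print("for manual review:", needs_review)
```

In this sketch, a reviewer who accepts a variant from the review list would add it to `curated_db`, which loosely mirrors the "possible addition to a VIP database" step.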

Neptune’s manual review interface was designed with a clinical geneticist in mind. Users can log in, curate variants in their samples, update the VIP database accordingly, and create clinical reports. Early applications include reporting for the NIH Electronic Medical Records and Genomics (eMERGE) Network III, where more than 14,500 samples and a panel of 109 genes will be processed over the course of three years.

eMERGE is a national network that combines DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research, with the aim of investigating how personalized treatments impact patient care. Research so far has led to significant discoveries across a wide range of diseases, including prostate cancer, leukemia, and diabetes. DNAnexus and the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine worked to build the eMERGE Commons, a data repository where genomic data are merged with patient electronic medical records, as well as analysis results and bioinformatics tools, to be accessed and applied by eMERGE researchers.

Bringing Together Genomics and Patient Data in the Cloud

Please join us Tuesday, February 7, at 10am PT (1pm ET) to hear leading genetics expert, Dr. Jeffrey Reid, Executive Director and Head of Genome Informatics at the Regeneron Genetics Center (RGC), discuss RGC’s integrated approach across genetic trait architectures and phenotypes, the underlying cloud infrastructure that makes the center’s collaboration with multiple institutions possible, and key lessons learned from RGC’s pioneering genomic sequencing study.

Webinar Details
Title: Beyond 100,000 Exomes: Insights & Lessons from Large-Scale Sequencing in the Cloud
Speaker: Jeffrey Reid, Ph.D., Executive Director, Head of Genome Informatics, Regeneron Genetics Center
Date: Tuesday, February 7, 2017
Time: 10:00 AM PT, 1:00 PM ET

Despite growing investment in biopharma research and development, the number of new drugs is not increasing. It is estimated that more than 90% of drugs that enter Phase I clinical trials fail. Among failures in Phase II clinical trials, 51% are due to lack of efficacy and 19% due to toxicity. These statistics suggest that pre-clinical models may be poor predictors of benefit, and together with data on genetically-informed development programs, indicate that human genetics data can substantially improve the likelihood of success for new therapeutics.

Regeneron has a long history of commitment to genetics-based science, and a track record of integrating human genetics into successful development programs, delivering new medicines to patients. The company has therefore made a substantial investment in the Regeneron Genetics Center, a cloud-based large-scale sequencing and analysis effort supporting Regeneron development programs. The RGC is a natural extension of this decades-long commitment to genetics at Regeneron, integrating large-scale, diverse data types and fostering collaboration with a wide array of stakeholders, including biopharma, healthcare providers, research institutes, and patient advocacy groups.

The Regeneron Genetics Center has sequenced more than 120,000 people so far, and has created one of the world’s most comprehensive genetics databases pairing sequence data and de-identified electronic health records. The RGC research program involves trait architectures and phenotype collaboration across a network of more than 30 research and healthcare provider institutions. Securely and easily sharing data and tools at scale with so many partners is a major challenge. In order to enable frictionless collaboration across these disparate labs, Regeneron selected DNAnexus to provide the cloud-based bioinformatics platform necessary to securely share large-scale sequencing data and tools.

In this presentation Dr. Reid will explain the RGC vision for genetics-driven drug development, describe the automation and uniquely enabling infrastructure of the RGC, and discuss in detail some of the informatics innovations and early biological insights that have already come out of the RGC’s collaborative efforts.

DNAnexus & TCGA: Reanalyzing the World’s Largest Pan-Cancer Initiative Dataset

The Cancer Genome Atlas (TCGA), a joint effort between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), was established in 2006 to create a detailed catalog of genetic mutations responsible for cancer using next-generation sequencing. Over the years, TCGA collaborators have generated over 2.5 petabytes of data collected from nearly 11,000 patients, describing 34 different tumor types (including 10 rare cancers) based on paired tumor and normal tissue sets.

TCGA has been an incredible learning process for the genomics community. At the start of this initiative, for example, researchers didn’t know as much about mutation calling. Over the past decade, however, we’ve improved the sensitivity of variant callers and now have a much greater number of samples with which to assess genomic markers.

“In addition, mutation calling for TCGA samples was primarily done for individual tumor types, with projects using different mutation callers or different versions of the callers, meaning the data wasn’t uniform,” said Carolyn Hutter, PhD, Program Director, Division of Genomic Medicine at the NHGRI. “We now believe the best way to do analysis is to have a uniform set of calls generated by multiple mutation callers, with quality control and filtering, across multiple cancer types. That’s why the TCGA team decided to go back and recall the over 10,000 exomes in TCGA and produce this multi-caller somatic mutation dataset.”
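As a rough illustration of what a multi-caller call set can look like, the Python sketch below keeps only the mutations reported by at least a chosen number of callers. This is a simplified stand-in and not the TCGA team's actual method: the caller names, the `min_callers` threshold, and the mutation strings are all hypothetical, and the quality control and filtering steps mentioned in the quote are not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def consensus_calls(calls_by_caller: Dict[str, Iterable[str]], min_callers: int = 2) -> Set[str]:
    """Return mutations reported by at least `min_callers` distinct callers."""
    support: Counter = Counter()
    for calls in calls_by_caller.values():
        # Use a set so one caller cannot count the same mutation twice.
        support.update(set(calls))
    return {mutation for mutation, n in support.items() if n >= min_callers}


# Hypothetical caller names and mutation identifiers, for illustration only.
example_calls = {
    "caller_a": ["chr1:100:G>A", "chr2:200:C>T"],
    "caller_b": ["chr1:100:G>A", "chr3:300:T>G"],
    "caller_c": ["chr1:100:G>A", "chr2:200:C>T"],
}

consensus = consensus_calls(example_calls, min_callers=2)
print(sorted(consensus))
```

Raising `min_callers` trades sensitivity for specificity; the appropriate threshold for a real project would depend on the callers used and on downstream filtering.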

Reanalyzing the TCGA dataset was a massive undertaking. The compute resources necessary for a large-scale project of this nature were not in place at TCGA member institutes. The DNAnexus Platform met important requirements for the mutation calling project, including patient data security, a scalable environment that could handle tens of thousands of exomes, and reproducibility of results. Over a four-week period, approximately 1.8 million core-hours of computational time were used to process 400 TB of data, yielding reproducible results.
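To put those figures in perspective, the short Python calculation below converts them into an average level of concurrency. It assumes, purely for the sake of the arithmetic, that the work was spread evenly over exactly four weeks of wall-clock time; actual usage was likely burstier, so treat the result as a rough average rather than a measured value.

```python
core_hours = 1.8e6   # approximate total from the post
data_tb = 400        # terabytes processed, from the post
weeks = 4            # "four-week period"; assumed to mean exactly 28 days

wall_clock_hours = weeks * 7 * 24                 # 672 hours
avg_concurrent_cores = core_hours / wall_clock_hours
avg_tb_per_hour = data_tb / wall_clock_hours

print(f"wall-clock hours: {wall_clock_hours}")
print(f"average concurrent cores: {avg_concurrent_cores:.0f}")
print(f"average throughput: {avg_tb_per_hour:.2f} TB/hour")
```

Under these assumptions the job averages roughly 2,700 cores running around the clock for a month, which gives a sense of why the post describes the required resources as beyond what member institutes had in place.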

“Realigning TCGA data with a single methodology across new standardized mutation callers will make the tumor data much more relevant to the community. The DNAnexus Platform allowed us to create a uniform and analytical treatment through version-controlled analyses and tools that would have been challenging to replicate at any single facility in a reasonable time frame,” said David Wheeler, PhD, Professor, Department of Molecular and Human Genetics at Baylor College of Medicine. “With this standardized set of mutation calls obtained by several callers, we’ll be able to identify genetic alterations contributing to cancer that are shared between tumors independent of the tissue-of-origin. We are optimistic that having access to such information will spur advancement in precision medicine.”

Key TCGA results to date have been:

  • Improved understanding of the genomic underpinnings of cancer
  • Reclassification of cancer by identifying tumor subtypes with distinct sets of genomic alterations
  • Insights into treatment approaches, whether based on currently available therapies or used to guide drug development

The value of this reanalysis under a single methodology across new standardized mutation callers is that samples can be compared across cancer types. This will facilitate new findings, such as whether one individual’s breast cancer shows greater genomic similarity to a subtype of ovarian cancer than to other types of breast cancer. In the future, we believe patients will be treated based on their genomic profile rather than the origin of their cancer. DNAnexus is proud to collaborate with TCGA in making this important dataset more useful to the cancer research community.

Researchers now have access to the TCGA pipelines via the DNAnexus Platform in addition to a GitHub repository. DNAnexus works to ensure that mechanisms for handling data access requests and vending data to approved requestors meet security standards for dbGaP and TCGA data in the cloud.

Review the latest NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy, where the NIH cited the DNAnexus Compliance White Paper.