Bringing Together Genomics and Patient Data in the Cloud

Please join us Tuesday, February 7, at 10am PT (1pm ET) to hear leading genetics expert, Dr. Jeffrey Reid, Executive Director and Head of Genome Informatics at the Regeneron Genetics Center (RGC), discuss RGC’s integrated approach across genetic trait architectures and phenotypes, the underlying cloud infrastructure that makes the center’s collaboration with multiple institutions possible, and key lessons learned from RGC’s pioneering genomic sequencing study.

Webinar Details
Title: Beyond 100,000 Exomes: Insights & Lessons from Large-Scale Sequencing in the Cloud
Speaker: Jeffrey Reid, Ph.D., Executive Director, Head of Genome Informatics, Regeneron Genetics Center
Date: Tuesday, February 7, 2017
Time: 10:00 AM PT, 1:00 PM ET

Despite growing investment in biopharma research and development, the number of new drugs is not increasing. It is estimated that more than 90% of drugs that enter Phase I clinical trials fail. Among failures in Phase II clinical trials, 51% are due to lack of efficacy and 19% due to toxicity. These statistics suggest that pre-clinical models may be poor predictors of benefit, and together with data on genetically-informed development programs, indicate that human genetics data can substantially improve the likelihood of success for new therapeutics.

Regeneron has a long history of commitment to genetics-based science, and a track record of integrating human genetics into successful development programs, delivering new medicines to patients. Therefore, the company has made substantial investment in the Regeneron Genetics Center, a cloud-based large-scale sequencing and analysis effort supporting Regeneron development programs. The RGC is a natural extension of this decades-long commitment to genetics at Regeneron, integrating large-scale, diverse data types and fostering collaboration with a wide array of stakeholders, including biopharma, healthcare providers, research institutes, and patient advocacy groups.

The Regeneron Genetics Center has sequenced more than 120,000 people so far, and has created one of the world’s most comprehensive genetics databases pairing sequence data and de-identified electronic health records. The RGC research program involves trait architectures and phenotype collaboration across a network of more than 30 research and healthcare provider institutions. Securely and easily sharing data and tools at scale with so many partners is a major challenge. In order to enable frictionless collaboration across these disparate labs, Regeneron selected DNAnexus to provide the cloud-based bioinformatics platform necessary to securely share large-scale sequencing data and tools.

In this presentation Dr. Reid will explain the RGC vision for genetics-driven drug development, describe the automation and uniquely enabling infrastructure of the RGC, and discuss in detail some of the informatics innovations and early biological insights that have already come out of the RGC’s collaborative efforts.

DNAnexus & TCGA: Reanalyzing the World’s Largest Pan-Cancer Initiative Dataset

The Cancer Genome Atlas (TCGA), a joint effort between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), was established in 2006 to create a detailed catalog of genetic mutations responsible for cancer using next-generation sequencing. Over the years, TCGA collaborators have generated over 2.5 petabytes of data collected from nearly 11,000 patients, describing 34 different tumor types (including 10 rare cancers) based on paired tumor and normal tissue sets.

TCGA has been an incredible learning process for the genomics community. At the start of this initiative, for example, researchers didn’t know as much about mutation calling. Over the past decade, however, we’ve improved the sensitivity of variant callers and now have a much greater number of samples with which to assess genomic markers.

“In addition, mutation calling for TCGA samples was primarily done for individual tumor types, with projects using different mutation callers or different versions of the callers, meaning the data wasn’t uniform,” said Carolyn Hutter, PhD, Program Director, Division of Genomic Medicine at the NHGRI. “We now believe the best way to do analysis is to have a uniform set of calls generated by multiple mutation callers, with quality control and filtering, across multiple cancer types. That’s why the TCGA team decided to go back and recall the over 10,000 exomes in TCGA and produce this multi-caller somatic mutation dataset.”

Reanalyzing the TCGA dataset was a massive undertaking. The compute resources necessary for a large-scale project of this nature were not in place at TCGA member institutes. The DNAnexus Platform met important requirements for the mutation calling project, including patient security, a scalable environment that could handle tens of thousands of exomes, and reproducibility of results. Over a four-week period, approximately 1.8 million core-hours of computational time were used to process 400 TB of data, yielding reproducible results.
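To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the three numbers quoted above and assumes, purely for illustration, that the work was spread evenly over exactly four weeks of wall-clock time; the actual run was almost certainly burstier than that. The function names are hypothetical and are not part of any DNAnexus tooling.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in this post.
CORE_HOURS = 1_800_000  # approximate core-hours of compute used
DATA_TB = 400           # terabytes of data processed
WEEKS = 4               # duration of the reanalysis


def average_concurrent_cores(core_hours: float, weeks: float) -> float:
    """Average number of cores busy at once.

    Assumption: the load was uniform across the whole period.
    """
    wall_clock_hours = weeks * 7 * 24
    return core_hours / wall_clock_hours


def core_hours_per_tb(core_hours: float, data_tb: float) -> float:
    """Average compute spent per terabyte of input data."""
    return core_hours / data_tb


if __name__ == "__main__":
    cores = average_concurrent_cores(CORE_HOURS, WEEKS)
    per_tb = core_hours_per_tb(CORE_HOURS, DATA_TB)
    print(f"Average concurrent cores: {cores:,.0f}")
    print(f"Core-hours per TB: {per_tb:,.0f}")
```

Under that even-load assumption, the project kept roughly 2,700 cores busy around the clock for a month, or about 4,500 core-hours per terabyte processed.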

“Realigning TCGA data with a single methodology across new standardized mutation callers will make the tumor data much more relevant to the community. The DNAnexus Platform allowed us to create a uniform and analytical treatment through version-controlled analyses and tools that would have been challenging to replicate at any single facility in a reasonable time frame,” said David Wheeler, PhD, Professor, Department of Molecular and Human Genetics at Baylor College of Medicine. “With this standardized set of mutation calls obtained by several callers, we’ll be able to identify genetic alterations contributing to cancer that are shared between tumors independent of the tissue-of-origin. We are optimistic that having access to such information will spur advancement in precision medicine.”

Key TCGA results to date have been:

  • Improved understanding of the genomic underpinnings of cancer
  • Reclassification of cancer by identifying tumor subtypes with distinct sets of genomic alterations
  • Insights into treatment approaches, whether based on currently available therapies or used to guide drug development

The value of this reanalysis, which applies a single methodology across new standardized mutation callers, is that samples can be compared across cancer types. This will facilitate new findings, such as whether one individual’s breast cancer shows greater genomic similarity to a subtype of ovarian cancer than to other types of breast cancer. In the future, we believe patients will be treated based on their genomic profile rather than the origin of their cancer. DNAnexus is proud to collaborate with TCGA in making this important dataset more useful to the cancer research community.

Researchers now have access to the TCGA pipelines via the DNAnexus Platform in addition to a GitHub repository. DNAnexus works to ensure that mechanisms for data access requests and for vending data to approved requestors meet security standards for dbGaP and TCGA data in the cloud.

Review the latest NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy, where the NIH cited the DNAnexus Compliance White Paper.

The U.S. Cancer Moonshot and a Culture of Collaboration

Yesterday, United States Vice President Joe Biden hosted the National Cancer Moonshot Summit. Scientists, oncologists, donors, and patients convened for a daylong conference intended to pick up the pace of research towards curing cancer. Rather than focusing on one specific type of cancer, the conference broadly discussed more than 100 types of cancer, emphasizing strategies for prevention, early detection, and wide access to treatment, and encouraging collaboration among researchers. You can read a first-hand account from our CMO, David Shaywitz, here.

As part of the Moonshot effort, DNAnexus, in partnership with PatientCrossroads, has committed to develop the Integrated Data Engagement Analytics (IDEA) platform to facilitate the collection, analysis, and sharing of genetic, proteomic, and EMR/phenotypic data to accelerate disease research. PatientCrossroads and DNAnexus are currently engaged in a pioneering effort to help patients obtain their raw genetic files and medical records, and then to integrate these data with patient-reported outcomes data obtained by PatientCrossroads. The data will reside on the IDEA platform, a secure and compliant environment that allows authorized researchers to access this information and use it to develop novel insights. You can review the complete list of public and private sector Cancer Moonshot commitments announced in the White House press release.

Here at DNAnexus, we are particularly devoted to reducing the technical barriers to accessing and working with research datasets. We believe that a culture of openness in genomic research will lead to greater medical breakthroughs. Most data sharing in cancer genomics research has been centralized through rich, yet controlled-access databases like The Cancer Genome Atlas (TCGA) or the International Cancer Genome Consortium (ICGC), both of which properly approved researchers can easily access on the DNAnexus Platform. Read more about some of the collaborative genomics community initiatives DNAnexus is a part of: precisionFDA, the open access cancer genomics pilot, and ICGC.