Bringing Together Genomics and Patient Data in the Cloud

Please join us Tuesday, February 7, at 10am PT (1pm ET) to hear leading genetics expert Dr. Jeffrey Reid, Executive Director and Head of Genome Informatics at the Regeneron Genetics Center (RGC), discuss RGC’s integrated approach across genetic trait architectures and phenotypes, the underlying cloud infrastructure that makes the center’s collaboration with multiple institutions possible, and key lessons learned from RGC’s pioneering genomic sequencing study.

Webinar Details
Title: Beyond 100,000 Exomes: Insights & Lessons from Large-Scale Sequencing in the Cloud
Speaker: Jeffrey Reid, Ph.D., Executive Director, Head of Genome Informatics, Regeneron Genetics Center
Date: Tuesday, February 7, 2017
Time: 10:00 AM PT, 1:00 PM ET

Despite growing investment in biopharma research and development, the number of new drugs is not increasing. It is estimated that more than 90% of drugs that enter Phase I clinical trials fail. Among failures in Phase II clinical trials, 51% are due to lack of efficacy and 19% are due to toxicity. These statistics suggest that pre-clinical models may be poor predictors of benefit and, together with data on genetically informed development programs, indicate that human genetics data can substantially improve the likelihood of success for new therapeutics.

Regeneron has a long history of commitment to genetics-based science, and a track record of integrating human genetics into successful development programs that deliver new medicines to patients. The company has therefore made a substantial investment in the Regeneron Genetics Center, a cloud-based large-scale sequencing and analysis effort supporting Regeneron development programs. The RGC is a natural extension of this decades-long commitment to genetics at Regeneron, integrating large-scale, diverse data types and fostering collaboration with a wide array of stakeholders, including biopharma, healthcare providers, research institutes, and patient advocacy groups.

The Regeneron Genetics Center has sequenced more than 120,000 people so far, and has created one of the world’s most comprehensive genetics databases pairing sequence data and de-identified electronic health records. The RGC research program spans trait architectures and phenotypes, with collaboration across a network of more than 30 research and healthcare provider institutions. Securely and easily sharing data and tools at scale with so many partners is a major challenge. To enable frictionless collaboration across these disparate labs, Regeneron selected DNAnexus to provide the cloud-based bioinformatics platform necessary to securely share large-scale sequencing data and tools.

In this presentation Dr. Reid will explain the RGC vision for genetics-driven drug development, describe the automation and uniquely enabling infrastructure of the RGC, and discuss in detail some of the informatics innovations and early biological insights that have already come out of the RGC’s collaborative efforts.

Just-Published New England Journal of Medicine Paper from Geisinger and Regeneron Highlights Value of Integrating Genetic and EHR Data on DNAnexus

Traditionally, clinical genetic studies have involved deliberate recruitment of patients with specific medical conditions, a process that tends to be lengthy and cumbersome, and generally must be repeated anew for each disease researchers want to study. Moreover, once the patients are finally recruited, the researchers still need to collect and analyze the data on each of these subjects.

Imagine how useful it would be to leverage the knowledge that already exists in a large health system, so that after you designed a study, and decided on the characteristics of patients you wanted to include, you could identify matching patients (and controls) immediately – essentially at the push of a button.

Furthermore, imagine that each of these patients already had rich genetic data, already sitting in an integrated database alongside information from each patient’s electronic health record (EHR).

This is the happy situation that Geisinger Health System and the Regeneron Genetics Center have deliberately created, powered by the DNAnexus platform. De-identified EHR data from consented Geisinger patients participating in Geisinger’s MyCode Community Health Initiative is integrated with whole exome sequencing data from these same patients (an effort known as the “DiscovEHR Project”) and used to drive medical discovery and inform clinical care (see this slide deck, and this front-page New York Times article).

The power (and really, the genius) of this approach was apparent in a paper published this week by researchers from Regeneron and Geisinger in the New England Journal of Medicine (NEJM), revealing a genetic variant that appears to result in reduced levels of triglycerides and a lower risk of coronary artery disease. These results dovetailed with another nice paper published in the same issue of the NEJM by a large academic collective.

In the Regeneron/Geisinger paper, researchers were able to use the genetic information in their integrated database to rapidly identify patients with a suspicious mutation, and use the EHR data to evaluate a range of parameters, including lipid levels and coronary artery disease status, both in patients with the mutation and in appropriate controls. The Regeneron group also performed subsequent studies in several animal models to further substantiate the biological findings suggested by the human studies.
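The carrier-versus-control comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the analysis used in the paper: the patient IDs, field names, helper functions, and all values are hypothetical and synthetic, and a real study would add matched controls and proper statistical testing.

```python
from statistics import mean

# Synthetic, hypothetical data: none of these values come from the study.
# Maps a de-identified patient ID to whether the patient carries the variant.
genotypes = {
    "P001": True,
    "P002": True,
    "P003": False,
    "P004": False,
    "P005": False,
    "P006": False,  # sequenced, but no EHR record available
}

# Hypothetical EHR extract keyed by the same de-identified patient ID.
ehr = {
    "P001": {"triglycerides_mg_dl": 90, "has_cad": False},
    "P002": {"triglycerides_mg_dl": 110, "has_cad": False},
    "P003": {"triglycerides_mg_dl": 150, "has_cad": True},
    "P004": {"triglycerides_mg_dl": 170, "has_cad": False},
    "P005": {"triglycerides_mg_dl": 160, "has_cad": True},
}


def summarize(patient_ids, ehr_records):
    """Summarize lipid level and coronary artery disease status for a group."""
    # Keep only patients that have both genetic and EHR data.
    rows = [ehr_records[pid] for pid in patient_ids if pid in ehr_records]
    if not rows:
        return {"n": 0, "mean_triglycerides_mg_dl": None, "cad_fraction": None}
    return {
        "n": len(rows),
        "mean_triglycerides_mg_dl": mean(r["triglycerides_mg_dl"] for r in rows),
        "cad_fraction": sum(r["has_cad"] for r in rows) / len(rows),
    }


def compare_carriers(genotype_calls, ehr_records):
    """Split patients by carrier status and summarize each group."""
    carriers = [pid for pid, carries in genotype_calls.items() if carries]
    controls = [pid for pid, carries in genotype_calls.items() if not carries]
    return {
        "carriers": summarize(carriers, ehr_records),
        "controls": summarize(controls, ehr_records),
    }


result = compare_carriers(genotypes, ehr)
print(result)
```

Because both tables share the same de-identified patient ID, looking at a different gene or a different condition would only mean swapping the genotype column or the EHR fields being summarized.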

Not only do these findings point to a potential drug target, but the work represents just one of many similar studies that could be done with equal ease using the approach Geisinger and Regeneron have established. If the researchers want to look at a different gene, or a different condition, the basic process would be almost identical. Moreover, as the partnership adds more and more patients (I’ve heard Regeneron founder and President George Yancopoulos say he is aiming for half a million) with associated EHR data and sequenced exomes, the power of such studies will only increase.

This approach also highlights the insights that might be achieved through integrative data efforts such as the President’s Precision Medicine Initiative, if executed in a similarly effective fashion.

The Geisinger/Regeneron collaboration is a brilliant vision for medical science and for drug discovery, and there are a number of key success factors that we shouldn’t take for granted.

First, on the Geisinger side, the foundational aspect of this entire effort is Geisinger’s trusted relationship with its patients, and Geisinger’s demonstrated commitment to treating patients as partners. Geisinger was at the leading edge of Open Notes (sharing physician notes with patients), for example. Geisinger has put considerable thought into the process of patient consent, and has also ensured that most patients who join the DiscovEHR cohort are recontactable.

Geisinger was also an early adopter of EHRs; consequently, Geisinger’s EHRs harbor unusually good longitudinal data, and often contain data from several generations of family members. Geisinger also systematically reviews and curates the EHR data used in clinical studies, to ensure adequate quality.

Regeneron, for its part, has a clear vision for the use of genetics in drug discovery, which in their hands seems to be a very deliberate, very dynamic process. Regeneron researchers aren’t randomly collecting information, stirring it in a pot, and asking a computer to sort it all out. To the contrary, they are pursuing an approach that seems generally hypothesis-oriented: either evaluating specific candidate genes and variants (as they did here) and then looking at the phenotypes, or starting from specific phenotypes of interest and asking whether there are particular genetic patterns to be found.

Two additional important elements of Regeneron’s strategy that may not be immediately obvious are: (a) the exceptional team of data scientists they’ve brought together to prosecute the analytics, and (b) their ability to quickly pressure-test suggestive results by rapidly creating both targeted antibodies and relevant mouse models – both of which were utilized in the work described in the recent NEJM paper.

Finally, of course, the success of this approach relies upon a powerful, secure, and intuitive platform, DNAnexus, where the data integration can occur, where distributed stakeholders can collaborate, and where a range of analyses can be run.

At DNAnexus, we feel privileged to contribute so foundationally to such great integrative science, and look forward to the next discovery – and to the one after that.

100% Cloud-based Genome Center Integrating Large Healthcare Data Flows

photo: The Cancer Genome Atlas

In a previous post, our new CMO, David Shaywitz, talked about his vision for DNAnexus and its role in helping fulfill the promise of genomic medicine:

“DNAnexus represents a natural home for these aspirations, offering a compelling, secure, cloud-based data management platform, an enabling tool for any healthcare organization – academic medical center, healthcare system, biopharma company, payor – who recognizes that getting a handle on large healthcare data flows is rapidly becoming table stakes, and that figuring out how to manage and leverage genomic data is a wise place to start.”

Fast-forward two months… This week, we announced exciting progress in our efforts to accelerate genomic medicine. The DNAnexus cloud-based genome informatics and data management platform is powering a number of collaborations between the Regeneron Genetics Center (RGC) and its leading healthcare provider partners.

In a press release, the RGC announced these new collaborators, which include Geisinger Health System, Columbia University Medical Center, the Clinic for Special Children, and Baylor College of Medicine. The RGC will be using the DNAnexus platform to integrate sequencing data with de-identified clinical records from patient volunteers. To date, the RGC has sequenced samples from more than 10,000 individuals and is currently sequencing more than 50,000 samples per year.

The Geisinger collaboration, which has been described as the largest clinical sequencing project in the U.S., is on track to sequence more than 100,000 patient volunteer samples. This DNAnexus-powered initiative has resulted in the first 100% cloud-based biopharma genome center, and is now operating at scale.

Next-generation sequencing technologies, like Illumina’s HiSeq 2500 or X Ten platform, have reduced the cost and increased the speed of DNA sequencing, outpacing Moore’s Law, to the point where the new bottleneck is genome informatics. To address this issue, companies like Regeneron are adopting cloud-based solutions to handle the massive volume of sequencing data.

DNAnexus provides the technology backbone that enables the sharing and management of data and tools around large volumes of sequencing data between the RGC and its healthcare collaborators. Currently the RGC is processing more than 1,000 exomes per week and sharing the data easily and safely with their collaborators.

To improve patient care and ultimately human health, the integration of genomic and phenotypic data needs to happen on a massive scale (something David has recently discussed from the perspective of phenotype here and here). Combining large cohorts of deeply-phenotyped individuals with their genomic data offers a wide range of medical applications, the most obvious being a more personalized approach to medical interventions, such as determining which therapy might work best for a given individual. These data can also be used to aid in the development of new companion diagnostics and in clinical trial participant selection. As an article in GigaOM put it this week: Cloud Computing is Coming for Your DNA, and it Will Lead to Better Drugs and Health Care.

These collaborations are powerful examples of how the DNAnexus platform is enabling an integrated approach between biopharmaceutical companies and their partners to accelerate the research and discovery process. As David said, healthcare industry leaders who prioritize the management of large healthcare data flows will emerge as the pioneers who help us realize the full vision of precision medicine – delivery of the optimal therapy to the right patients at the right time – ideally before they are sick.