Traditionally, clinical genetic studies have involved deliberate recruitment of patients with specific medical conditions, a process that tends to be lengthy and cumbersome, and generally must be repeated anew for each disease researchers want to study. Moreover, once the patients are finally recruited, the researchers still need to collect and analyze the data on each of these subjects.
Imagine how useful it would be to leverage the knowledge that already exists in a large health system, so that after designing a study and deciding on the characteristics of the patients you wanted to include, you could identify matching patients (and controls) immediately, essentially at the push of a button.
Furthermore, imagine that each of these patients had rich genetic data already stored in an integrated database, alongside information from that patient’s electronic health record (EHR).
This is the happy situation that Geisinger Health System and the Regeneron Genetics Center have deliberately created, powered by the DNAnexus platform. De-identified EHR data from consented Geisinger patients participating in Geisinger’s MyCode Community Health Initiative is integrated with whole exome sequencing data from these same patients (an effort known as the “DiscovEHR Project”) and used to drive medical discovery and inform clinical care (see this slide deck and this front-page New York Times article).
The power (and really, the genius) of this approach was apparent in a paper published this week by researchers from Regeneron and Geisinger in the New England Journal of Medicine (NEJM), revealing a genetic variant that appears to result in reduced triglyceride levels and a lower risk of coronary artery disease. These results dovetailed with those of another paper, published in the same issue of the NEJM by a large academic collaboration.
In the Regeneron/Geisinger paper, researchers used the genetic information in their integrated database to rapidly identify patients carrying a candidate variant, and then used the EHR data to evaluate a range of parameters, including lipid levels and coronary artery disease status, in both the carriers and appropriate controls. The Regeneron group also performed follow-up studies in several animal models to further substantiate the biological findings suggested by the human data.
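To make the "genotype-first" workflow above a bit more concrete, here is a minimal Python sketch of its core step: split a cohort into carriers and non-carriers of one variant, then summarize an EHR-derived lab value and a diagnosis in each group. Everything here is an illustrative assumption: the `Patient` record, its field names, the variant IDs `VAR_X`/`VAR_Y`, and the toy values are hypothetical, and this is neither the DNAnexus API nor the analysis used in the NEJM paper. A real study would also need to account for covariates such as age, sex, ancestry, and family relatedness.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Patient:
    """Hypothetical joined record: exome variants plus a few EHR-derived fields."""
    patient_id: str
    variant_ids: frozenset[str]        # variants observed in this patient's exome
    triglycerides_mg_dl: float | None  # summary lab value from the EHR; None if never measured
    has_cad: bool                      # coronary artery disease diagnosis recorded in the EHR


def summarize(group: list[Patient]) -> dict:
    """Group size, median triglycerides (ignoring missing labs), and CAD rate."""
    tg = [p.triglycerides_mg_dl for p in group if p.triglycerides_mg_dl is not None]
    return {
        "n": len(group),
        "median_tg": median(tg) if tg else None,
        "cad_rate": sum(p.has_cad for p in group) / len(group) if group else None,
    }


def genotype_first_comparison(patients: list[Patient], variant_id: str) -> dict:
    """Start from the genotype: partition the cohort by carrier status, then compare phenotypes."""
    carriers = [p for p in patients if variant_id in p.variant_ids]
    controls = [p for p in patients if variant_id not in p.variant_ids]
    return {"carriers": summarize(carriers), "controls": summarize(controls)}


# Toy, made-up cohort purely for illustration (not real patient data or study results).
patients = [
    Patient("P1", frozenset({"VAR_X"}), 95.0, False),
    Patient("P2", frozenset({"VAR_X"}), 110.0, False),
    Patient("P3", frozenset(), 150.0, True),
    Patient("P4", frozenset(), 170.0, False),
    Patient("P5", frozenset({"VAR_Y"}), 160.0, False),
    Patient("P6", frozenset(), None, True),
]

result = genotype_first_comparison(patients, "VAR_X")
print(result["carriers"])  # {'n': 2, 'median_tg': 102.5, 'cad_rate': 0.0}
print(result["controls"])  # {'n': 4, 'median_tg': 160.0, 'cad_rate': 0.5}
```

The point of the sketch is the order of operations: because genotypes and EHR phenotypes already sit side by side, changing the question is just a matter of passing a different `variant_id` (or swapping in a different phenotype field), which mirrors the "almost identical process" described below.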
Not only do these findings point to a potential drug target, but the work also represents just one of many similar studies that could be done with equal ease using the approach Geisinger and Regeneron have established. If researchers want to look at a different gene or a different condition, the basic process would be almost identical. Moreover, as the partnership adds more and more patients with associated EHR data and sequenced exomes (I’ve heard Regeneron founder and President George Yancopoulos say he is aiming for half a million), the power of such studies will only increase.
This approach also highlights the insights that might be achieved through integrative data efforts such as the President’s Precision Medicine Initiative, if executed in a similarly effective fashion.
The Geisinger/Regeneron collaboration represents a brilliant vision for medical science and drug discovery, and its success rests on a number of key factors that we shouldn’t take for granted.
First, on the Geisinger side, the foundation of this entire effort is Geisinger’s trusted relationship with its patients, and its demonstrated commitment to treating patients as partners. Geisinger was at the leading edge of Open Notes (sharing physician notes with patients), for example. Geisinger has put considerable thought into the process of patient consent, and has also ensured that most patients who join the DiscovEHR cohort are recontactable.
Geisinger was also an early adopter of EHRs; consequently, its EHRs harbor unusually good longitudinal data, and often contain data from several generations of family members. Geisinger also systematically reviews and curates the EHR data used in clinical studies to ensure adequate quality.
Regeneron, for its part, has a clear vision for the use of genetics in drug discovery, which in their hands seems to be a very deliberate, very dynamic process. Regeneron researchers aren’t randomly collecting information, stirring it in a pot, and asking a computer to sort it all out. On the contrary, they are pursuing an approach that seems generally hypothesis-oriented: either starting with specific candidate genes and variants (as they did here) and then examining the associated phenotypes, or starting with specific phenotypes of interest and asking whether particular genetic patterns can be found.
Two additional important elements of Regeneron’s strategy that may not be immediately obvious are: (a) the exceptional team of data scientists they’ve brought together to prosecute the analytics, and (b) their ability to quickly pressure-test suggestive results by rapidly creating both targeted antibodies and relevant mouse models, both of which were used in the work described in the recent NEJM paper.
Finally, of course, the success of this approach relies on a powerful, secure, and intuitive platform, DNAnexus, where the data can be integrated, where distributed stakeholders can collaborate, and where a range of analyses can be run.
At DNAnexus, we feel privileged to contribute so foundationally to such great integrative science, and look forward to the next discovery – and to the one after that.