Just Published New England Journal of Medicine Paper From Geisinger and Regeneron Highlights Value Of Integrating Genetic and EHR Data on DNAnexus

Traditionally, clinical genetic studies have involved deliberate recruitment of patients with specific medical conditions, a process that tends to be lengthy and cumbersome, and generally must be repeated anew for each disease researchers want to study. Moreover, once the patients are finally recruited, the researchers still need to collect and analyze the data on each of these subjects.

Imagine how useful it would be to leverage the knowledge that already exists in a large health system, so that after you designed a study, and decided on the characteristics of patients you wanted to include, you could identify matching patients (and controls) immediately – essentially at the push of a button.

Furthermore, imagine that each of these patients already had rich genetic data, already sitting in an integrated database alongside information from each patient’s electronic health record (EHR).

This is the happy situation that Geisinger Health System and the Regeneron Genetics Center have deliberately created, powered by the DNAnexus platform. De-identified EHR data from consented Geisinger patients participating in Geisinger’s MyCode Community Health Initiative is integrated with whole exome sequencing data from these same patients (an effort known as the “DiscovEHR Project”) and used to drive medical discovery and inform clinical care (see this slide deck, and this front-page New York Times article).

The power (and really, the genius) of this approach was apparent in a paper published this week by researchers from Regeneron and Geisinger in the New England Journal of Medicine (NEJM), revealing a genetic variant that appears to result in reduced levels of triglycerides and a lower risk of coronary artery disease. These results dovetailed with another nice paper published in the same issue of the NEJM by a large academic collective.

In the Regeneron/Geisinger paper, researchers were able to use the genetic information in their integrated database to rapidly identify patients with a suspicious mutation, and use the EHR data to evaluate a range of parameters, including lipid levels and coronary artery disease status, both in patients with the mutation and in appropriate controls. The Regeneron group also performed subsequent studies in several animal models to further substantiate the biological findings suggested by the human studies.
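The general shape of such a query can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used in the paper: the patient IDs, carrier flags, and triglyceride values below are invented toy data, and the function name `split_by_carrier_status` is hypothetical.

```python
from statistics import mean

# Toy, invented records for illustration only (not real patient data).
# exome_calls: does the patient's exome carry the variant of interest?
# ehr_triglycerides: a lab value (mg/dL) taken from the health record.
exome_calls = {"p1": True, "p2": False, "p3": True, "p4": False, "p5": False}
ehr_triglycerides = {"p1": 95.0, "p2": 150.0, "p3": 105.0, "p4": 170.0, "p5": 160.0}


def split_by_carrier_status(calls, labs):
    """Join the two sources on patient ID and group lab values by carrier status."""
    carriers, controls = [], []
    for patient_id, is_carrier in calls.items():
        if patient_id not in labs:
            continue  # skip patients with no lab value on record
        (carriers if is_carrier else controls).append(labs[patient_id])
    return carriers, controls


carriers, controls = split_by_carrier_status(exome_calls, ehr_triglycerides)
print(f"carriers: n={len(carriers)}, mean triglycerides={mean(carriers):.1f}")
print(f"controls: n={len(controls)}, mean triglycerides={mean(controls):.1f}")
```

A real analysis would of course involve far more patients, carefully matched controls, and proper statistics; the point is simply that once genetic and EHR data sit side by side, the comparison becomes a lookup rather than a recruitment campaign.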

Not only do these findings point to a potential drug target, but the work represents just one of many similar studies that could be done with equal ease using the approach Geisinger and Regeneron have established. If the researchers want to look at a different gene, or a different condition, the basic process would be almost identical. Moreover, as the partnership adds more and more patients (I’ve heard Regeneron founder and President George Yancopoulos say he is aiming for half a million) with associated EHR data and sequenced exomes, the power of such studies will only increase.

This approach also highlights the insights that might be achieved through integrative data efforts such as the President’s Precision Medicine Initiative, if executed in a similarly effective fashion.

The Geisinger/Regeneron collaboration is a brilliant vision for medical science and for drug discovery, and there are a number of key success factors that we shouldn’t take for granted.

First, on the Geisinger side, the foundational aspect of this entire effort is Geisinger’s trusted relationship with its patients, and Geisinger’s demonstrated commitment to treating patients as partners. Geisinger was at the leading edge of Open Notes (sharing physician notes with patients), for example. Geisinger has put considerable thought into the process of patient consent, and has also ensured most patients who join the DiscovEHR cohort are recontactable.

Geisinger was also an early adopter of EHRs; consequently, Geisinger’s EHRs harbor unusually good longitudinal data, and often contain data from several generations of family members. Geisinger also systematically reviews and curates the EHR data used in clinical studies, to ensure adequate quality.

Regeneron, for its part, has a clear vision for the use of genetics in drug discovery, which in their hands seems to be a very deliberate, very dynamic process. Regeneron researchers aren’t randomly collecting information, stirring it in a pot, and asking a computer to sort it all out. To the contrary, they are pursuing an approach that seems generally hypothesis-oriented: either evaluating specific candidate genes and variants (as they did here) and then looking at the phenotypes, or starting from specific phenotypes of interest and asking whether there are particular genetic patterns to be found.

Two additional important elements of Regeneron’s strategy that may not be immediately obvious are: (a) the exceptional team of data scientists they’ve brought together to prosecute the analytics, and (b) their ability to quickly pressure-test suggestive results by rapidly creating both targeted antibodies and relevant mouse models – both of which were utilized in the work described in the recent NEJM paper.

Finally, of course, the success of this approach relies upon a powerful, secure, and intuitive platform – DNAnexus – where the data integration can occur, where distributed stakeholders can collaborate, and where a range of analyses can be performed.

At DNAnexus, we feel privileged to contribute so foundationally to such great integrative science, and look forward to the next discovery – and to the one after that.

DNAnexus Made Ridiculously Simple

In medical school, perhaps the most indispensable texts were the “Ridiculously Simple” series – Clinical Anatomy Made Ridiculously Simple, Acid-Base Made Ridiculously Simple, etc. While you probably wouldn’t want to operate or dialyze based only on the knowledge in these short books, they nevertheless offered accessible overviews to complex and often intimidating topics.

In this spirit – and in response to questions from friends and family who regularly ask, “What does DNAnexus do” – I thought I might offer this short post.

What Is DNAnexus?
DNAnexus is a platform – basically, a sophisticated software program – that makes it easier for users to do three things, each in a secure and compliant fashion:

  • Analyze large amounts of raw genetic data
  • Share and collaborate around large amounts of data (including but not limited to genetics)
  • Integrate genetic data with other types of data, such as data from electronic medical records or imaging data, to advance science and to improve clinical care

Let’s take these one at a time.

(1) Analysis Of Raw Sequencing Data
The basic idea here is that the machines that are used to read DNA sequence are incredibly powerful, but don’t generate a book of information that starts at the beginning of the first chromosome and concludes at the end of the last one. Rather, most sequencing machines spit out phrases of about 100 letters, phrases randomly located anywhere in the 3 billion letter book that is the human genome. A computer must figure out where each individual phrase fits in the book, and must also determine whether there are any typos. This can be a computationally intensive task, but DNAnexus provides a way to do this efficiently, by dividing the task into multiple parallel streams each of which can be tackled by a powerful computer.
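To make the “phrases in a book” picture concrete, here is a minimal sketch in Python. It is a toy illustration, not how DNAnexus or any production aligner works: the reference string, the reads, and the function name `place_read` are all invented for this example, the search is a brute-force scan, and a thread pool stands in for the separate computers that would share the work in practice.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the "book": a real human genome has about 3 billion letters.
REFERENCE = "ACGTTGCAAGGCTTACGGATCCATGCAAT"


def place_read(read, reference=REFERENCE, max_mismatches=1):
    """Find where a short read best fits in the reference.

    Returns (start_position, mismatch_positions) for the placement with the
    fewest mismatches, or None if no placement has max_mismatches or fewer.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [start + i for i, (a, b) in enumerate(zip(read, window)) if a != b]
        if len(mismatches) > max_mismatches:
            continue
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


# Three invented "phrases"; the last one contains a single typo.
reads = ["TTGCAAGG", "GGATCCAT", "CTTACTGA"]

# Each read is placed independently, so the work splits cleanly across workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    placements = list(pool.map(place_read, reads))

for read, placement in zip(reads, placements):
    print(read, placement)
# TTGCAAGG (3, [])
# GGATCCAT (16, [])
# CTTACTGA (11, [16])
```

Because no read depends on any other, the same division of labor scales from three threads on a laptop to many machines in the cloud, which is what makes the task a natural fit for parallel processing.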

The computers DNAnexus tends to use are run by Amazon (more precisely, by Amazon Web Services, or AWS), and our use of them is an example of what’s known as “cloud computing” because the computers operate from a massive, dedicated central facility, rather than from a user’s own institution. One advantage of using cloud computing is it’s very much “on demand” – i.e. you have essentially unlimited access to as many computers as you need, and you only pay for the computers that you actually use, and only when you are actually using them.

(2) Distributed Collaboration
Progress in both science and medicine can be accelerated when data can be easily shared. When there are large volumes of data, as is increasingly the case in research and clinical realms, sharing can be a real problem. Remarkably, the most common method of large-scale data sharing today is probably FedEx’ing hard drives between institutions. DNAnexus enables a distributed team of researchers or clinicians to all have access to the same data at the same time; by bringing together the data, the experts, and the tools for analysis, DNAnexus facilitates collaboration and accelerates knowledge turns.

DNAnexus is ideally suited to power consortia, whether NIH investigators (as is the case with our work with CHARGE in the area of cardiovascular disease or our work with ENCODE in the area of genetic annotation), diagnostic companies (our work on precisionFDA), translational research partnerships (our work with Regeneron and Geisinger Health System), or a public/private partnership of cancer researchers (our work with ITOMIC, led by University of Washington’s Tony Blau).

The ability to support distributed innovation also enables DNAnexus to provide global support for companies like Natera that send kits to sequencing labs worldwide, but collect and analyze the data centrally using DNAnexus.

(3) Integration With Other Data Types
The insights that may be available in genetic data are often revealed only when the information is considered and analyzed in the context of other data types, such as data from electronic health records (EHR) or imaging data (such as radiology images or pathology images). Integrating genetic and EHR data is fundamental to the drug discovery work of Regeneron, for example. In the same way our partners can easily access and efficiently utilize the fundamental tools of genetic analysis on our platform, so too can they access and utilize the tools required for integrating genetic data with other data types. DNAnexus is adding tools constantly, based on the needs expressed by our partners.

Looking Ahead
Guided by the visionary partners with whom we are privileged to work, DNAnexus continues to enhance our tools around each of these three areas: DNA analysis, distributed collaboration, and integration with other data types. We are constantly seeking opportunities to leverage the technology we’ve developed, as well as innovative leaders looking to bring the power of our platform to bear in original and impactful ways.

precisionFDA: Why It Matters

I hadn’t intended to write about precisionFDA going live – this post by Dr. Taha Kass-Hout and Elaine Johanson of the FDA provides a terrific summary, and this post by Angela Anderson of DNAnexus offers valuable additional context. However, I found myself today so excited by this project and what it represents that I can’t resist offering a few additional thoughts about what makes this initiative so special.

First, it addresses an important problem in the field: the analytic validity of NGS tests. The ready availability of relatively inexpensive sequencing has enabled us to contemplate diagnostic sequencing at a scale that would have been difficult to imagine even a decade ago. At the same time, the drive to apply sequencing in different clinical contexts raises a critically important question: do I trust this test? A key starting point for clinical interpretation of DNA data is to agree on the sequence itself. If your procedure and analysis reports that a particular sequence in a DNA sample is “GATCGATC” and my procedure and analysis of the same DNA says the sequence is “GATTGATC,” then we’ve got a problem. precisionFDA will allow users to compare approaches, figure out what’s working, and determine where refinements might be needed.
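The disagreement in the example above is easy to pin down in code. The following Python sketch is only an illustration of the idea of comparing two results for the same sample; it is not the precisionFDA comparison method, and the function names `differing_positions` and `concordance` are hypothetical.

```python
def differing_positions(call_a, call_b):
    """List (index, base_a, base_b) wherever two equal-length sequences disagree."""
    if len(call_a) != len(call_b):
        raise ValueError("sequences must be the same length to compare position by position")
    return [(i, a, b) for i, (a, b) in enumerate(zip(call_a, call_b)) if a != b]


def concordance(call_a, call_b):
    """Fraction of positions at which the two sequences agree."""
    return 1 - len(differing_positions(call_a, call_b)) / len(call_a)


yours = "GATCGATC"
mine = "GATTGATC"

print(differing_positions(yours, mine))  # [(3, 'C', 'T')]
print(concordance(yours, mine))          # 0.875
```

Real comparisons are much harder than this, since pipelines can differ in insertions, deletions, and how variants are represented, but the underlying question is the same: where do two analyses of the same DNA disagree, and how often?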

Second, precisionFDA represents a novel and forward-thinking approach to regulation. Rather than envisioning governmental regulators as the folks who will define and then impose a specific set of performance standards, precisionFDA instead sees the government as providing the platform that will enable the NGS community to evolve the standards on their own — organically and transparently.

Finally, the ability to design, refine, and deploy this platform in such a rapid and agile fashion reflects in part the value of well-conceptualized public-private partnerships, in this case between the FDA and DNAnexus. By intentionally leveraging the skills and capabilities of a company like ours, the FDA was able to implement and realize their exciting and ambitious vision.

The ultimate success of the precisionFDA platform will of course depend upon how well it serves the community it is intended to support. However, it’s hard to think of a more auspicious beginning, and my hope would be that success here will encourage more leaders to evaluate the potential of public/private partnerships to deploy platforms that leverage the power of a distributed innovation community to address important shared challenges.