DNAnexus: Powering AZ’s 2-Million Genome Translational Vision

The volume of biomedical data available for analysis is increasing at an exponential rate, yet translating this information into insight remains both a formidable challenge and a remarkable opportunity. The demonstrable success already achieved by the Regeneron Genetics Center (RGC) and Geisinger Health System in integrating genetic and phenotypic data to inform drug development and benefit patients points to the potential of this approach, and highlights what impassioned champions with a good plan and the right platform can accomplish. (RGC/Geisinger publications, powered by DNAnexus, are discussed here and here.)

The success and promise of the RGC/Geisinger collaboration have prompted an expansion of this vision – including at Regeneron itself, which, in partnership with GSK and the UK Biobank, has announced plans to analyze the genetic data of another 500,000 individuals, powered again by the DNAnexus Platform.

These studies, which aim to catalyze the discovery and development of consequential new medicines, are motivated in part by what translational scientist Robert Plenge (formerly of Merck, now at Celgene) has termed “causal human biology” – the ability to use rare, highly informative genetic variants to better understand the staggering complexity of human biology and human disease. (Plenge has discussed this concept in Science Translational Medicine, in a fantastic Timmerman Report post reprinted in his must-read Plenge Gen blog, and on the Tech Tonics podcast.)

Plenge – and the industry, more generally – is hopeful that leveraging causal human biology can help pharma companies select better targets and more intelligently prosecute them, hopefully resulting in dramatically improved success in phase 2/3 trials; the high failure rates in these expensive mid- and late-stage studies are one of the main reasons drug development is so costly.

In this context, DNAnexus is especially excited to announce today its partnership with AstraZeneca in a particularly ambitious genetics project, the AstraZeneca Centre for Genomic Research, which was established by AZ in 2016 “to transform drug discovery and development across its entire research and development pipeline.”

The vision has been nicely articulated by Menelas Pangalos, Executive Vice President, Innovative Medicines & Early Development at AstraZeneca:

“Using the power of genomics is the foundation of our ambition to develop the most innovative and impactful treatments for patients. With the advent of next generation sequencing and the increased sophistication of data analysis, the time is now right to immerse ourselves fully in the international genomics community through these pioneering collaborations and through the creation of our own genome centre. We will leverage information from up to 2 million genome sequences, including over 500,000 from our own clinical trials, to drive drug discovery and development across all our therapeutic areas. Genomics will be fundamental to our laboratory research, our clinical trials and the launch of our medicines for patients.”

The requirements of this project – including, in particular, the ability to (a) manage high volumes of genomic data in a secure and compliant fashion; (b) facilitate the integration of genetic data with other data types, and (c) enable global collaboration around these data – were a natural fit for the DNAnexus translational informatics platform.

The DNAnexus team is tremendously excited by the opportunity to power the efforts of visionary industry leaders such as Regeneron and AstraZeneca in translating the promise of precision medicine and data analytics into discrete novel medicines that can meaningfully improve the lives of patients.

Just Published New England Journal of Medicine Paper From Geisinger and Regeneron Highlights Value Of Integrating Genetic and EHR Data on DNAnexus

Traditionally, clinical genetic studies have involved deliberate recruitment of patients with specific medical conditions, a process that tends to be lengthy and cumbersome, and generally must be repeated anew for each disease researchers want to study. Moreover, once the patients are finally recruited, the researchers still need to collect and analyze the data on each of these subjects.

Imagine how useful it would be to leverage the knowledge that already exists in a large health system, so that after you designed a study, and decided on the characteristics of patients you wanted to include, you could identify matching patients (and controls) immediately – essentially at the push of a button.

Furthermore, imagine that each of these patients already had rich genetic data, already sitting in an integrated database alongside information from each patient’s electronic health record (EHR).

This is the happy situation that Geisinger Health System and the Regeneron Genetics Center have deliberately created, powered by the DNAnexus platform. De-identified EMR data from consented Geisinger patients participating in Geisinger’s MyCode Community Health Initiative is integrated with whole exome sequencing data from these same patients (an effort known as the “DiscovEHR Project”) and used to drive medical discovery and inform clinical care (see this slide deck, and this front-page New York Times article).

The power (and really, the genius) of this approach was apparent in a paper published this week by researchers from Regeneron and Geisinger in the New England Journal of Medicine (NEJM), revealing a genetic variant that appears to result in reduced levels of triglycerides and a lower risk of coronary artery disease. These results dovetailed with another nice paper published in the same issue of the NEJM by a large academic collective.

In the Regeneron/Geisinger paper, researchers were able to use the genetic information in their integrated database to rapidly identify patients with a suspicious mutation, and use the EHR data to evaluate a range of parameters, including lipid levels and coronary artery disease status, in both patients with mutations as well as in appropriate controls. The Regeneron group also performed subsequent studies in several animal models to further substantiate the biological findings suggested by the human studies.

Not only do these findings point to a potential drug target, but the work represents just one of many similar studies that could be done with equal ease using the approach Geisinger and Regeneron have established. If the researchers want to look at a different gene, or a different condition, the basic process would be almost identical. Moreover, as the partnership adds more and more patients (I’ve heard Regeneron founder and President George Yancopoulos say he is aiming for half a million) with associated EHR data and sequenced exomes, the power of such studies will only increase.

This approach also highlights the insights that might be achieved through integrative data efforts such as the President’s Precision Medicine Initiative, if executed in a similarly effective fashion.

The Geisinger/Regeneron collaboration is a brilliant vision for medical science and for drug discovery, and there are a number of key success factors that we shouldn’t take for granted.

First, on the Geisinger side, the foundational aspect of this entire effort is Geisinger’s trusted relationship with its patients, and Geisinger’s demonstrated commitment to treating patients as partners. Geisinger was at the leading edge of Open Notes (sharing physician notes with patients), for example. Geisinger has put considerable thought into the process of patient consent, and has also ensured that most patients who join the DiscovEHR cohort are recontactable.

Geisinger was also an early adopter of EHRs; consequently, Geisinger’s EHRs harbor unusually good longitudinal data, and often contain data from several generations of family members. Geisinger also systematically reviews and curates the EMR data used in clinical studies, to ensure adequate quality.

Regeneron, for its part, has a clear vision for the use of genetics in drug discovery, which in their hands seems to be a very deliberate, very dynamic process. Regeneron researchers aren’t randomly collecting information, stirring it in a pot, and asking a computer to sort it all out. To the contrary, they are pursuing an approach that seems generally hypothesis-oriented: either evaluating specific candidate genes and variants (as they did here) and then looking at the phenotypes, or looking at specific phenotypes of interest and asking whether there are particular genetic patterns to be found.

Two additional important elements of Regeneron’s strategy that may not be immediately obvious are: (a) the exceptional team of data scientists they’ve brought together to prosecute the analytics, and (b) their ability to quickly pressure-test suggestive results by rapidly creating both targeted antibodies and relevant mouse models – both of which were utilized in the work described in the recent NEJM paper.

Finally, of course, the success of this approach relies upon a powerful, secure, and intuitive platform – DNAnexus – where the data integration can occur, where distributed stakeholders can collaborate, and where a range of analyses can be run.

At DNAnexus, we feel privileged to contribute so foundationally to such great integrative science, and look forward to the next discovery – and to the one after that.

DNAnexus Made Ridiculously Simple

In medical school, perhaps the most indispensable texts were the “Ridiculously Simple” series – Clinical Anatomy Made Ridiculously Simple, Acid-Base Made Ridiculously Simple, etc. While you probably wouldn’t want to operate or dialyze based only on the knowledge in these short books, they nevertheless offered accessible overviews to complex and often intimidating topics.

In this spirit – and in response to questions from friends and family who regularly ask, “What does DNAnexus do?” – I thought I might offer this short post.

What Is DNAnexus?
DNAnexus is a platform – basically, a sophisticated software program – that makes it easier for users to do three things, each in a secure and compliant fashion:

  • Analyze large amounts of raw genetic data
  • Share and collaborate around large amounts of data (including but not limited to genetics)
  • Integrate genetic data with other types of data, such as data from electronic medical records or imaging data, to advance science and to improve clinical care

Let’s take these one at a time.

(1) Analysis Of Raw Sequencing Data
The basic idea here is that the machines that are used to read DNA sequence are incredibly powerful, but don’t generate a book of information that starts at the beginning of the first chromosome and concludes at the end of the last one. Rather, most sequencing machines spit out phrases of about 100 letters, phrases randomly located anywhere in the 3 billion letter book that is the human genome. A computer must figure out where each individual phrase fits in the book, and must also determine whether there are any typos. This can be a computationally intensive task, but DNAnexus provides a way to do this efficiently, by dividing the task into multiple parallel streams each of which can be tackled by a powerful computer.
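For readers who like to see the idea in code, here is a minimal toy sketch of the "find where each phrase fits, and count the typos" step, with the work spread across several workers. It is not how DNAnexus or any real aligner works: the reference string, the reads, and the function names `place_read` and `place_reads_in_parallel` are all invented for illustration, the search is a brute-force scan, and threads on one machine stand in for the separate computers described above.

```python
from concurrent.futures import ThreadPoolExecutor

# A made-up 43-letter "book"; the real human genome has about 3 billion letters.
REFERENCE = "ACGTTGCAAGGCTTAGCCATGCAATCGGATCCTAGGCTAACGT"


def place_read(read, reference=REFERENCE):
    """Return (position, mismatches) for the best ungapped placement of a read.

    Every possible start position is tried and the first one with the fewest
    mismatching letters ("typos") wins. Real aligners use indexes instead of
    a brute-force scan; this is only a toy.
    """
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def place_reads_in_parallel(reads, workers=4):
    """Place each read independently, spreading the reads across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(place_read, reads))


# Two exact phrases copied from the book, plus one with a single typo.
reads = [REFERENCE[5:15], REFERENCE[20:30], "GGATCGTAGG"]
placements = place_reads_in_parallel(reads)

for read, (pos, mismatches) in zip(reads, placements):
    print(f"{read} fits at position {pos} with {mismatches} typo(s)")
```

The point of the sketch is that each read can be placed without knowing anything about the other reads, which is what makes it possible to split the job into parallel streams.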

The computers DNAnexus tends to use are run by Amazon (more precisely, by Amazon Web Services, or AWS), and our use of them is an example of what’s known as “cloud computing” because the computers operate from a massive, dedicated central facility, rather than from a user’s own institution. One advantage of using cloud computing is it’s very much “on demand” – i.e. you have essentially unlimited access to as many computers as you need, and you only pay for the computers that you actually use, and only when you are actually using them.

(2) Distributed Collaboration
Progress in both science and medicine can be accelerated when data can be easily shared. When there are large volumes of data, as is increasingly the case in research and clinical realms, this can be a real problem. Remarkably, the most common method of large-scale data sharing today is probably FedEx’ing hard drives between institutions. What DNAnexus enables is for a distributed team of researchers or clinicians to all have access to the same data at the same time; by bringing together the data, the experts, and the tools for analysis, DNAnexus facilitates collaboration and accelerates knowledge turns.

DNAnexus is ideally suited to power consortia, whether NIH investigators (as is the case with our work with CHARGE in the area of cardiovascular disease or our work with ENCODE in the area of genetic annotation), diagnostic companies (our work on precisionFDA), translational research partnerships (our work with Regeneron and Geisinger Health System), or a public/private partnership of cancer researchers (our work with ITOMIC led by University of Washington’s Tony Blau).

The ability to support distributed innovation also enables DNAnexus to provide global support for companies like Natera that send kits to sequencing labs worldwide, but collect and analyze the data centrally using DNAnexus.

(3) Integration With Other Data Types
The insights that may be available in genetic data are often revealed only when the information is considered and analyzed in the context of other data types, such as data from electronic health records (EHR) or imaging data (such as radiology images or pathology images). Integrating genetic and EHR data is fundamental to the drug discovery work of Regeneron, for example. In the same way our partners can easily access and efficiently utilize the fundamental tools of genetic analysis on our platform, so too can they access and utilize the tools required for integrating genetic data with other data types. DNAnexus is adding tools constantly, based on the needs expressed by our partners.
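As a rough illustration of what "integration" means in practice, the sketch below links a list of variant carriers (from genetic data) to a lab measurement (from health records) using a shared de-identified patient ID, then compares carriers with non-carriers. Everything in it is an assumption made for the example: the patient IDs, the numbers, and the function name `compare_groups` are invented, the values are synthetic rather than drawn from any study, and a real analysis would involve far more patients and proper statistics.

```python
from statistics import mean

# Synthetic, made-up data for illustration only; not from any real cohort.
# Genetic data: de-identified IDs of patients who carry a variant of interest.
variant_carriers = {"P002", "P005"}

# Health-record data: one lab value per de-identified patient ID.
ehr_lab_values = {
    "P001": 150.0,
    "P002": 90.0,
    "P003": 170.0,
    "P004": 160.0,
    "P005": 110.0,
}


def compare_groups(carriers, measurements):
    """Return (mean for carriers, mean for non-carriers) of a measurement."""
    carrier_values = [v for pid, v in measurements.items() if pid in carriers]
    control_values = [v for pid, v in measurements.items() if pid not in carriers]
    return mean(carrier_values), mean(control_values)


carrier_mean, control_mean = compare_groups(variant_carriers, ehr_lab_values)
print(f"carriers: {carrier_mean}, non-carriers: {control_mean}")
```

Once the two data types sit side by side under the same IDs, asking the same question about a different gene or a different measurement means changing only the inputs, not the process.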

Looking Ahead
Guided by the visionary partners with whom we are privileged to work, DNAnexus continues to enhance our tools around each of these three areas: DNA analysis, distributed collaboration, and integration with other data types. We are constantly seeking opportunities to leverage the technology we’ve developed, as well as innovative leaders looking to bring the power of our platform to bear in original and impactful ways.