100% Cloud-based Genome Center Integrating Large Healthcare Data Flows

photo: The Cancer Genome Atlas

In a previous post, our new CMO, David Shaywitz, talked about his vision for DNAnexus and its role in helping fulfill the promise of genomic medicine:

“DNAnexus represents a natural home for these aspirations, offering a compelling, secure, cloud-based data management platform, an enabling tool for any healthcare organization – academic medical center, healthcare system, biopharma company, payor – who recognizes that getting a handle on large healthcare data flows is rapidly becoming table stakes, and that figuring out how to manage and leverage genomic data is a wise place to start.”

Fast-forward two months…  This week, we announced exciting progress in our efforts to accelerate genomic medicine.  The DNAnexus cloud-based genome informatics and data management platform is powering a number of collaborations between Regeneron Genetics Center (RGC) and its leading healthcare provider partners.

In an RGC press release, the center announced these new collaborators, which include the Geisinger Health System, Columbia University Medical Center, Clinic for Special Children, and Baylor College of Medicine. The RGC will be using the DNAnexus platform to integrate sequencing data with de-identified clinical records from patient volunteers. To date, the RGC has sequenced samples from more than 10,000 individuals and is currently sequencing more than 50,000 samples per year.

The Geisinger collaboration, which has been described as the largest clinical sequencing project in the U.S., is on track to sequence more than 100,000 patient volunteer samples. This DNAnexus-powered initiative has resulted in the first 100% cloud-based biopharma genome center, and is now operating at scale.

Next-generation sequencing technologies, like Illumina’s HiSeq 2500 or X Ten platform, have reduced the cost and increased the speed of DNA sequencing, outpacing Moore’s Law, to the point where the new bottleneck is genome informatics. To address this issue, companies like Regeneron are adopting cloud-based solutions to handle the massive volume of sequencing data.

DNAnexus provides the technology backbone that enables the sharing and management of data and tools around large volumes of sequencing data between the RGC and its healthcare collaborators. Currently the RGC is processing more than 1,000 exomes per week and sharing the data easily and safely with their collaborators.

In order to improve patient care and ultimately human health, the integration of genomic and phenotypic data needs to happen on a massive scale (something David has recently discussed from the perspective of phenotype here and here). Combining large cohorts of deeply-phenotyped individuals with their genomic data offers a wide range of medical applications, the most obvious being a more personalized approach to medical interventions, such as choosing the therapy most likely to work for a given individual. These data can also be used to aid in the development of new companion diagnostics and clinical trial participant selection. As an article in GigaOM put it this week: Cloud Computing is Coming for Your DNA, and it Will Lead to Better Drugs and Health Care.

These collaborations are powerful examples of how the DNAnexus platform is enabling an integrated approach between biopharmaceutical companies and their partners to accelerate the research and discovery process. As David said, healthcare industry leaders who prioritize the management of large healthcare data flows will emerge as the pioneers who help us realize the full vision of precision medicine: delivery of the optimal therapy to the right patients at the right time, ideally before they are sick.

On A Day When Apple Sidesteps Healthcare Technology, Mary-Claire King Shows How To Confront It

The most interesting healthcare news of this week was manifestly not Apple’s new watch; I can only assume that the Cupertino-based company concluded, after meeting with the FDA and consulting with a range of experts, that it made far more sense to go down the path of nutritional supplements, and stay as far from regulators as possible, as many of their brethren here in the Valley have emphatically suggested.

No, the more substantive healthcare contribution of the week came from the latest issue of JAMA, where 2014 Lasker Award winner Mary-Claire King, writing with several colleagues from Israel, audaciously suggested that all adult women should be screened for defined categories of BRCA1 and BRCA2 mutations – specifically, focusing on “unambiguously loss-of-function mutations with definitive effect on cancer risk.”

(Disclosure reminder: I work at a genomic data management company.)

Today, patients with a family history of breast or ovarian cancer may be referred for BRCA1 and BRCA2 testing, but King is suggesting something more: she wants every adult woman to receive this testing, based on recent research she and her colleagues have published suggesting that relying on family history may miss half of the families with relevant BRCA1 or BRCA2 mutations; these families without a known history of breast or ovarian cancer tend to be smaller, King says, but members carrying the mutations have roughly the same chances of getting cancer as carriers from families with an established history of the disease.

The questions to ask about screening are captured by the ACCE framework (which I recently highlighted in the context of data from wearables, but which was originally developed for genetic testing).

Analytic validity – do the tests reliably and consistently measure the mutations they say they measure?

Clinical validity – how well does a positive test predict the likelihood of a cancer due to BRCA gene dysfunction?  To what extent can a negative test be relied on to conclude that a patient is not at elevated risk of cancer due to BRCA gene mutation?

Clinical utility – does a positive test provide actionable information?  King writes that “Among women who carry mutations in BRCA1 or BRCA2, surgical intervention, in particular risk-reducing salpingo-oophorectomy, reduces risk of both ovarian and breast cancer and reduces overall mortality.”

Ethical, legal, and social implications – what are the implications of population-level screening?  For example, might a negative screening test provide a false sense of security, resulting in reduced vigilance, and ultimately an increase in non-BRCA-related breast cancers?
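
One way to make the validity questions above more concrete is to look at how the predictive value of any test shifts when it moves from a referred, high-risk group to the general population. The short Python sketch below applies Bayes' rule to a generic test; the sensitivity, specificity, and prevalence figures are hypothetical placeholders chosen purely for illustration and are not taken from King's paper or from any real BRCA assay.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population, via Bayes' rule.

    All three inputs are fractions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # P(condition | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no condition | negative test)
    return ppv, npv


# Hypothetical inputs: the same test used in a referred group (higher
# prevalence) and in the general population (lower prevalence).
for label, prevalence in [("referred group", 0.25), ("general population", 0.0025)]:
    ppv, npv = predictive_values(sensitivity=0.99, specificity=0.999,
                                 prevalence=prevalence)
    print(f"{label}: PPV = {ppv:.3f}, NPV = {npv:.6f}")
```

The point of the sketch is only that the same test yields a lower positive predictive value as prevalence falls, which is part of why analytic validity matters so much for population-level screening.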

Is population-based testing for BRCA1 and BRCA2 mutations warranted?  The New York Times discussed this with King:

“Critics may object that ‘women aren’t ready for this,’ [King] said. But she argued: ‘Why should women be protected from information that will empower them and allow them to control their lives? We don’t need that kind of protection.’”

This is really the essential challenge of the rapidly-growing field of genomic testing, and the question King is pressing all of us to contemplate: when are the data good enough to share with patients?

Set the bar too high, and it raises the ugly specter of paternalism (as King suggests), as well as the very real concern that regulators, with the best of intentions, may let the perfect be the enemy of the good, and make it more difficult (and more expensive) for patients to access important information that could impact their lives.

However, share too early (before analytic validity is established, for example), and you risk providing bad data to patients that could result in devastating, life-changing decisions; this is the logic behind the FDA’s drive to regulate high-risk laboratory developed tests, for example (nicely discussed on this Mendelspod podcast).

Similarly, if you share data you don’t understand (which is a fair characterization of many mutations that are found during genetic screening), you risk scaring patients without helping them.  King, according to the Times, feels “women should not be told about other rare mutations whose significance is unknown.”  (Others feel even these data should be shared.)

As the molecular basis of cancer and other diseases becomes increasingly well-understood, and additional risk factors are identified and characterized, more and more genes are likely to enter the BRCA1 and BRCA2 category, and merit serious consideration for population screening.

Moreover, as the cost of sequencing plummets, and the amount of actionable data increases, we may start to ask (as some already have) whether it makes sense to offer newborn infants not a handful of biochemical tests, as we do today, but rather genetic screening – perhaps even sequencing of their complete genomes.

In an era where many parents already bank cord blood (as my wife and I did), based on the slight chance that it might be useful one day, is it such a stretch to imagine parents might want to obtain the complete genomic sequence of their kids, in hopes that over time, and with ever-increasing annotation, it might prove at least as beneficial as cord blood?  Such testing would raise a host of thorny issues, as a group from McGill University discussed thoughtfully in Science Translational Medicine earlier this year.

In contemplating the astonishing complexity around bleeding-edge medical technologies, including the very real operational challenges, and the attendant ethical issues that are appropriately raised, you can certainly appreciate why an incumbent consumer electronics company might elect to steer clear of controversy, and opt instead for a watch that occasionally reminds you to stand up.

The First Publicly Available “$1000 Genome” Test Dataset!

At DNAnexus we’re always looking for ways to collaborate on projects that are outside the norm, and this latest collaboration is no exception. We’ve teamed up with the Garvan Institute and AllSeq to offer the genomics community open access to the first publicly available datasets generated using Illumina’s HiSeq X Ten sequencing system.

Why are we doing this?

Our goal is to provide sample data that will give scientists a glimpse into what to expect from the technological advances of the HiSeq X Ten. Has the new sequencing technology lived up to Illumina’s promise?

Here’s what went down

AllSeq arranged this data-sharing endeavor as a part of its Sequencing Marketplace effort, which aims to educate scientists about different sequencing technologies and match them with providers that offer these technologies.

The Garvan Institute, located in Sydney, Australia, was one of the first three organizations in the world to acquire the Illumina HiSeq X Ten sequencer. In an effort to educate the genomics community about the potential of this exciting new technology, they made available two whole-genome sequencing data sets, using the popular Coriell Cell Repository NA12878 reference sample, which has been extensively analyzed by the Genome in a Bottle Consortium.

Thanks to the Garvan, visitors have access to two different, high-quality data sets (NA12878D and NA12878J), each of which was sequenced on a single lane of an Illumina HiSeq X patterned flow cell, achieving over 120 Gb of yield, with >87% bases with quality > Q30 in just 2.8 days. Each dataset meets the minimum coverage and quality guaranteed by Illumina and is indicative of the potential for the Illumina HiSeq X Ten sequencing system.
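
For a rough sense of what 120 Gb of yield means in coverage terms, the back-of-the-envelope Python sketch below divides yield by genome size. The 3.1 Gb genome size is an approximation we are assuming here, and the result is raw sequenced depth: it ignores duplicates, unmapped reads, and uneven coverage, so it will be higher than the depth reported by a tool such as Picard's CollectWgsMetrics.

```python
# Assumption: approximate size of the haploid human genome, in bases.
HUMAN_GENOME_BASES = 3.1e9


def mean_coverage(yield_bases, genome_bases=HUMAN_GENOME_BASES):
    """Raw mean depth of coverage: total sequenced bases / genome size."""
    return yield_bases / genome_bases


lane_yield = 120e9   # "over 120 Gb" per lane, as quoted above
q30_fraction = 0.87  # ">87%" of bases at Q30, as quoted above

print(f"raw coverage:      {mean_coverage(lane_yield):.1f}x")
print(f"Q30-only coverage: {mean_coverage(lane_yield * q30_fraction):.1f}x")
```

Under these assumptions a single lane works out to a little under 40x raw coverage of a human genome.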

DNAnexus stepped in to sponsor the data storage and the bandwidth for downloading the data. In addition, DNAnexus ran analyses on the two genomes to produce metrics that give the scientific community a benchmark by which to gauge results from the “$1000 genome”.

Visitors can view and download the data without a DNAnexus account via the AllSeq webpage, which takes you to the original FASTQ files, as well as analysis results (BAM and VCF files), and quality metrics calculated using off-the-shelf tools like FastQC and Picard (MarkDuplicates, CollectInsertSizeMetrics, and CollectWgsMetrics). We’ve also provided a web-based genome browser to visualize one data set (NA12878D). You can access and download all of this data until September 30, 2014.
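
If you download the FASTQ files and want a quick sanity check of the Q30 figure yourself, a minimal Python sketch along these lines may help. It is not how FastQC or Picard compute their metrics; it simply assumes standard four-line FASTQ records with Phred+33 quality encoding and counts bases at or above Q30 (the "at or above" threshold is our assumption). The function name is hypothetical.

```python
import io


def fraction_q30(fastq_lines, threshold=30, offset=33):
    """Fraction of bases whose Phred quality is >= threshold.

    Assumes four-line FASTQ records and Phred+33 encoding.
    """
    total = 0
    passing = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the fourth line of each record is the quality string
            quals = line.rstrip("\r\n")
            total += len(quals)
            passing += sum(1 for ch in quals if ord(ch) - offset >= threshold)
    return passing / total if total else 0.0


# Tiny in-memory example: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
example = io.StringIO("@read1\nACGT\n+\nII5#\n")
print(fraction_q30(example))
```

For a real file, any iterable of text lines works; for example, gzip.open(path, "rt") from the standard library yields lines from a compressed FASTQ.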

Those with DNAnexus accounts can also access the data via the “HiSeq X Ten Data” featured project, located on the left-hand side of the dashboard. Users are able to copy any of the files to their own DNAnexus projects for further downstream analysis.

We’d love to hear from you! Tell us what you think about the HiSeq X Ten data: info@dnanexus.com.