Once in a Blue Moon Competition: precisionFDA Truth Challenge

The FDA, the Global Alliance for Genomics and Health (GA4GH) and the National Institute of Standards and Technology (NIST) recently teamed up to create a once-in-a-blue-moon challenge for genomic scientists! In the precisionFDA Truth Challenge, genomic innovators were invited to test their informatics pipelines on two datasets: the well-characterized Genome in a Bottle (GiaB) NA12878 (HG001) reference sample, and a new reference sample, HG002, whose truth data had not yet been released.


PrecisionFDA is an online, cloud-based, virtual research space where members of the genomics community can experiment, share data and tools, collaborate, and define standards for evaluating analytical pipelines. Community members span academia, industry, healthcare organizations, and government, all working together to further innovation and develop regulatory science around NGS tests. The community currently includes more than 1,500 users across 600 organizations, with more than 10 terabytes of genetic data stored.

This is the second challenge issued through precisionFDA, following the precisionFDA Consistency Challenge. The Truth Challenge measures the consistency and accuracy of informatics pipelines when they analyze a human sample whose truth data is unknown to the participants. NIST and GiaB released the truth data on May 26, 2016, after the challenge closed.

What makes this challenge so exciting?

In 2014, in collaboration with GiaB and the FDA, NIST released NA12878, the first gold-standard whole human reference genome. Since then, it has arguably become one of the most studied biospecimens. Researchers around the world use NA12878 as training data for assessing pipeline performance.

Because many pipelines use some form of machine learning to decide whether a reported variant is real, the difficulty lies in ensuring that a pipeline does not overfit its training data. Pipelines can be tuned to maximize performance on a training dataset, and if the test data happens to be the same as or similar to the training data, the pipeline's performance will look unrealistically consistent and accurate. A great resource for understanding why scientists split data into training and test sets to assess the accuracy, reliability, and credibility of their predictive models (the algorithms that go into a pipeline) can be found here.

To test how pipelines perform in real life, scientists needed a second reference sample with an associated truth callset that NGS pipelines had not been trained on. This is exactly what NIST and GiaB have provided in reference sample HG002.

Scientists can now evaluate algorithms using test data that is separate from the training data, a property broadly accepted as fundamental to sound evaluation methodology. Moreover, unlike NA12878, the new reference sample HG002 is male. With only one copy of the X chromosome, it poses new challenges to algorithms and brings new opportunities for evaluating NGS methods along this dimension.
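To make the male-sample point concrete, here is a minimal sketch of one check a pipeline might apply: outside the pseudoautosomal regions (PARs), a male sample carries a single X allele, so a heterozygous genotype call there is suspect. The PAR coordinates below are assumed GRCh37 values, the genotype string is assumed to be a VCF-style `GT` field, and the function names are hypothetical; this is an illustration, not how any challenge entry actually handled chrX.

```python
# Assumed GRCh37 pseudoautosomal regions on chrX (1-based, inclusive).
PAR_GRCH37 = [(60_001, 2_699_520), (154_931_044, 155_260_560)]


def in_par(pos: int) -> bool:
    """Return True if a chrX position falls inside an assumed PAR."""
    return any(start <= pos <= end for start, end in PAR_GRCH37)


def suspicious_het_on_x(chrom: str, pos: int, genotype: str) -> bool:
    """Flag heterozygous calls on chrX outside the PARs in a male sample.

    `genotype` is assumed to be a VCF-style GT string such as "0/1", "0|1" or "1".
    Outside the PARs a male has one X, so two different alleles are implausible.
    """
    if chrom not in ("X", "chrX"):
        return False
    alleles = genotype.replace("|", "/").split("/")
    is_het = len(alleles) == 2 and alleles[0] != alleles[1]
    return is_het and not in_par(pos)


# Hypothetical calls: (chrom, pos, GT)
for call in [("X", 50_000_000, "0/1"), ("X", 100_000, "0/1"), ("X", 50_000_000, "1")]:
    print(call, suspicious_het_on_x(*call))
```

Only the first hypothetical call is flagged: the second sits inside the assumed PAR1, where males carry two copies, and the third is already a single-allele (hemizygous) call.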

The winners

As the clock struck midnight EST on May 25, 2016, the precisionFDA Truth Challenge closed with 36 entries from 21 teams spanning 5 countries: truly an international competition of epic proportions!

The winners of the Truth Challenge will be announced at the upcoming Festival of Genomics in Boston on June 29th at 8:45am EST by Elizabeth Mansfield, PhD, Deputy Director for Personalized Medicine in FDA’s Center for Devices and Radiological Health’s Office of In Vitro Diagnostics and Radiological Health. Registration is free. Want to learn more about precisionFDA? Stop by the DNAnexus booth (#240) during the Festival for a demo of the precisionFDA platform from a member of the precisionFDA team.

Want to recreate the Truth Challenge for yourself? Join the precisionFDA community today and evaluate a pipeline of your choice against HG002. Happy testing!

Cloud-based Genomics at the White House

The Launch of precisionFDA Consistency Challenge

Last week the White House held a Precision Medicine Initiative (PMI) Summit, where government agencies discussed their progress on genomics-based personalized care, real people shared their success stories with precision medicine, and President Obama himself reiterated his vision for precision medicine. It was thrilling for DNAnexus to hear precisionFDA, the “online, cloud-based portal” that DNAnexus built for the FDA, based upon FDA requirements, specifically acknowledged (29:45-30:33). You can watch full coverage of the PMI Summit here.


We have finally reached the era in which cloud-based genomics makes it into White House announcements. The announcement also included the launch of the first precisionFDA challenge, the precisionFDA Consistency Challenge, which calls on members of the genomics community to assess their bioinformatics software on supplied reference human datasets. Its goal is to engage genomics innovators in improving the reproducibility and accuracy of next-generation sequencing (NGS) pipelines, in order to achieve more consistent results from genetic tests and advance precision medicine.

Currently, a single NGS test can identify anywhere from thousands to millions of genetic variants. In some instances, the results are already being used to diagnose and treat disease. Although NGS tests are already used clinically in oncology, non-invasive prenatal testing (NIPT), and rare disease, there is still room to improve their consistency as they become more broadly adopted in clinical practice. A better understanding of the accuracy and reliability of results from specific NGS tests will help us get closer to personalized treatments and improve patient care.

Multiple sequencing technologies (Illumina HiSeq X Ten, Illumina HiSeq, PacBio RS II) can be used for human whole genome sequencing, and each has its own accuracy and reproducibility error profile across different parts of the genome. Methods for assessing a technology's accuracy on specific variant types or genomic regions would help advance the evaluation of novel NGS-based tests for clinical applications. Today, inconsistencies remain, and different NGS-based tests can report differing results. By establishing appropriate standards, however, inter-test variation can be minimized, allowing patients and physicians to place greater confidence in test results and the resulting treatments.

The FDA has delivered a new approach, precisionFDA, to help establish standards around secondary analysis: the process of mapping and aligning DNA sequence reads and calling variants from them. To jumpstart engagement and improve techniques on the precisionFDA platform, the FDA’s first challenge focuses on consistency.

Human WGS pipeline development and validation typically relies on mapping the sequenced reads to a well-known reference genome and then identifying the differences between the sample and the reference. Participants who join the challenge are asked to download two FDA-provided datasets (one contributed by the Garvan Institute of Medical Research and one by Human Longevity Inc., both from sequencing of the well-characterized NA12878 sample), process them through their pipelines, upload the results back to precisionFDA, and compare them with other files on the platform. The challenge provides a common frame of reference for measuring aspects of the reproducibility and accuracy of participants’ pipelines.
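The comparison step can be pictured with a small sketch: treat each call set as a set of variants keyed by (chromosome, position, reference allele, alternate allele), then count matches and mismatches against a benchmark set to get precision, recall, and F-score. This assumes naive exact matching on already-normalized variants; it is a hypothetical illustration, not the comparison tool precisionFDA uses, and real comparison tools also reconcile differing representations of the same variant.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A variant keyed by position and alleles (hypothetical, simplified)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(query: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Compare a pipeline's calls with a benchmark set using exact matching."""
    q, t = set(query), set(truth)
    tp = len(q & t)   # called and present in the benchmark
    fp = len(q - t)   # called but absent from the benchmark
    fn = len(t - q)   # in the benchmark but missed
    precision = tp / (tp + fp) if q else 0.0
    recall = tp / (tp + fn) if t else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data, not drawn from any challenge submission.
truth = [Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T"),
         Variant("2", 300, "G", "A"), Variant("2", 400, "T", "C")]
calls = [Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T"),
         Variant("3", 500, "A", "T")]
print(compare_calls(calls, truth))
```

With this toy data the pipeline finds 2 of the 4 benchmark variants and makes 1 spurious call, giving precision 2/3, recall 1/2, and F-score 4/7.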

precisionFDA Challenge

The challenge is open to all innovators in the field of human genomics. If you’re not already a member of precisionFDA, you will need to request access to get started. You have until April 25th to submit your software assessments.

Results will be ranked on the precisionFDA website for achievements in eight categories. See the Determining Winners section on the challenge webpage for full details. In addition to exclusive bragging rights, your results, comparisons, and methods will be featured on the precisionFDA platform highlighting your technical contributions.

DNAnexus is incredibly excited about this challenge – the idea of genomics innovators working together to advance quality standards is something that gets us fired up. As of this blog post, precisionFDA hosts more than 1000 community members on the platform representing nearly 500 organizations. We are proud to support this novel community-contribution model for evaluating bioinformatics pipelines to help address the challenges of precision medicine.

* Winning a precisionFDA category is an acknowledgement by the precisionFDA community and does not imply FDA endorsement of any organization, tool, software, etc.

The Future of Precision Medicine

Collaboration, Integration, Participation

Today the White House hosted the Precision Medicine Initiative (PMI) Summit, marking the first anniversary of the President’s announcement of the Precision Medicine Initiative. The entire team at DNAnexus shares President Obama’s excitement as we reflect on what has been accomplished, even as we roll up our sleeves and get busy with the hard work that lies ahead. John Holdren, Director of the White House Office of Science and Technology Policy, kicked off the event by announcing the progress of a few remarkable initiatives: 1) the NIH advancing cancer clinical trials through the development of large research cohorts, 2) the Million Veteran Program, which has enrolled 150,000 veterans to date and is now open to active-duty women and men, and 3) the precisionFDA community platform for NGS assay evaluation and regulatory science, which just launched its first “Consistency Challenge”.

There were a number of inspirational stories from patients and families recalling their own struggle with disease and how precision medicine aided in diagnosis and treatment. It is so rewarding to see real-life examples of how genomic sequencing is being used to diagnose genetic disorders in the clinic and advance treatment and, ultimately, help patients lead healthy and happy lives.

What resonated most with us from the White House PMI panel was the President’s remarks on the healthcare system. Although it is labeled a “health care” system, it is actually a “disease care” system. The U.S. healthcare system was designed around patient passivity: patients wait until they are sick, and then it is the doctor’s job to treat the disease or condition. Instead, we need to transform the healthcare industry so that it plays a more active role in health and puts health information into the hands of consumers. This will help patients stay healthy and keep disease from manifesting in the first place.

In order to make the President’s PMI a reality, we need to make anonymized patient data available to researchers and to merge information from different studies in order to advance medical research. We believe in a secure and unified platform; one that connects thousands of scientists around the world. True scientific breakthroughs are possible when researchers are able to collaborate openly, securely, and transparently around petabyte-sized datasets.

It’s been a remarkable twelve months for DNAnexus. We’ve had the opportunity to contribute to a range of efforts expected to advance the science of precision medicine and accelerate its translation into tangible clinical benefits. Examples include:

Looking ahead, we believe that the realization of precision medicine’s promise requires the intelligent integration of genetic data with electronic health record data and a range of other data types. Successfully doing this requires the technology to effortlessly collaborate around large volumes of data in a secure and compliant fashion – which DNAnexus provides. But meaningful progress also requires the commitment to share relevant data, which we believe is developing, especially as (a) stakeholders increasingly recognize the scientific advantages of collaboration as opposed to a history of secrecy and data silos; (b) stakeholders begin to appreciate the ease and security of cloud computing in this process; (c) patients appropriately demand and belatedly receive more ownership of their own (damn!) data, and can easily elect to contribute that data to scientific research.

We’ve been privileged to be so deeply involved in the mission of precision medicine, and are excited by the opportunities to drive this work forward. As a business, and as a U.S. business, we are excited to see President Obama’s vision for precision medicine and improved health care align with our own.