precisionFDA: Why It Matters

I hadn’t intended to write about precisionFDA going live – this post by Dr. Taha Kass-Hout and Elaine Johanson of the FDA provides a terrific summary, and this post by Angela Anderson of DNAnexus offers valuable additional context. However, I found myself today so excited by this project and what it represents that I can’t resist offering a few additional thoughts about what makes this initiative so special.

First, it addresses an important problem in the field: the analytic validity of NGS tests. The ready availability of relatively inexpensive sequencing has enabled us to contemplate diagnostic sequencing at a scale that would have been difficult to imagine even a decade ago. At the same time, the drive to apply sequencing in different clinical contexts raises a critically important question: do I trust this test? A key starting point for clinical interpretation of DNA data is to agree on the sequence itself. If your procedure and analysis report that a particular sequence in a DNA sample is “GATCGATC” and my procedure and analysis of the same DNA say the sequence is “GATTGATC,” then we’ve got a problem. precisionFDA will allow users to compare approaches, figure out what’s working, and determine where refinements might be needed.

Second, precisionFDA represents a novel and forward-thinking approach to regulation. Rather than envisioning governmental regulators as the folks who will define and then impose a specific set of performance standards, precisionFDA instead sees the government as providing the platform that will enable the NGS community to evolve the standards on their own — organically and transparently.

Finally, the ability to design, refine, and deploy this platform in such a rapid and agile fashion reflects in part the value of well-conceptualized public-private partnerships, in this case between the FDA and DNAnexus. By intentionally leveraging the skills and capabilities of a company like ours, the FDA was able to implement and realize their exciting and ambitious vision.

The ultimate success of the precisionFDA platform will of course depend upon how well it serves the community it is intended to support. However, it’s hard to think of a more auspicious beginning, and my hope would be that success here will encourage more leaders to evaluate the potential of public/private partnerships to deploy platforms that leverage the power of a distributed innovation community to address important shared challenges.

FDA Advancing Innovation Through Deep Collaboration

Today, the FDA announced the beta release of precisionFDA, a community platform for NGS assay evaluation and regulatory science exploration. The agency is now accepting applications to the precisionFDA community; you can learn more and request access here.

Over the years, genetic testing has become increasingly useful in the diagnosis and treatment of disease in a few areas, including cancer, birth defects and rare diseases. However, for the majority of the population, precision medicine is still far from becoming a reality. FDA’s Center for Devices and Radiological Health and the agency’s Chief Health Informatics Officer, Dr. Taha Kass-Hout, embarked on a bold, new initiative for realizing precision medicine, by establishing precisionFDA. You can read more about their endeavor here.

Currently, most diagnostic tests follow a “one test-one disease paradigm” for evaluating analytical performance. However, diagnostic tests employing next-generation sequencing (NGS) technology can scan up to the entire genome, produce a massive amount of data, and potentially detect multiple conditions in a single test. The FDA realized that, given the advances in NGS-based technology, a new approach would be required for evaluating the accuracy of a test.

PrecisionFDA was established to help advance the regulatory science needed to assess the accuracy of genome tests and software. It provides a secure cloud-based platform, open and transparent to the genomics community, where researchers and test developers can explore NGS methodologies and spur the innovation needed to develop the necessary standards.

PrecisionFDA is a research sandbox that provides the genomics community with a web portal where they can experiment, share data and tools, collaborate, and define standards for evaluating analytical pipelines. Requirements for the precisionFDA platform were based on suggestions received through a public forum as well as use cases the FDA has gathered throughout the years.

Here are some key features of precisionFDA:

  • FILES – Upload your own files on cloud storage or generate files through running apps. You can publish reference data or any other files, or browse other members’ contributions.
  • APPS – Run mapping & variation calling pipelines or other Linux-based software apps on the cloud. Contribute your own software and scripts and let others explore them.
  • COMPARISONS – Quantify the similarity between two sets of genomic variants (VCF files). Compare your own test set (for a given biospecimen) to established benchmark sets.
  • NOTES – Write and publish rich notes describing your work. Attach any files, comparisons or apps to your notes. Read what others are reporting and reproduce their workflows.

The concept of comparing two sets of variants (VCF files) is central to the exploration of regulatory science, and to the evaluation of NGS assays. The problem of comparing VCF files constitutes an active area of research. The precisionFDA building crew is represented in the Global Alliance for Genomics and Health (GA4GH) Benchmarking Task Team, which is expected (within the next year) to provide recommendations and/or software solutions for comparing VCFs and for counting, classifying, and reporting results. In the meantime, precisionFDA offers an initial VCF comparison framework, put together in consultation with NIST.
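To make the idea of comparing two variant sets concrete, here is a minimal sketch in Python. It is not the precisionFDA comparison framework and not the GA4GH recommendation; it assumes the simplest possible rule, that two variants match only when chromosome, position, REF, and ALT are identical. The names `Variant`, `parse_vcf_lines`, and `compare`, and the toy records, are made up for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical comparison key: one record per ALT allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines):
    """Collect Variant keys from tab-separated VCF data lines."""
    variants = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # "." means no alternate allele was called
                variants.add(Variant(chrom, pos, ref, alt))
    return variants


def compare(test, benchmark):
    """Count exact matches and misses between a test set and a benchmark set."""
    tp = len(test & benchmark)   # called in both sets
    fp = len(test - benchmark)   # called only in the test set
    fn = len(benchmark - test)   # present only in the benchmark set
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy records, invented for this example (not real benchmark data).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
benchmark_lines = [
    header,
    "chr1\t100\t.\tC\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tA\t.\tPASS\t.",
    "chr2\t50\t.\tA\tAT\t.\tPASS\t.",
]
test_lines = [
    header,
    "chr1\t100\t.\tC\tT\t50\tPASS\t.",
    "chr2\t50\t.\tA\tAT\t40\tPASS\t.",
    "chr2\t75\t.\tT\tG\t12\tPASS\t.",
]

result = compare(parse_vcf_lines(test_lines), parse_vcf_lines(benchmark_lines))
print(result)
```

With these toy records the test set shares two calls with the benchmark, adds one call the benchmark lacks, and misses one benchmark call. Exact matching like this is deliberately naive: the same underlying variant can be written in more than one way in a VCF file, and genotypes are ignored here, which is part of why VCF comparison remains an active area of research.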

Check out the precisionFDA documentation for some great ideas for using comparisons, including assessing reproducibility and accuracy of NGS tests and bioinformatics variation calling pipelines.

PrecisionFDA follows a robust, audited set of policies, processes, and controls for security and compliance. When your data is in your private area, it is indeed private. It’s not visible to the FDA, members of the precisionFDA community, or any other entity. The platform provides users with access controls for their artifacts (files, apps, jobs, app assets, comparisons, and notes), so that each artifact can either remain private or be published to the precisionFDA community.

Lastly, precisionFDA would be nothing without the support and engagement from its community members. As of today, early adopters have already contributed many valuable tools and reference datasets to the platform, and there are many more in the works! Here is a preview of what you can find on precisionFDA today:

  • NA12878 benchmark calls made by NIST (Genome in a Bottle v2.19)
  • NA12878 benchmark calls made by Illumina (Platinum Genome v8.0.1 and an updated kmer-filtered v7.1.0)
  • HuRef (J. Craig Venter) benchmark calls made by Roche/Bina
  • NA12878 exome test calls made by the Broad Institute
  • NA12878 whole-genome sequencing and test calls made by the Garvan Institute (using the Illumina HiSeq X Ten)
  • Software assets and apps for simulation and evaluation using VarSim, added by Roche
  • An app for local ancestry analysis with RFMIX, added by Stanford

Additional early members of the precisionFDA community:

  • 23andMe
  • Baylor College of Medicine
  • Counsyl
  • Emory Genetics Laboratory
  • GeneDX
  • Human Longevity Institute
  • Intel
  • Natera
  • Personalis
  • SeraCare

Above everything else, precisionFDA is a community, where people can collaborate, communicate, and even argue for the future of precision medicine. We are privileged to have been selected as the contractor for this pilot, and look forward to our collaboration with the FDA as the platform and community evolve. At this time, the precisionFDA platform includes features such as App Forking, Item Tracking, and Notes, which ignite collaboration, content expansion, and workflow validation and reproduction.

The Notes section, in particular, lets participants write and publish rich notes describing their thoughts and their work; for example, they can discuss how they used files, comparisons, and apps—which they can also attach to the note—to prove a certain point or to document a procedure. Community members can read what others have reported and access their attachments to take a closer look at that work or even reproduce it on their own.

We believe this new level of collaboration and reporting, together with everything else that precisionFDA has to offer, will define new frontiers for people to showcase to the FDA and to the rest of the community how to address the challenges of precision medicine in the 21st century.

Supporting Freebayes, to Serve Our Customers and the Community

Freebayes is a variant calling tool for short-read sequencing by Erik Garrison, Gabor Marth, and others, which played a significant role in the 1000 Genomes Project. It’s widely appreciated for its quality results, cost-effective performance, and permissive open-source license. At DNAnexus, many of our customers have come to rely on it in their sequencing pipelines. But, like many software tools in genome informatics, its development might have stopped at the conclusion of its (hugely successful) sponsor project.

We listened to our customers, and heard clearly that freebayes is too valuable to let that happen. A few months ago, we began working with Erik on a roadmap for ongoing development and maintenance with our support. Through our collaboration, Erik recently delivered a capability to generate gVCF output files, a significant feature both for individual genome interpretation and for aggregate analysis of vast cohorts. We’re continuing to refine that feature, and we have many more queued up to ensure freebayes remains a tool of choice for both research and clinical sequencing pipelines.


Importantly, freebayes and our collective contributions to it will remain free for all to use and build upon, under the MIT license. Furthermore, best efforts will be made to assist all its users through public forums. We’d love to hear about your use cases and ideas to further improve freebayes – reach us on GitHub or Gitter, or send us a tweet. Erik remains in his day job, realizing a totally new paradigm in genome informatics, and we’re delighted he can also work with us to make freebayes endure as a tool the community can count on. So to all the genome hackers out there: please hack on freebayes too! You can read more on ‘How to Freebayes’ on Erik’s blog.

Because no single tool can possibly serve all applications, DNAnexus continues to work with numerous collaborators toward advancing methods in genome informatics, both free and commercially licensed. We also continue to wholeheartedly support our customers’ choice of methodologies to deploy on our platform, whether sourced from our partner network or elsewhere. We’re delighted by this opportunity to both deliver value to our customers and give back to the broader community. To the genome hackers again: we’re on the lookout for more of these opportunities! (We’re hiring, too!)