Supporting Freebayes, to Serve Our Customers and the Community

Freebayes is a variant calling tool for short-read sequencing, developed by Erik Garrison, Gabor Marth, and others, that played a significant role in the 1000 Genomes Project. It’s widely appreciated for its quality results, cost-effective performance, and permissive open-source license. At DNAnexus, many of our customers have come to rely on it in their sequencing pipelines. But, as with many software tools in genome informatics, its development might have stopped at the conclusion of its (hugely successful) sponsor project.

We listened to our customers, and heard clearly that freebayes is too valuable to let that happen. A few months ago, we began working with Erik on a roadmap for ongoing development and maintenance with our support. Through our collaboration, Erik recently delivered a capability to generate gVCF output files, a significant feature both for individual genome interpretation and for aggregate analysis of vast cohorts. We’re continuing to refine that feature, and we have many more queued up to ensure freebayes remains a tool of choice for both research and clinical sequencing pipelines.
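
For readers who want to try the gVCF output on the command line, here is a minimal sketch in Python of how such a call might be assembled. It assumes a freebayes build recent enough to include the `--gvcf` option; the file names `ref.fa` and `sample.bam` and the helper `freebayes_gvcf_command` are hypothetical and only for illustration. The sketch builds the argument list without running anything, so check `freebayes --help` on your own installation before relying on it.

```python
import shlex


def freebayes_gvcf_command(reference_fasta, bam_path):
    """Build (but do not run) a freebayes call that requests gVCF output.

    Assumption: the installed freebayes supports --gvcf. The file paths are
    placeholders supplied by the caller; freebayes writes to stdout by default.
    """
    return [
        "freebayes",
        "--fasta-reference", reference_fasta,
        "--gvcf",
        bam_path,
    ]


if __name__ == "__main__":
    # Hypothetical inputs; substitute your own reference and alignments.
    cmd = freebayes_gvcf_command("ref.fa", "sample.bam")
    print(shlex.join(cmd))
```

To actually run it, the list could be handed to `subprocess.run(cmd, check=True)` with stdout redirected to a file of your choosing.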


Importantly, freebayes and our collective contributions to it will remain free for all to use and build upon, under the MIT license. Furthermore, best efforts will be made to assist all its users through public forums. We’d love to hear about your use cases and ideas to further improve freebayes – reach us on GitHub or Gitter, or send us a tweet. Erik remains in his day job, realizing a totally new paradigm in genome informatics, and we’re delighted he can also work with us to make freebayes endure as a tool the community can count on. So to all the genome hackers out there: please hack on freebayes too! You can read more on ‘How to Freebayes’ on Erik’s blog.

Because no single tool can possibly serve all applications, DNAnexus continues to work with numerous collaborators toward advancing methods in genome informatics, both free and commercially licensed. We also continue to wholeheartedly support our customers’ choice of methodologies to deploy on our platform, whether sourced from our partner network or elsewhere. We’re delighted by this opportunity to both deliver value to our customers and give back to the broader community. To the genome hackers again: we’re on the lookout for more of these opportunities! (We’re hiring, too!)

Run the Mercury Variant-Calling Pipeline on Your Own Data

Mercury, designed by the Human Genome Sequencing Center at Baylor College of Medicine (HGSC), is used as the core variant-calling pipeline for the CHARGE consortium. The Mercury pipeline is a semi-automated and modular set of tools for the analysis of NGS data in clinically focused studies. HGSC designed the pipeline to identify mutations from genomic data, setting the stage for determining the significance of these mutations as a cause of serious disease.

Thanks to HGSC’s work with us, the Mercury pipeline is now freely available to any DNAnexus user. The Mercury pipeline is located in the applets folder of the HGSC_Mercury project. You can find the project, along with everything you need to run the applet, under the ‘Featured Projects’ section on your home page. Log in to DNAnexus or create an account today to get started immediately.

Inside the Mercury Project

  • Both whole genome and exome samples
  • All annotation and reference data required
  • Pre-configured workflow (just drag & drop your inputs)

Results from the Mercury pipeline consist of a set of annotated variants from your sample. You’ll also see the biologically significant annotations that apply to those variants, drawn from the Baylor College of Medicine database using their Cassandra annotation tool. You can easily visualize the mappings and variant calls within our integrated genome browser.

New Single-Cell Genomic Studies Demonstrate Utility of SPAdes Assembler

This summer we saw some new publications underscoring the need for a high-quality assembler for single-cell genomic sequencing projects, particularly in clinical settings.

Two papers demonstrate this well, and both use the assembler SPAdes to perform needed assemblies. (SPAdes, which can be used for both standard isolates and for single-cell MDA bacterial assemblies, is available as an app through the DNAnexus platform.)

“Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum” came out in PNAS in June, and “Genome of the pathogen Porphyromonas gingivalis recovered from a biofilm in a hospital sink using a high-throughput single-cell genomics platform” was published in Genome Research in May. Both papers come from the J. Craig Venter Institute and highlight the critical need for single-cell genomics to characterize organisms that cannot be cultured with traditional methods.

“Single-cell genomics is becoming an accepted method to capture novel genomes, primarily in the marine and soil environments,” the scientists write in Genome Research. “Here we show for the first time that it also enables comparative genomic analysis of strain variation in a pathogen captured from complex biofilm samples in a healthcare facility.”

One of the key limitations to performing single-cell genomics has been that most assemblers are not optimized to handle this type of data. Lack of uniformity in read coverage and increased numbers of chimeric reads and sequencing errors are common problems in single-cell work.
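
To make "lack of uniformity in read coverage" concrete, here is a small Python sketch that scores unevenness as the coefficient of variation (standard deviation divided by mean) of per-position read depth. This metric is our own illustrative choice, not one taken from the papers or from SPAdes, and the depth values below are invented toy numbers rather than real data.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Return the coefficient of variation of per-position read depth.

    A value near 0 means even coverage; larger values mean coverage is
    concentrated in a few positions while others are barely covered.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    average = mean(depths)
    if average == 0:
        raise ValueError("mean depth is zero, so the CV is undefined")
    return pstdev(depths) / average


if __name__ == "__main__":
    # Toy numbers for illustration only (not measured data).
    even_isolate = [30, 31, 29, 30, 30, 31, 29, 30]
    uneven_single_cell = [0, 2, 250, 1, 0, 90, 3, 0]
    print(f"even coverage CV:   {coverage_cv(even_isolate):.2f}")
    print(f"uneven coverage CV: {coverage_cv(uneven_single_cell):.2f}")
```

An assembler that assumes roughly even depth can mistake the low-coverage stretches for errors and the high-coverage ones for repeats, which is the kind of problem a single-cell-aware assembler has to handle.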

SPAdes, developed by researchers at the St. Petersburg Academic University Algorithmic Biology Laboratory in collaboration with Pavel Pevzner at the University of California, San Diego, fills this niche. The assembly tool, which was recognized as a top-performing assembler in the GAGE-B Evaluation, generates single-cell assemblies, providing far more information about microbial genomes from metagenomic studies than traditional assemblers do. SPAdes can be used with standard isolates as well as single-cell bacterial assemblies.

SPAdes has been ported to DNAnexus and is available as an app to any user of the new platform. Input for the app is a set of reads in FASTQ format. In SPAdes 2.5, the user can specify multiple libraries, all of which are used for repeat resolution and gap closing. SPAdes does not yet have a scaffolder, so for mate-pair sequence data, we recommend using an external scaffolder. You can check out the app by logging in to DNAnexus and searching the app library for SPAdes.
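
If you run SPAdes outside the platform, specifying multiple paired-end libraries might look like the Python sketch below. It describes the standalone `spades.py` command line, not the DNAnexus app's inputs, and it assumes the numbered `--pe<N>-1` / `--pe<N>-2` library options and the `--sc` single-cell flag are available in your SPAdes version. The FASTQ file names, the output directory, and the helper `spades_command` are hypothetical.

```python
import shlex


def spades_command(libraries, output_dir, single_cell=False):
    """Build (but do not run) a spades.py call for paired-end libraries.

    libraries: list of (left_fastq, right_fastq) tuples, one per library.
    Assumption: the installed SPAdes accepts --pe<N>-1/--pe<N>-2 and --sc.
    """
    if not libraries:
        raise ValueError("at least one paired-end library is required")
    cmd = ["spades.py"]
    if single_cell:
        cmd.append("--sc")  # single-cell (MDA) mode
    for index, (left, right) in enumerate(libraries, start=1):
        cmd += [f"--pe{index}-1", left, f"--pe{index}-2", right]
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; substitute your own FASTQ files.
    libs = [("lib1_R1.fastq", "lib1_R2.fastq"),
            ("lib2_R1.fastq", "lib2_R2.fastq")]
    print(shlex.join(spades_command(libs, "spades_out", single_cell=True)))
```

Building the argument list separately from running it makes it easy to inspect the command before launching a long assembly.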