Announcing the Winners of Mosaic Microbiome Community Challenge: Strains #1

The application of next-generation sequencing in the study of microbial communities has fueled the rapid growth of interest in microbiome research. However, difficulties with the accuracy of computational analyses of these complex datasets have limited the translation of microbiome science into novel biotherapeutic products. In order to unleash the potential that metagenomics holds for human health, computational methods to identify unique microbial strains must be improved.

The Mosaic Community Challenge: Strains #1, sponsored by Janssen Research & Development, LLC, through the Janssen Human Microbiome Institute, aimed to benchmark and improve the performance of computational tools in analyzing these data, in order to provide better quality profiling of microbiome samples at high resolution. The challenge gave participants the opportunity to validate their bioinformatics tools in real time on a neutral, unbiased platform, and to see how they performed against other industry tools.

Participants in the challenge worked with datasets composed of four different sample types: a metagenomics dataset generated from real mouse fecal samples (of known bacterial composition) and three simulated datasets of varying complexity. In addition to the challenge dataset, a separate training dataset, which included the truth files, was provided so that participants could train and improve their methods. Participants could then run their analysis either by creating their own app on the Mosaic Platform or by downloading the dataset and running their method on their own system. Over the four-month course of the challenge, participants could take advantage of a “Testing Ground” to get immediate feedback on their work with the training datasets before submitting their final challenge entries.

Challenge Winners & Their Methods

We would like to congratulate the winners as well as thank all who participated for helping to take microbiome science to the next level.


CosmosID, a bioinformatics and NGS service laboratory, scored highest in the Profiling part of the challenge. The CosmosID analysis pipeline achieved the highest cumulative F1-score, a measure that combines precision and recall as their harmonic mean. According to Nur Hasan, Chief Science Officer at CosmosID, the strength of their approach lies in the manually curated database, whose structure follows the phylogenetic hierarchy of all represented microorganisms. This enables reliable microbial identification at all taxonomic levels, down to the strain level.
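
For readers less familiar with the metric, the following is a minimal sketch of how an F1-score can be computed for a single sample by comparing the set of strains a tool reports against the truth set. The function name and strain labels are hypothetical, and how the challenge accumulated scores across datasets is not described here, so this illustrates the per-sample metric only, not the leaderboard's scoring code.

```python
def f1_score(predicted: set[str], truth: set[str]) -> float:
    """Harmonic mean of precision and recall for one sample."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / len(predicted)  # correct calls / all calls
    recall = true_positives / len(truth)         # correct calls / all true strains
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four true strains, four reported, three of them correct.
truth = {"strain_A", "strain_B", "strain_C", "strain_D"}
predicted = {"strain_A", "strain_B", "strain_C", "strain_X"}
score = f1_score(predicted, truth)  # precision 0.75, recall 0.75, F1 0.75
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are high, so a tool cannot score well simply by reporting every strain in its database.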

CosmosID’s submission scored the highest in the analysis of the Biological Sample (80%), which was 64% higher in relative terms than the score of the second-ranked submission (48.9%). Interestingly, however, submissions based on the popular MetaPhlAn tool performed better across the simulated datasets. The observation that the performance of tools varies with the source of the sequencing data highlights the importance of benchmarking the tools on both biological and simulated datasets.
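
The 64% figure is a relative difference between the two scores, not a gap in percentage points. A short sketch of the arithmetic, using a hypothetical helper name:

```python
def relative_increase(score: float, baseline: float) -> float:
    """Percentage by which `score` exceeds `baseline`."""
    return (score - baseline) / baseline * 100


# Scores quoted above for the Biological Sample.
increase = relative_increase(80.0, 48.9)  # roughly 63.6, i.e. about 64%
gap_in_points = 80.0 - 48.9               # roughly 31.1 percentage points
```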

Figure 1. Precision/Recall Curve for the winning submission for each of the challenge datasets (to view this chart visit the submissions page on Mosaic).

To interactively compare the Profiling submissions and view Precision Recall Curves, visit the Strains #1 Profiling comparison page. 


Rayan Chikhi, PhD, Computer Scientist at the French National Center for Scientific Research (CNRS) and CRIStAL research center, and an advisor at Clarity Genomics, scored highest in the Assembly part of the challenge by using the Minia assembler to assemble the metagenomic data provided for the challenge. The assembly portion was judged on Genome Fraction: the total number of aligned bases divided by the reference genome size. The winning submission also scored well on the other metrics reported in the leaderboard, namely Misassemblies and Mismatches.

Figure 2. Genome fraction scores across 13 biological sample reference strains 
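
As a rough illustration of the Genome Fraction metric, the sketch below computes it for one reference genome from a list of alignment intervals. The function name, the half-open 0-based interval convention, and the choice to count each reference base only once when alignments overlap are assumptions made for this example; the challenge's actual evaluation code is not shown here.

```python
def genome_fraction(aligned_intervals: list[tuple[int, int]], reference_length: int) -> float:
    """Fraction of reference bases covered by at least one alignment.

    Intervals are half-open (start, end) positions on the reference.
    Overlapping alignments are merged so no base is counted twice
    (an assumption for this sketch).
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = 0
    covered_up_to = 0  # right edge of the region already counted
    for start, end in sorted(aligned_intervals):
        start = max(start, covered_up_to)  # skip bases already counted
        if end > start:
            covered += end - start
            covered_up_to = end
    return covered / reference_length


# Hypothetical 1,000-base reference with three aligned contigs, two overlapping.
fraction = genome_fraction([(0, 400), (300, 700), (900, 1000)], 1000)  # 0.8
```

A value near 1.0 means the assembly recovers almost all of the reference strain, while Misassemblies and Mismatches capture how correct the recovered sequence is.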

Honorable mentions go to two other participants. Peter McCaffrey came a close second with his DeepBiome submission, and his submitted assemblies were longer than those of the winning submission. Additionally, the submissions from Sergey Nurk (metaSPAdes assembler) consistently had the largest contigs.

To make your own comparisons between the submissions and dive deeper into the rich comparison data available, visit the Strains #1 Assembly comparison page.

Learn about the winners’ methods during our webinar confirmed for Tuesday, June 26th at 10am PT (1pm ET).

Want More Ways to Participate in the Mosaic Microbiome Community?

Learn more and get involved at

Visit Us at Microbiome Drug Development Summit!  

DNAnexus will present “Translation of Microbiome Research into Clinical Applications” this Friday, June 22nd, at 12pm at the Microbiome Drug Development Summit in Boston. Join our talk, and stop by our exhibition table to learn more about DNAnexus microbiome capabilities and the Mosaic Community Platform & Challenges. Email us to schedule a meeting in advance.

Translation of Microbiome Research into Clinical Applications

  • Crowdsourcing the advancement of microbiome research with the Mosaic Community platform and challenges
  • Considerations for incorporating microbiome data into clinical trials
  • Complying with GLP, 21 CFR Part 11, and more


   Omar Serang, Chief Cloud Officer, DNAnexus

  Michalis Hadjithomas, PhD, Microbiome Lead, DNAnexus

PrecisionFDA Receives FDA Commissioner’s Award for Outstanding Achievement

Today, the precisionFDA Next Generation Sequencing (NGS) Team received the FDA Commissioner’s Special Citation Award for Outstanding Achievement and Collaboration in the development of the precisionFDA platform promoting innovative regulatory science research to modernize regulation of NGS-based genomic tests. This award recognizes superior achievement of the Agency’s mission through teamwork, partnership, shared responsibility, and fostering collaboration to achieve the FDA goals.


PrecisionFDA is an online, cloud-based, virtual research space where members of the genomics community can experiment, share data and tools, collaborate, and define standards for evaluating and validating analytical pipelines. This open-source community platform, which has become a global reference standard for variant comparison, includes members from academia, industry, healthcare, and government, all working together to further innovation and develop regulatory standards for NGS-based drugs and devices. Launched in December 2015, the precisionFDA community includes nearly 5,000 users across 1,200 organizations, with more than 38 terabytes of genomic data stored.

To date, the precisionFDA NGS Team has engaged the genomics community through a series of community challenges:

  • The Consistency Challenge (Feb-Apr 2016): Invited participants to manipulate datasets with their software pipelines and conduct performance comparisons.
  • The Truth Challenge (Apr-May 2016): Gave participants the unique opportunity to test their NGS pipelines on an uncharacterized sample (HG002) and publish results for subsequent evaluation against a newly-revealed ‘truth’ dataset.
  • App-a-thon in a Box (Aug-Dec 2016): Invited the community to contribute NGS software to the precisionFDA app library, enabling the community to explore new tools.
  • Hidden Treasures Competition (Jul-Sep 2017): Participants beta-tested the in-silico analyses of NGS datasets for the purpose of determining the reliability and accuracy of different NGS tests.
  • CFSAN Pathogen Detection Challenge (Feb-Apr 2018): Participants helped to improve bioinformatics pipelines for detecting pathogens in samples sequenced using metagenomics.

We are thrilled that precisionFDA has been recognized for its efforts in fostering shared responsibility for the evaluation and validation of analytical pipelines. PrecisionFDA’s proven success has inspired other scientific communities, such as St. Jude Cloud for pediatric cancer research and the Mosaic microbiome platform for microbial strain analysis, to establish their own collaborative ecosystems where members can contribute and innovate. DNAnexus is proud to be the platform that powers precisionFDA and other community portals to advance scientific research through a secure and collaborative online environment.

To learn more about DNAnexus community portals please visit:

SMRT Leiden Assembly Grant

Submit your unique plant or animal genome proposal for a chance to win free de novo assembly services on PacBio SMRT Sequencing data. See details below.

We are excited to participate in our partner PacBio’s annual SMRT Leiden Conference in Leiden, Netherlands, from June 12th to 14th. This back-to-back conference will include the SMRT Scientific Symposium on June 12th & 13th, featuring presentations from key experts and opinion leaders sharing their scientific discoveries and latest achievements from a variety of fields. The SMRT Informatics Developers Conference will follow on June 14th, focused on developing and improving analysis tools for PacBio SMRT Sequencing data. Software developers and bioinformaticians will spend the day advancing new and existing tools for de novo assembly, genome phasing, structural variation, base modifications, and Iso-Seq analysis.

During the SMRT Informatics Developers Conference on June 14th, DNAnexus will be presenting “Evaluating haplotype phasing from FALCON Unzip” at 10:30am in the session titled “DE NOVO ASSEMBLY.” In this talk, we evaluate the performance of FALCON Unzip in forming phased haplotypes by assembling and phasing the genomes of an artificial human. By examining SNPs that are known to be unique to one of the parents, we show that FALCON Unzip is able to produce impressive phasing information, requiring nothing more than a little additional time in the compute environment to process the data.

DNAnexus is also honored to be a sponsor of PacBio’s Leiden Conference by providing the “SMRT Leiden Grant powered by DNAnexus,” offering free de novo assembly for the most unique plant or animal genome in the world. One lucky winner will be selected for DNAnexus de novo assembly services on PacBio SMRT Sequencing data. Participants can submit proposals on the SMRT Leiden Grant website, with information on the organism type and its impact on the scientific community. Proposals should be approximately 250 words in length, and the genome size should be up to 1.5 Gbp (genomes larger than 1.5 Gbp will be considered under special circumstances). Please note that de novo assembly services will only be applied to data generated through PacBio SMRT Sequencing, and sequencing is not included in the SMRT Leiden Grant.

Deadline for submission is June 29th, and the winner will be announced the week of July 9th.

Genome assembly requires massive computational resources to assemble reads or run structural variation detection across datasets, and it is made even more challenging by high levels of genetic diversity, repetitive elements, and duplicated genomic regions. Our bioinformatics expertise and computational power enable the delivery of high-quality results, leveraging multi-omics data and tools in a collaborative and secure ecosystem. You can learn more on our de novo assembly website about our fast, accurate, and cost-efficient reference-quality assembly services, which enable complex genome assembly, structural variation analysis, and physical mapping to achieve complete and accurate views of all types of genomic variation.

Questions about DNAnexus de novo assembly or the SMRT Leiden Grant? Email us!