New Single-Cell Genomic Studies Demonstrate Utility of SPAdes Assembler

This summer we saw some new publications underscoring the need for a high-quality assembler for single-cell genomic sequencing projects, particularly in clinical settings.

Two papers demonstrate this well, and both use the assembler SPAdes to perform needed assemblies. (SPAdes, which can be used for both standard isolates and for single-cell MDA bacterial assemblies, is available as an app through the DNAnexus platform.)

“Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum” came out in PNAS in June, and “Genome of the pathogen Porphyromonas gingivalis recovered from a biofilm in a hospital sink using a high-throughput single-cell genomics platform” was published in Genome Research in May. Both papers come from the J. Craig Venter Institute and highlight the critical need for single-cell genomics to characterize organisms that cannot be cultured with traditional methods.

“Single-cell genomics is becoming an accepted method to capture novel genomes, primarily in the marine and soil environments,” the scientists write in Genome Research. “Here we show for the first time that it also enables comparative genomic analysis of strain variation in a pathogen captured from complex biofilm samples in a healthcare facility.”

One of the key limitations to performing single-cell genomics has been that most assemblers are not optimized to handle this type of data. Lack of uniformity in read coverage and increased numbers of chimeric reads and sequencing errors are common problems in single-cell work.

SPAdes, developed by researchers at the St. Petersburg Academic University Algorithmic Biology Laboratory in collaboration with Pavel Pevzner at the University of California, San Diego, fills this niche. The assembly tool, which was recognized as a top-performing assembler in the GAGE-B Evaluation, generates single-cell assemblies that provide far more information about microbial genomes from metagenomic studies than traditional assemblers do. SPAdes can be used with standard isolates as well as single-cell bacterial assemblies.

SPAdes has been ported to DNAnexus and is available as an app to any user of the new platform. Input for the app is a set of reads in FASTQ format. In SPAdes 2.5, the user can specify multiple libraries, all of which are used for repeat resolution and gap closing. SPAdes does not yet have a scaffolder, so for mate-pair sequence data we recommend using an external scaffolder. You can check out the app by logging in to DNAnexus and searching the app library for SPAdes.
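For readers who have not worked with FASTQ directly, here is a minimal Python sketch of the format the app takes as input. It is an illustration only, not part of SPAdes or the DNAnexus app: the names `FastqRecord`, `read_fastq`, and `summarize` are made up for this example, and the sketch assumes the common layout of four lines per record with no line wrapping.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    """One read: name (without the leading '@'), bases, and quality string."""
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        yield FastqRecord(header[1:], sequence, quality)


def summarize(handle: TextIO) -> Tuple[int, int]:
    """Return (number of reads, total number of bases) for a FASTQ stream."""
    reads = 0
    bases = 0
    for record in read_fastq(handle):
        reads += 1
        bases += len(record.sequence)
    return reads, bases


# Two tiny made-up reads, used only to exercise the parser.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n"
```

Calling `summarize(io.StringIO(EXAMPLE))` returns the read count and total base count for the two made-up reads. A quick check like this can catch truncated or malformed files before they are uploaded for assembly.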

Developer Spotlight: A De Novo Assembler Named Ray

We recently launched the DNAnexus developer program, and to our delight one user was able to contribute a valuable new app in less than a day. Sébastien Boisvert, a doctoral student at the Université Laval in Québec, Canada, converted a software application he had previously written for short-read de novo assembly to an app for the DNAnexus community.

Boisvert is the mind behind Ray, a scalable genome assembler built specifically for next-gen, short-read sequence data and related applications, such as metagenomics. Ray was first reported in 2010 in the Journal of Computational Biology. Written in C++, it is an MPI-based parallel tool that ships as a single executable, eliminating the need to write Perl scripts. Ray is sequencing platform-agnostic, so it can be used with data from any short-read sequencer.

Today, Ray is primarily used by bioinformaticians who have ongoing access to a supercomputer. The software’s peer-to-peer design makes it ideal to run on systems with hundreds or thousands of nodes — which also makes it just right for a cloud computing environment. When Boisvert heard that DNAnexus was opening its doors to developer-contributed apps, he immediately looked into how to submit Ray so even more users could have access to the tool. From his perspective, cloud computing offers a more instantaneous experience with massively parallel computing to people who don’t readily have supercomputer access, and also provides the type of infrastructure management that allows users to focus on what they want to compute, rather than how to manage queries and coding.

Boisvert remarked that the DNAnexus documentation for contributing an app was straightforward and that the interface in particular was easy to use. Writing the wrapper to convert the software code into an app took less than a day. He worked with the Developer Program support team at DNAnexus to make sure everything was working properly, and now Ray is available for any DNAnexus user to add to an analysis pipeline — and it’s free. (Check out Boisvert’s own blog about cloud computing options, where he notes that it’s fun to start an app in DNAnexus!)

As our developer program continues to grow, we look forward to working with more contributors to get their great apps into our platform so they can be broadly available to our growing community of users. If you’re interested in learning more about our Developer Program, please visit

Looking Forward to ISMB and Meeting Developers in Berlin

It’s July, and you know what that means: we’re getting ready for the annual Intelligent Systems for Molecular Biology meeting! This year ISMB will be held in Berlin (July 19-23), and we are looking forward to mingling and meeting with fellow computational biologists, software developers, and IT experts.

ISMB regularly attracts excellent speakers, and this year’s keynote list is no exception. We are eager to hear from Lior Pachter from the University of California, Berkeley; Gil Ast from Tel Aviv University; Gary Stormo from Washington University in St. Louis; and Carole Goble from the University of Manchester. Two other keynote speakers are also award winners: David Eisenberg from the University of California, Los Angeles, and Gonçalo Abecasis from the University of Michigan.

In addition to the stellar talks, we are looking forward to speaking with developers about our new app program, which aims to deliver new genomics tools to our user community. With the launch of our new platform this year, DNAnexus has provided developers with a well-documented software development kit (SDK) and application programming interface (API), as well as an app “wizard” and templates. Anyone can participate in this program, and we are providing participants with an initial $1,000 credit toward cloud-based storage and computational resources, as well as free technical support from the DNAnexus engineering team.

From a developer’s perspective, there are lots of benefits to uploading an app to DNAnexus. Two main advantages of the DNAnexus platform are a standardized environment in which your application can run and a great distribution mechanism for gaining users among a much broader audience. Our platform is also the most flexible and configurable API-based infrastructure for enabling genomic data analysis and data sharing, with access from a command-line interface for bioinformatics experts and from a web-based GUI for non-expert users. Monetization features are also planned that will allow developers to earn income when their apps are used.

If you’re interested in finding out more about the developer program, feel free to let us know — a few DNAnexus team members will be attending ISMB. We’d be happy to set up meetings or quick chats ahead of time. Drop us a line at