Meet the new DNAnexus and Its Extensible Genomics Toolbox

This week we continue our look at unique facets of the new DNAnexus with a focus on the “Extensible Genomics Toolbox”: the platform’s ability to let bioinformaticians tailor their analyses through custom Apps and workflows. The Apps provided within the platform serve as a starting toolkit, but users can also build new Apps from scratch or modify and combine existing ones to create truly custom pipelines.

Customization is one of the most important components when it comes to data analysis. Depending on the question that has spurred their experiments, researchers have a broad range of data analysis needs. The tools, algorithms, or annotations they find relevant will vary greatly from one experiment to the next — no single solution works for all of them.

DNAnexus’ goal is to provide a turn-key platform with a comprehensive menu of built-in functionality while also giving you the freedom to add new capabilities as you need them. You can use what is already available in the constantly growing Apps library, which includes a rich set of industry-recognized tools for data QC, DNA resequencing, and RNA-seq, such as FastQC, RSeQC, BWA, GATK, SAMtools, Picard, SomaticSniper, Tophat, Cufflinks, and many more. You can also take advantage of an expanding set of integrated reference genomes and variant databases, including dbSNP and COSMIC. All the Apps provided are open source, and we supply a large set of example applets that developers can use as a starting point for building their own Apps.

For example, if you need your own reference genome and annotation database for annotating and interpreting your data, DNAnexus provides the functionality to let you upload and integrate your own proprietary reference genome into your custom workflow. Combine and configure multiple tools — whether they’re provided by DNAnexus or built in your own lab — into best-practice workflows that can be used in your lab or shared with collaborators within or across institutions.

The flexibility of the new DNAnexus platform allows you to run Linux programs written in any language. You can develop your own parallelized tools using APIs and the SDK for Bash shell, Python, C++, and Java (with more coming soon). You can also automate batch data analysis by scripting from the command line, or build analyses in an easy-to-use, drag-and-drop web interface. The interface dynamically validates App compatibility, letting you know whether you have selected the proper input file type for a specified App.
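To make the idea of input validation concrete, here is a minimal sketch in Python of checking a file against the input types an App accepts. It is not the platform's implementation: `AppSpec`, `file_type`, `is_compatible`, and the two example Apps are hypothetical names invented for illustration, and the check assumes file types can be inferred from file extensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical description of an App: its name and accepted input types."""
    name: str
    accepted_types: frozenset


def file_type(filename: str) -> str:
    """Guess a file type from its extension, ignoring a trailing .gz."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    return name.rsplit(".", 1)[-1] if "." in name else ""


def is_compatible(app: AppSpec, filename: str) -> bool:
    """Return True if the file's inferred type is one the App accepts."""
    return file_type(filename) in app.accepted_types


# Illustrative specs only; the inputs real Apps accept may differ.
ALIGNER = AppSpec("example_aligner", frozenset({"fastq", "fq"}))
VARIANT_CALLER = AppSpec("example_variant_caller", frozenset({"bam"}))

if __name__ == "__main__":
    for app, path in [(ALIGNER, "sample1.fastq.gz"), (ALIGNER, "sample1.bam")]:
        status = "ok" if is_compatible(app, path) else "incompatible input"
        print(f"{app.name} <- {path}: {status}")
```

A check like this can run before any job is launched, so a mismatched input is flagged while the workflow is still being assembled rather than after compute time has been spent.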

The new DNAnexus platform will allow upload and storage of any file type via the API. To support this while taking full advantage of the platform’s programmatic capabilities, DNAnexus will automatically convert certain file types into objects optimized for fast programmatic access. These new objects are called Genomic Tables, or simply gtables, and they can be generated with file-conversion Apps provided by DNAnexus. As a result, users can store and retrieve any file type from their account workspace in its original form. File types supported for import and export are FASTA, FASTQ, SAM, BAM, VCF, BED, GFF, GTF, and WIG (with more coming soon).
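As a rough illustration of why a table-like object is easier to work with programmatically than a flat file, the sketch below parses BED text into typed rows and answers a region query. This is not how gtables are implemented, and it does not use the DNAnexus API; `BedRow`, `parse_bed`, and `query` are hypothetical names, and only the first three BED columns are read.

```python
from typing import List, NamedTuple


class BedRow(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed(text: str) -> List[BedRow]:
    """Turn BED text into sorted, typed rows, skipping header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        rows.append(BedRow(chrom, int(start), int(end)))
    return sorted(rows)


def query(rows: List[BedRow], chrom: str, start: int, end: int) -> List[BedRow]:
    """Return rows on `chrom` that overlap the half-open interval [start, end)."""
    return [r for r in rows if r.chrom == chrom and r.start < end and r.end > start]


if __name__ == "__main__":
    bed_text = "track name=example\nchr1\t100\t200\nchr1\t300\t400\nchr2\t50\t60\n"
    table = parse_bed(bed_text)
    print(query(table, "chr1", 150, 310))
```

Once the data is held as typed rows, questions such as "which features overlap this region" become a short function call instead of repeated text parsing.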

The rich genomics toolbox also contains an integrated, interactive, HTML5-based Genome Browser, which lets you create custom tracks to view your data alongside reference data without any additional downloads or plugins. You can immediately leverage the Genome Browser and stream data as needed across the internet. Add tracks from our included reference data sets and variant databases, such as dbSNP and COSMIC, or from whatever data you choose to upload.

Altogether, this results in an environment in which you can design, script, and fully automate custom workflows (filtering data, querying massive data sets, and handling batch analysis) with ease, thanks to the extensible genomics toolbox within the configurable cloud.

This ability to create Apps and workflows tailored to the needs of your project and lab is the foundation of the platform’s ability to facilitate “Instant Collaboration”, another core capability that we’ll discuss in next week’s close-up look at the new DNAnexus. Haven’t taken it for a test drive yet? Take advantage of our free beta trial period and sign up for an account here to explore it for yourself.


Meet the new DNAnexus and Its Configurable Cloud Infrastructure

It’s been a busy first week since we launched the beta of the new DNAnexus, our cloud-based DNA analysis platform designed for bioinformaticians. We’ve been blown away by the number of people who have signed up for the program and provided very positive and constructive feedback. We encourage all of our beta users to continue to comment on their experience. Request access today and see for yourself what it is all about.


This week we’d like to highlight one of the core capabilities of the new DNAnexus platform, the configurable cloud infrastructure, which lets you take full advantage of Amazon’s scalable and cost-effective Web Services. It not only allows you to scale your computational and data storage needs to any level, but is also fully scriptable, so you can create an analysis solution that fits your specific needs. This eliminates capacity planning, since you can now store and process any data on demand and pay only for what you use.


At DNAnexus we have always used the pay-as-you-go model for computational and storage services; this will continue with the new DNAnexus. The benefit of a pay-as-you-go approach is that you can cost-effectively address your needs today and scale up or down as those needs change. Whether you are familiar with or new to sequence data analysis, you can immediately get started with your data analysis projects without any setup costs or capacity planning risks — regardless of how many samples you might have. This is because the new platform, with its configurable infrastructure, processes samples in parallel, resolving resource contention issues among different teams.


When we set out to build the new platform, one of the most common requests we heard was for a fully configurable solution that gives bioinformaticians and computational analysts the ability to run custom programs, tune compute performance through parallelization, and more. All of this is now possible with the new platform through its well-documented APIs and SDK, which allow rich scripting for any data management, analysis, visualization, or reporting need.


Another advantage of this new infrastructure is that you can now manage and manipulate your data not only via the web interface, but also through the command line, which is compatible with Linux and Mac OS X. The open and flexible new DNAnexus platform, with its SDK language support, allows you to run any tool in any language and perform platform operations through API bindings in Python, C++, Java, and the Bash shell. This allows you to fully automate entire workflows, from sequencing data upload to analysis and report generation. You may also create best-practice workflows that can be easily shared with non-bioinformaticians within or across institutions.
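The shape of such an automated workflow can be sketched in a few lines of Python. The example below is only an outline under stated assumptions: it does not call the DNAnexus API bindings, and `run_workflow` and the three placeholder steps (`upload`, `analyze`, `report`) are hypothetical stand-ins for real platform operations.

```python
from typing import Callable, Dict, List

# A step takes the current state and returns the new values it produced.
Step = Callable[[Dict], Dict]


def run_workflow(steps: List[Step], sample: Dict) -> Dict:
    """Run the steps in order, passing accumulated results from one to the next."""
    state = dict(sample)  # copy so the caller's input is not modified
    for step in steps:
        state.update(step(state))
    return state


# Placeholder steps; real ones would call the platform's API bindings.
def upload(state: Dict) -> Dict:
    return {"uploaded": True}


def analyze(state: Dict) -> Dict:
    return {"read_count": len(state["reads"])}


def report(state: Dict) -> Dict:
    return {"report": f"{state['sample_id']}: {state['read_count']} reads"}


if __name__ == "__main__":
    samples = [
        {"sample_id": "S1", "reads": ["ACGT", "TTGA"]},
        {"sample_id": "S2", "reads": ["GGCC"]},
    ]
    for sample in samples:
        print(run_workflow([upload, analyze, report], sample)["report"])
```

Keeping each stage as a separate function means the same list of steps can be applied to every sample in a batch, or handed to a collaborator as a fixed, repeatable recipe.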


In the weeks to come, we’ll explore the many additional capabilities of the new DNAnexus (e.g., the “Extensible Genomics Toolbox”, “Instant Collaboration”, and “Security and Compliance”). In the meantime, please take advantage of our beta program and sign up for your own account and explore firsthand what the new DNAnexus has to offer.