Preserving and Enhancing an Important Community Resource

Today, DNAnexus is pleased to announce the launch of our hosted SRA site!

The DNAnexus SRA site is a hosted version of NCBI’s Sequence Read Archive (SRA). As the most comprehensive archive of publicly available next-generation sequencing data, the SRA is an important resource to researchers around the world. It remains the single best source of sequence data from research initiatives such as the 1000 Genomes Project and institutions like the Broad Institute, Washington University, and the Wellcome Trust Sanger Institute.

DNAnexus has teamed up with Google to create a mirror of this resource, providing access to all publicly accessible datasets for the specific studies, experiments, samples, and runs that are currently available via the NCBI SRA website. (Note: These data do not currently include the analysis data or the Trace Archive repository.)

The New Interface

In addition to maintaining free access to the SRA database, we have taken this opportunity to improve the experience of using and accessing these data. The new web-based user interface was built using the latest cloud-based technologies and genomic data standards. Central to this effort were the many conversations we had with researchers about how they search and interact with data of this type. Their feedback was the basis of our development plan, which drew on our own experience in developing web-based sequence data analysis solutions as well as Google’s big data expertise.

Searching and Browsing

Our main goal in developing the new interface was to vastly enhance the way you find data of interest, understand sample-to-project associations, and download files for subsequent analysis in your tool of choice.

The most significant difference you will notice is the new web-based searching and browsing interface. The new search tool allows you to simultaneously look across multiple data annotations and keywords for objects of interest that are embedded in the SRA database. Each search returns a ranked list of results with relevant metadata for easy follow-on browsing.

We have developed a number of features to simplify how you can scan results and quickly narrow in on relevant data. We are particularly excited about the links to published data. PubMed references now permit users to link directly to journals for descriptions of samples, experiments, studies, or runs as they appeared in the referenced publication.

Once you have identified samples of interest, you can easily download them. In addition to the SRA standard format, we have also made it possible to download these data in the more popular FASTQ format.
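For readers who have not worked with FASTQ before, here is a minimal Python sketch of reading such a file once it has been downloaded. It assumes the common layout of four lines per record (header, sequence, separator, quality) and uses made-up example reads; the names `FastqRecord` and `read_fastq` are our own illustrative choices, not part of any DNAnexus or NCBI tool.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # read identifier (header line without the leading "@")
    sequence: str   # base calls
    quality: str    # per-base quality string, same length as sequence


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up example reads, not taken from the SRA.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII#I\n"

records = list(read_fastq(io.StringIO(EXAMPLE)))
print(len(records), records[0].name, records[1].sequence)
```

To read a real file, you would pass an open file handle to `read_fastq` instead of the in-memory example.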

For more details on the sra.dnanexus.com functionality and how the website works, please visit the SRA FAQ.

Transforming Data into Real Insights Using DNAnexus

Since the SRA primarily contains raw sequence data, the ability to import them into a platform such as DNAnexus is essential for further analyses. For example, by uploading your results into DNAnexus you can access tools that will map your data to a reference genome so you can better understand data quality, a critical step in determining whether to move forward with the data. DNAnexus also allows you to analyze and visualize these data as a standalone dataset or in conjunction with other data already in the system, using our interactive web-based Genome Browser.

Analyze and Visualize SRA Data for Free

For the next 30 days, you can import SRA data directly into DNAnexus at no cost. If you already have a DNAnexus account, simply log in and import your SRA data. If you are not yet a user of DNAnexus, you can sign up for a free trial account and import your data. Once logged in, you can perform mapping, RNA-seq, ChIP-seq, variant analysis, and data visualization on your SRA data for a total of two years.

Special note for our users from academic institutions: we have just reduced the standard academic pricing by half!

Seeing The Trees In The Forest

One of the biggest challenges in identifying genomic variation is finding the variants that have a real and measurable impact and help explain, for example, a disease or drug response under investigation. Weeding through more than 5 million variants associated with the human genome is a huge effort that requires significant computational infrastructure and staff time to manually validate and correlate the biological findings. To expedite this process and free up more time for focusing on relevant data, the list must be narrowed down to a manageable size, ideally fewer than a few hundred variants.

We have just released a number of new features that will help solve this challenge by providing:

  1. Smart variation results filtering
  2. Linkouts to public and commercial data sources with gene-to-disease information

With this new functionality, you can – with a few simple queries – home in on the most relevant variants, whether they are associated with a specific gene, a coding region, a specific chromosome, or annotations that fulfill a specific set of characteristics. The result is quicker insight into affected processes that directly translates into faster hypothesis generation and decision making.

More Specifically…

To help you rapidly drill down on biologically interesting and relevant results, we have created a flexible query tool for filtering your variation analysis results within the DNAnexus Genome Browser. With just a few clicks, you can apply any number of filters to a results table, yielding a set of variant calls that allow easy navigation through the browser and further investigation.

In this release, we have added 13 distinct filters, including chromosome, variant type, gene/transcript name, zygosity, and location relative to gene/transcript. These filters are currently available for the DNAnexus Nucleotide-Level Variation (see screenshot below) and Population Allele Frequency analysis results. We are also working towards making them available for any data type, including RNA-seq and ChIP-seq data. All of the filtered results can be exported out of DNAnexus for further analysis in other tools, such as Excel or statistical packages.
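If you export a results table and want to apply the same kind of filtering in your own scripts, the idea can be sketched in a few lines of Python. This is only an illustration of combining several filters with a logical AND, not a description of how the Genome Browser works internally; the `Variant` fields, the helper names, and the toy positions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Variant:
    # Hypothetical columns, loosely modeled on the filter names listed above.
    chromosome: str
    position: int
    variant_type: str     # e.g. "SNP", "insertion", "deletion"
    gene: Optional[str]   # None when no annotated gene overlaps the variant
    zygosity: str         # "homozygous" or "heterozygous"
    location: str         # e.g. "coding", "intron", "intergenic"


VariantFilter = Callable[[Variant], bool]


def by_field(field: str, *allowed: str) -> VariantFilter:
    """Build a filter that keeps variants whose field is one of the allowed values."""
    allowed_set = set(allowed)
    return lambda variant: getattr(variant, field) in allowed_set


def apply_filters(variants: Iterable[Variant],
                  filters: Iterable[VariantFilter]) -> List[Variant]:
    """Keep only the variants that pass every filter."""
    filter_list = list(filters)
    return [v for v in variants if all(f(v) for f in filter_list)]


# Toy rows with made-up positions, for illustration only.
VARIANTS = [
    Variant("chr17", 100, "SNP", "BRCA1", "heterozygous", "coding"),
    Variant("chr17", 200, "deletion", "BRCA1", "homozygous", "intron"),
    Variant("chr13", 300, "SNP", "BRCA2", "heterozygous", "coding"),
    Variant("chr1", 400, "SNP", None, "homozygous", "intergenic"),
]

selected = apply_filters(VARIANTS, [
    by_field("chromosome", "chr17"),
    by_field("variant_type", "SNP"),
    by_field("location", "coding"),
])
print([(v.gene, v.position) for v in selected])
```

Because each filter is just a function, adding another criterion (zygosity, for instance) means appending one more `by_field(...)` call to the list.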

Understanding And Validating Variant To Gene To Disease Results

To help you understand a prioritized list of variants, as well as the genes and processes they affect, we have included the ability to link out to third-party data sources, both public and commercial, that contain relevant gene-to-disease knowledge. These linkouts allow you to study how identified variations in DNA affect the response to diseases, bacteria, viruses, toxins, and chemicals, including drugs and other therapies.

It’s All About The Data

DNAnexus specializes in addressing the data storage, management, and analysis challenges inherent in next-generation sequencing. We believe that by leveraging the cloud and remaining agnostic to data source and platform, we can provide the best possible support for anyone using these data in their work. We also believe that your input on which data are accessible through DNAnexus is critical, and because our platform is flexible, we can easily integrate with many of the data sources you would like to access or need for your research.

DNAnexus currently supports direct linkouts to 12 public and commercial data sources: AmiGO, BIOBASE, COSMIC, dbSNP, Entrez Gene, GeneCards®, IPA®, KEGG, NextBio, OMIM, PharmGKB, and PubMed. For commercial data sources, we can provide integrated access for users who have licenses to access these data.

Please let us know if there are specific data that you would like to access via DNAnexus by emailing us at support@dnanexus.com.

Take Me To The Data

To access these data sources we have added the new Gene Info pages (see the BRCA1 Gene Info page as an example below), which provide a gene overview and a list of all the data sources accessible. Gene Info pages are meant to give you a preview of the gene, with linkouts to additional information.

Gene Info pages are accessible through hyperlinked gene names within the DNAnexus Genome Browser and analysis results tables, as shown here.

We now support 22 reference genomes. The latest additions include the Staphylococcus genome S. epidermidis ATCC 12228 and the macaque genome M. mulatta.

Tell Us What You Think

Much of the new functionality that makes its way into the DNAnexus platform is the result of requests by our many active users. We cannot emphasize enough how much we value user feedback; it is a critical component of our product development and feature prioritization process.

To simplify the process of providing feedback, we have added feedback links to both the filterable results tables and the Gene Info pages. You are also welcome to email us at support@dnanexus.com with any feature requests or questions you may have. We look forward to hearing from you and keeping you posted on the many new features we are working on and will be releasing in the coming months.

Navigating the Exome with DNAnexus

With a growing number of targeted exome capture solutions being integrated into the next-generation sequencing workflow, targeted exome analysis has become the go-to, cost-effective approach for obtaining sequence coverage of the protein-coding regions of the genome. As a result, researchers are starting projects with larger populations and larger, more complicated datasets. One reason targeted exome capture is gaining steam is that whole-genome sequencing is still cost-prohibitive for most researchers.

To support this important methodology, we have added Exome Analysis to our repertoire of analysis tools. With the DNAnexus Exome Analysis method, we’ve simplified a critical step in the processing and analysis of these datasets. Using this analysis, you can quickly determine whether regions of interest in your exome sequence data have been sequenced with sufficient coverage to allow for further analyses.


For each exon, DNAnexus reports the number and fraction of bases covered by sequence reads, along with the average coverage within the exon. Exons that overlap genes in a gene annotation track are labeled with the gene name, making it easy to search for exons from a gene of interest.
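To make these three per-exon numbers concrete, here is a small Python sketch that computes them from a list of per-base read depths. It shows one plausible set of definitions (a base counts as covered when at least `min_depth` reads overlap it), not necessarily the exact calculation DNAnexus performs; the function name, the `min_depth` parameter, the half-open coordinate convention, and the toy depths are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExonCoverage:
    bases_covered: int        # bases with depth >= min_depth
    fraction_covered: float   # bases_covered / exon length
    average_coverage: float   # mean depth across the exon


def summarize_exon(depths: Sequence[int], start: int, end: int,
                   min_depth: int = 1) -> ExonCoverage:
    """Summarize coverage over the half-open interval [start, end).

    `depths` holds the read depth at each 0-based position of the reference.
    """
    if not 0 <= start < end <= len(depths):
        raise ValueError("exon coordinates fall outside the depth array")
    exon_depths = depths[start:end]
    length = end - start
    covered = sum(1 for depth in exon_depths if depth >= min_depth)
    return ExonCoverage(
        bases_covered=covered,
        fraction_covered=covered / length,
        average_coverage=sum(exon_depths) / length,
    )


# Toy per-base depths for a 10-base reference, made up for illustration.
DEPTHS = [0, 0, 4, 6, 5, 0, 3, 2, 0, 0]

summary = summarize_exon(DEPTHS, start=2, end=8)
print(summary.bases_covered,
      round(summary.fraction_covered, 2),
      round(summary.average_coverage, 2))
# prints: 5 0.83 3.33
```

Raising `min_depth` (to 10 or 20, say) turns the same summary into a check of whether an exon is covered deeply enough for downstream variant calling.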