Preserving and Enhancing an Important Community Resource

Today, DNAnexus is pleased to announce the launch of our hosted SRA site!

The DNAnexus SRA site is a hosted version of NCBI’s Sequence Read Archive (SRA). As the most comprehensive archive of publicly available next-generation sequencing data, the SRA is an important resource for researchers around the world. It remains the single best source of sequence data from research initiatives such as the 1000 Genomes Project and institutions like the Broad Institute, Washington University, and the Wellcome Trust Sanger Institute.

By teaming up with Google, DNAnexus has created a mirror of this resource that provides access to all publicly accessible datasets for the specific studies, experiments, samples, and runs currently available via the NCBI SRA website. (Note: These data do not currently include the analysis data or the Trace Archive repository.)

The New Interface

In addition to maintaining free access to the SRA database, we have taken this opportunity to improve the experience of using and accessing these data. The new web-based user interface was built using the latest cloud-based technologies and genomic data standards. Central to this effort were the many conversations we had with researchers about how they search and interact with data of this type. Their feedback was the basis of our development plan, which drew on our own experience in developing web-based sequence data analysis solutions as well as Google’s big data expertise.

Searching and Browsing

Our main goal in developing the new interface was to vastly enhance the way you find data of interest, understand sample-to-project associations, and download files for subsequent analysis in your tool of choice.

The most significant difference you will notice is the new web-based searching and browsing interface. The new search tool allows you to simultaneously look across multiple data annotations and keywords for objects of interest that are embedded in the SRA database. Each search returns a ranked list of results with relevant metadata for easy follow-on browsing.

We have developed a number of features that make it simpler to scan results and quickly home in on relevant data. We are particularly excited about the links to published data: PubMed references now let you link directly to journals for descriptions of samples, experiments, studies, or runs as they appeared in the referenced publication.

Once you have identified samples of interest, you can easily download them. In addition to the standard SRA format, these data can be downloaded in the more popular FASTQ format.
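For readers who have not worked with FASTQ before, here is a minimal sketch of reading a downloaded file in Python. It assumes the common four-line record layout (header, sequence, separator, quality string) with no line-wrapped sequences. The names `FastqRecord` and `read_fastq` and the example reads are made up for illustration and are not part of any DNAnexus or NCBI tool.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: identifier, bases, and per-base quality characters."""
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (assumed layout)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up example data, not real SRA reads.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n"

for record in read_fastq(io.StringIO(EXAMPLE)):
    print(record.read_id, len(record.sequence))
```

For a real download you would pass an open file handle instead of the in-memory example; for large or compressed files, an established parser such as the one in Biopython is likely a better choice than this sketch.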

For more details on the functionality and how the website works, please visit the SRA FAQ.

Transforming Data into Real Insights Using DNAnexus

Since the SRA primarily contains raw sequence data, the ability to import these data into a platform such as DNAnexus is essential for further analysis. For example, once your results are in DNAnexus, you can access tools that map your data to a reference genome so you can better understand data quality, a critical step in deciding whether to move forward with the data. DNAnexus also allows you to analyze and visualize these data as a standalone dataset or in conjunction with other data already in the system, using our interactive web-based Genome Browser.

Analyze and Visualize SRA Data for Free

For the next 30 days, you can import SRA data directly into DNAnexus at no cost. If you already have a DNAnexus account, simply log in and import your SRA data. If you are not yet a user of DNAnexus, you can sign up for a free trial account and import your data. Once logged in, you can perform mapping, RNA-seq, ChIP-seq, variant analysis, and data visualization on your SRA data for a total of two years.

Special note for our users from academic institutions: we have just reduced our standard academic pricing by half!

Taking DNAnexus to the Next Level

Today we’re proud to announce the addition of two partners to DNAnexus: Google Ventures and TPG Biotech, respectively the best big data investor and the best life sciences investor. Together with previous investors First Round Capital, SoftTech VC, K9 Ventures, and Felicis VC, we’ve raised $15 million in capital in this round to continue building our company and our vision for the genomics revolution.

We also welcome the addition of two individuals to our board: Geoff Duyk, partner and managing director at TPG Biotech, and Krishna Yeshwant, partner at Google Ventures, whose insight and experience we value deeply. We consider Google Ventures and TPG Biotech to be extensions of our team, providing unrivaled industry reach and access to top technical expertise and infrastructure.

How will this affect the company? For starters, we’ve moved our headquarters to Mountain View and grown the company substantially, and continue to recruit the absolute best individuals to build our team. A big emphasis is on identifying the best software engineers and computer scientists, people who appreciate working on complex, big data problems, and want to make a meaningful impact on the world. Intrigued? Know someone who might be a fit? Take a look at our openings, or refer someone you know for a unique referral bonus.

Next, we’re investing in a number of efforts with the goal of bringing together data and tools that allow the medical and biotech communities to extract meaning from sequencing data. One such effort is being announced today: a commitment to continue providing access to the Sequence Read Archive, one of the most comprehensive archives of publicly available sequence data. You can read more in the blog post below.

Congratulations to the DNAnexus team and everyone involved. We see this as a huge opportunity to pursue our mission: to unlock the potential of DNA-based medicine and biotechnology by creating scalable and collaborative data technologies.

Case Study Highlight: Differential Expression of Splicing Factor Genes

Dr. Miriam Bucheli, an instructor at Harvard University, is the focus of this month’s use case. She elegantly demonstrates how DNAnexus can be used to analyze differential expression of splicing factor genes by determining expression profiles in undifferentiated mouse embryonic stem cells (MeS) and differentiated neurons in a senataxin (SETX) knockdown compared with a wild-type control. In her research-intensive class, Miriam and her students are beginning to characterize the transcription and RNA processing activities of SETX, which is associated with amyotrophic lateral sclerosis (ALS), a disease that affects nerve cells in the brain and spinal cord and leads to progressive loss of motor control.

They used DNAnexus to analyze, visualize, and compare the different samples. RPKM (reads per kilobase of transcript per million mapped reads) values were computed for each gene of interest, and an analysis was run to compare the expression of splicing factors in the SETX knockdown against a wild-type control. Using this approach, they confirmed that SETX expression is significantly reduced in the knockdown samples (see Figure 1). “The straightforward features and tools made available at DNAnexus helped us complete the project within the deadlines for the students’ presentations,” said Miriam Bucheli. “DNAnexus is a powerful and efficient cloud-based web solution for the analysis of NGS data.”

Figure 1: SETX expression in normal vs. knockdown sample as visualized in the DNAnexus Genome Browser.

In her case study, Miriam shared additional data that were presented during a visit to Columbia Medical School and will be shared at a future Keystone Meeting.

Figure 2: Differential expression of splicing factors in SETX knockdown compared to normal control. Values calculated from RPKM ratios of cells non-induced or induced for neuronal differentiation by SAG.
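To make the RPKM ratios behind Figure 2 concrete, here is a minimal sketch of the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and a knockdown-to-control ratio. It is not the DNAnexus implementation, which is not described here. The function names and all read counts, gene lengths, and library sizes below are hypothetical values chosen for illustration; they are not data from the case study.

```python
import math


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads scaled by gene length (kb) and library size (millions)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_ratio(rpkm_knockdown: float, rpkm_control: float) -> float:
    """Ratio of knockdown to control expression; values below 1 mean reduced expression."""
    if rpkm_control <= 0:
        raise ValueError("control RPKM must be positive to form a ratio")
    return rpkm_knockdown / rpkm_control


# Hypothetical numbers: a 2,000 bp gene in two libraries of 10 million mapped reads each.
control = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)    # 25.0
knockdown = rpkm(read_count=100, gene_length_bp=2000, total_mapped_reads=10_000_000)  # 5.0

ratio = expression_ratio(knockdown, control)  # 0.2, i.e. one fifth of control
log2_ratio = math.log2(ratio)                 # negative when expression is reduced
print(f"control={control}, knockdown={knockdown}, ratio={ratio}, log2={log2_ratio:.2f}")
```

Ratios are often reported on a log2 scale so that increases and decreases of the same magnitude appear symmetric around zero. A simple ratio like this says nothing about statistical significance, which requires replicates and a dedicated differential expression method.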

Miriam shared her experience with us by submitting to our “Tell us how you use DNAnexus” contest and was selected as January’s winner. View Miriam Bucheli’s complete submission.