Taking DNAnexus to the next level

Today we’re proud to announce the addition of two partners to DNAnexus: Google Ventures and TPG Biotech, the best big data investor and the best life sciences investor. Together with previous investors First Round Capital, SoftTech VC, K9 Ventures, and Felicis VC, we’ve raised $15 million in capital in this round to continue building our company and vision for the genomics revolution.

We also welcome the addition of two individuals to our board: Geoff Duyk, partner and managing director at TPG Biotech, and Krishna Yeshwant, partner at Google Ventures, whose insight and experience we value deeply. We consider Google Ventures and TPG Biotech to be extensions of our team, providing unrivaled industry reach and access to top technical expertise and infrastructure.

How will this affect the company? For starters, we’ve moved our headquarters to Mountain View and grown the company substantially, and continue to recruit the absolute best individuals to build our team. A big emphasis is on identifying the best software engineers and computer scientists, people who appreciate working on complex, big data problems, and want to make a meaningful impact on the world. Intrigued? Know someone who might be a fit? Take a look at our openings, or refer someone you know for a unique referral bonus.

Next, we’re investing in a number of efforts with the goal of bringing together data and tools that allow the medical and biotech communities to extract meaning from sequencing data. One such effort is being announced today: a commitment to continue providing access to the Sequence Read Archive, one of the most comprehensive archives of publicly available sequence data. You can read more in the blog post below.

Congratulations to the DNAnexus team and everyone involved. We see this as a huge opportunity to pursue our mission: to unlock the potential of DNA-based medicine and biotechnology by creating scalable and collaborative data technologies.

Case Study Highlight: Differential Expression of Splicing Factor Genes

Dr. Miriam Bucheli, an instructor at Harvard University, is this month’s use case focus. She elegantly demonstrates how to analyze differential expression of splicing factor genes, using DNAnexus to determine expression profiles in undifferentiated mouse embryonic stem cells (MeS) and differentiated neurons in a senataxin (SETX) knockdown compared to a wild-type control. During her research-intensive class, Miriam and her students are beginning to characterize the transcription and RNA processing activities of SETX, which is associated with amyotrophic lateral sclerosis (ALS), a disease that affects nerve cells in the brain and spine and leads to progressive loss of motor control.

They used DNAnexus to analyze, visualize, and compare the different samples. RPKM values were computed for each gene of interest, and an analysis was run to measure the differential expression of splicing factors in the SETX knockdown relative to a wild-type control. Using this approach, they confirmed that SETX expression is significantly reduced in the knockdown samples (see Figure 1). “The straightforward features and tools made available at DNAnexus helped us complete the project within the deadlines for the students’ presentations,” said Miriam Bucheli. “DNAnexus is a powerful and efficient cloud-based web solution for the analysis of NGS data.”
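For readers unfamiliar with the metric, RPKM (reads per kilobase of transcript per million mapped reads) normalizes a gene's read count by both gene length and sequencing depth. The short Python sketch below shows the standard formula only; it is not the DNAnexus implementation, and the function name and all read counts are made-up values for illustration.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 10^9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, not taken from the case study:
# a 2,000 bp gene in two samples sequenced to 10 million mapped reads each.
control_rpkm = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
knockdown_rpkm = rpkm(read_count=100, gene_length_bp=2000, total_mapped_reads=10_000_000)

# A ratio below 1 indicates lower expression in the knockdown sample.
expression_ratio = knockdown_rpkm / control_rpkm
print(control_rpkm, knockdown_rpkm, expression_ratio)
```

Because both samples are normalized the same way, the ratio of the two RPKM values gives a simple first look at whether a gene's expression has dropped in the knockdown.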

Figure 1: SETX expression in normal vs. knockdown sample as visualized in the DNAnexus Genome Browser.

In her case study, Miriam shared additional data that was presented during a visit to Columbia Medical School and will be shared at a future Keystone Meeting.

Figure 2: Differential expression of splicing factors in SETX knockdown compared to normal control. Values calculated from RPKM ratios of cells non-induced or induced for neuronal differentiation by SAG.

Miriam shared her experience with us by submitting to our “Tell us how you use DNAnexus” contest and was selected as January’s winner. View Miriam Bucheli’s complete submission.

Seeing The Trees In The Forest

One of the biggest challenges in identifying genomic variation is finding the variants that have a real and measurable impact and help explain, for example, a disease or drug response under investigation. Weeding through more than 5 million variants associated with the human genome is a huge effort that requires significant computational infrastructure, plus staff time to manually validate and correlate the biological findings. To expedite this process and free up more time for focusing on relevant data, the variant list must be narrowed down to a manageable size, ideally fewer than a few hundred variants.

We have just released a number of new features that will help solve this challenge by providing:

  1. Smart variation results filtering
  2. Linkouts to public and commercial data sources with gene to disease information

With this new functionality, you can – with a few simple queries – home in on the most relevant variants, whether they are associated with a specific gene, a coding region, a specific chromosome, or annotations that fulfill a specific set of characteristics. The result is quicker insight into affected processes that directly translates into faster hypothesis generation and decision making.

More Specifically…

To help you rapidly drill down on biologically interesting and relevant results, we have created a flexible query tool for filtering your variation analysis results within the DNAnexus Genome Browser. With just a few clicks, you can apply any number of filters to a results table, yielding a set of variant calls that allow easy navigation through the browser and further investigation.

In this release, we have added 13 distinct filters, including chromosome, variant type, gene/transcript name, zygosity, and location relative to gene/transcript, among others. These filters are currently available for the DNAnexus Nucleotide-Level Variation (see screenshot below) and Population Allele Frequency analysis results. We are also working towards making them available for any data type, including RNA-seq and ChIP-seq data. All of the filtered results can be exported from DNAnexus for further analysis in other tools, such as Excel or statistical packages.
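For exported results, the same idea of stacking filters can be reproduced in a few lines of code. The Python sketch below is only an illustration of combining several filters with a logical AND; it does not use any DNAnexus API, and the record fields, values, and positions are hypothetical stand-ins for the columns of an exported results table.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List


@dataclass(frozen=True)
class Variant:
    # Hypothetical columns modeled on the filters named above.
    chromosome: str
    position: int
    variant_type: str
    gene: str
    zygosity: str
    location: str  # location relative to the gene/transcript


def filter_variants(variants: Iterable[Variant], **criteria: object) -> List[Variant]:
    """Keep only the variants that match every field=value criterion."""
    known = {f.name for f in fields(Variant)}
    unknown = set(criteria) - known
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    return [
        v for v in variants
        if all(getattr(v, name) == value for name, value in criteria.items())
    ]


# Made-up example rows; the positions are illustrative only.
variants = [
    Variant("chr17", 1001, "SNP", "BRCA1", "heterozygous", "coding"),
    Variant("chr17", 2002, "deletion", "BRCA1", "homozygous", "intron"),
    Variant("chr13", 3003, "SNP", "BRCA2", "heterozygous", "coding"),
]

hits = filter_variants(variants, gene="BRCA1", zygosity="heterozygous")
print(hits)
```

Each additional keyword narrows the list further, which mirrors how applying several filters to a results table reduces millions of calls to a short list worth inspecting by hand.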

Understanding And Validating Variant To Gene To Disease Results

To help you understand a prioritized list of variants, as well as the genes and processes they affect, we have added the ability to link out to third-party data sources, both public and commercial, that contain relevant gene-to-disease knowledge. This lets you study how identified variations in DNA affect the response to diseases, bacteria, viruses, toxins, and chemicals, including drugs and other therapies.

It’s All About The Data

DNAnexus specializes in addressing the data storage, management, and analysis challenges inherent in next-generation sequencing. We believe that by leveraging the cloud and being data-source and platform agnostic, we can provide the best possible support for anyone using these data in their work. We also believe that your input regarding what data is accessible through DNAnexus is critical, and because our platform is flexible, we can easily integrate with many of the data sources you would like to access or need for your research.

DNAnexus currently supports direct linkouts to 12 public and commercial data sources: AmiGO, BioBase, COSMIC, dbSNP, Entrez Gene, GeneCards®, IPA®, KEGG, NextBio, OMIM, PharmGKB, and PubMed. For commercial data sources, we can provide integrated access for users who have licenses to access these data.

Please let us know if there are specific data that you would like to access via DNAnexus by emailing us at support@dnanexus.com.

Take Me To The Data

To access these data sources we have added the new Gene Info pages (see the BRCA1 Gene Info page as an example below), which provide a gene overview and a list of all the data sources accessible. Gene Info pages are meant to give you a preview of the gene, with linkouts to additional information.

Gene Info pages are accessible through hyperlinked gene names within the DNAnexus Genome Browser and analysis results tables, as shown here.

We now support 22 reference genomes; the latest additions include the Staphylococcus genome S. epidermidis ATCC 12228 and the Macaque genome M. mulatta.

Tell Us What You Think

Much of the new functionality that makes its way into the DNAnexus platform is the result of requests by our many active users. We cannot emphasize enough how much we value user feedback; it is a critical component of our product development and feature prioritization process.

To simplify the process of providing feedback, we have added feedback links to both the filterable results tables and the Gene Info pages. You are also welcome to email us at support@dnanexus.com with any feature requests or questions you may have. We look forward to hearing from you and keeping you posted on the many new features we are working on and will be releasing in the coming months.