DNAnexus Platform and Product Updates: 2020 in Review

Everyone is happy to see 2020 receding in the rearview mirror. But while last year was difficult in so many respects, it was productive for the DNAnexus Product team. Here’s a list of some of the features and enhancements we released in 2020:

Improved Cohort Building and Analysis Experience

We made several updates to the Cohort Browser, to improve the experience of building and analyzing cohorts. The Cohort Browser now offers users:

  • More options for customizing dashboards, through the use of custom filters and tiles, and Cohort Table field display options
  • The ability to create and compare the phenotypic data for two cohorts, in support of use cases such as case-control comparisons, survival curve comparisons, and comparing a subpopulation to the whole

See our documentation for more on the improved Cohort Browser.

New Association Browser 

Early in 2020, we released a new Association Browser for users conducting genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS). The Association Browser lets these users view and use an array of powerful visualizations, including zoomable Manhattan plots, across a compendium of curated GWAS and PheWAS results, so they can gain insight more rapidly. See our documentation for more on the Association Browser.

Run GWAS Directly on Cohorts

Last summer, we made it possible for users to quickly and easily run PLATO- and PLINK-based GWAS on cohorts built through the Cohort Browser, with additional configuration options for case-control analyses and for specifying covariates. The results can then be viewed in an interactive Manhattan plot and variant table.

Data Ingestion Guidelines & Data Model Loader Documentation

In the fall, we published new documentation providing detailed guidance on how to ingest phenotypic data.

Extend Datasets with New/Derived Phenotype Information

The Dataset Extender app is a lightweight application for quickly expanding your core dataset with newly generated or acquired data, so that the entire team and your collaborators can access it.

Running RStudio Shiny Server and Apps

With a few lines of R code, you can use RStudio Shiny to turn your command-line apps and scripts into feature-rich, easy-to-use applications, accessible via a beautiful graphical user interface (GUI). This past summer, we published detailed guidelines on how to build and run RStudio Shiny apps on the DNAnexus Platform.

DX JupyterLab now supports papermill

DX JupyterLab now supports the use of papermill for automatically executing Jupyter notebooks and collecting the results. For more details, see the in-product DX JupyterLab documentation.
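
As a rough illustration of what this enables, here is a minimal sketch that plans one parameterized run of a notebook per sample and then executes each run with papermill's `execute_notebook` function. The notebook name `qc_report.ipynb`, the `sample_id` parameter, and the `results` output folder are hypothetical placeholders, not DNAnexus conventions, and the sketch assumes papermill is installed in the environment.

```python
from pathlib import Path


def plan_runs(template, samples, out_dir="results"):
    """Return one (input, output, parameters) tuple per sample.

    All names here are illustrative; adapt them to your own notebook.
    """
    runs = []
    for sample in samples:
        # e.g. results/qc_report_sampleA.ipynb
        output = (Path(out_dir) / f"{Path(template).stem}_{sample}.ipynb").as_posix()
        runs.append((template, output, {"sample_id": sample}))
    return runs


def execute_runs(runs):
    """Execute each planned run and save the executed notebook with its results."""
    # Imported lazily so that planning works even where papermill is absent.
    import papermill as pm

    for input_path, output_path, parameters in runs:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        # papermill injects `parameters` into the notebook, runs every cell,
        # and writes the executed copy to output_path.
        pm.execute_notebook(input_path, output_path, parameters=parameters)
```

Calling `execute_runs(plan_runs("qc_report.ipynb", ["sampleA", "sampleB"]))` would then produce one executed notebook per sample. Keeping the planning step separate makes it easy to inspect what will run before any compute is used.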

Migration to Python 3

The Python Software Foundation sunset Python version 2.x on January 1, 2020, and, moving forward, will support only Python 3.x. To learn more about how this affects the security of your Python apps and use of the DNAnexus Platform, please read the detailed FAQ we published last January.

Archive Service For Azure

For customers using Titan or Apollo on Azure, file-based archiving is now available in all Azure regions. Customers can take advantage of this feature to keep storage costs in check by archiving files they use infrequently. Rest assured that archived files will be stored in accordance with your organization’s data-retention policies, while maintaining security, file provenance tracking, and metadata searchability. Learn more from this April 2020 post.

Leverage the Power of GPU Compute

Are you running ML- or AI-driven analyses that need the power of GPU compute? DNAnexus can help. DNAnexus now supports the use of all available AWS GPU instance types. Get the details from our documentation.

HTTPS Apps with Custom Hostnames

App developers can now provide access to their apps via user-friendly URLs. Each URL is generated by concatenating the app name with the name of the app host organization, e.g., https://myBestTool-acmeOrg.dnanexus.cloud/. Learn more by checking out our documentation.
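
The naming pattern described above can be sketched in a few lines. This is only an illustration based on the example URL in this post; the function name `app_url` is hypothetical, and the Platform may apply its own rules (for instance, on allowed characters) that this sketch does not model.

```python
def app_url(app_name, org_name, domain="dnanexus.cloud"):
    """Build a user-friendly app URL by joining the app name and org name.

    Assumption: the hostname is simply "<app>-<org>.<domain>", as in the
    example given in this post.
    """
    return f"https://{app_name}-{org_name}.{domain}/"


print(app_url("myBestTool", "acmeOrg"))
```

Run as written, this prints the example URL given above, https://myBestTool-acmeOrg.dnanexus.cloud/.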

dxWDL 2.0

dxWDL 2.0, the next generation of dxWDL, is on the way. Version 2.0 provides an improved WDL development experience, follows the WDL standard more closely, and removes dxWDL’s dependence on the Cromwell codebase. It relies on the DNAnexus wdlTools library for WDL parsing. If you’d like to check it out in advance of release, release candidate version 2.0.0-rc4 is available on GitHub.

New File Info Pane

On enabling our new product UI, you’ll have access to a better way to get info on files. When viewing a list of files, click the “i” icon over the upper right corner of the list. Then tick the checkbox to the left of the name of the file about which you’d like to know more. Full file details will appear in the pane to the right of the list.

Looking Beyond Causative Variants to Help Rare Disease Researchers Identify Elusive Treatment Options

Ever heard of schwannomatosis? How about neurofibromatosis? They are rare diseases that are not as rare as you might think, affecting around one in every 3,000 people worldwide. 

At DNAnexus, we are proud to help enable research into these and other ‘rare’ genetic conditions. 

One research partner is the Children’s Tumor Foundation, which is dedicated to developing treatments for the three identified forms of neurofibromatosis — NF1, NF2, and schwannomatosis — that cause tumors to grow on nerves throughout the body.

NF1 (formerly known as von Recklinghausen NF or Peripheral NF) is usually diagnosed in childhood and is one of the most common inherited neurological disorders, affecting about 1 in 3,000 people throughout the world. Characterized by multiple café au lait (light brown) skin spots and neurofibromas on or under the skin, it can cause tumors to develop in the brain, on cranial nerves, or on the spinal cord. In addition to having a much higher chance of developing cancer, patients can experience severe pain, blindness, curvature of bones, and learning issues, among other serious symptoms.

NF2 is much less common, affecting about 1 in 25,000 people worldwide. The disorder is characterized by the development of benign tumors on the nerve that carries sound and balance information from the inner ear to the brain, often leading to partial or complete hearing loss. NF2 patients can suffer brain and cranial nerve damage, facial weakness, swallowing difficulties, tinnitus, and/or seizures, either as a consequence of their tumors or because of surgical interventions.

Schwannomatosis, the least common and most recently identified form of neurofibromatosis, affects fewer than 1 in 40,000 people and causes benign tumors to grow on nerves. It can cause severe chronic pain, either localized or diffuse.

While researchers have pinpointed several genetic mutations in neurofibromatosis patients, little is known about the epigenomic alterations that drive the disorders. 

Such knowledge is crucial when investigating potential treatments, according to DNAnexus scientist Ben Busby, and he believes computational biology can help provide solutions.

“We’ve gotten very good as a research community at calling variants from rare disease cases, which has helped hasten many difficult diagnostic odysseys,” he said. “But that doesn’t give you information that might be indicative of potential treatments.” 

By looking beyond causative variants, and into gene expression and transcriptome data, rare disease researchers might be able to identify elusive therapeutic options.

Busby will demonstrate just how this is possible on DNAnexus Platforms in a special February 25 webinar, organized to mark Rare Disease Day. 

He will be joined by Salvatore La Rosa, of the Children’s Tumor Foundation, who will talk about the organization’s approach to research.

Busby said the CTF provides a great model for other rare disease research organizations, by working with the researchers they fund to discover what data roadblocks exist, and to help the scientists overcome them through the sharing and distribution of critical patient datasets. 

Advancing Rare Disease Research with Multi-Omic Data Analysis
Thursday, February 25, 2021
11 a.m. PT / 2 p.m. ET

St. Jude Cloud Provides Model For Cancer Collaboration

When researching a rare disease with many subtypes driven by diverse and distinct genetic alterations, data sharing is key. Samples acquired by a single institute, a single research initiative, or even a single nation may lack sufficient power for genomic discovery and clinical correlative analysis, and the mass of raw data from whole genome sequencing presents challenges. 

That is why we were proud to partner with St. Jude Children’s Research Hospital and Microsoft to create a solution: a cloud-based, data-sharing ecosystem that has proved to be a model for harmonized genetic data and collaboration across the pediatric cancer community.

Since the initial announcement of the partnership in 2018, more than 1.25 petabytes of data have been incorporated into the St. Jude Cloud, including:

  • 12,104 whole genomes;
  • 7,697 whole exomes; and
  • 2,202 transcriptomes

from more than 10,000 pediatric cancer patients and long-term survivors, and 800 pediatric sickle cell patients.

As reported recently in the journal Cancer Discovery, this makes it the largest publicly available genomic data resource for pediatric cancer, and it has already helped advance research. 

For example, Camille Keenan and colleagues gained new insight into a rare C11orf95 fusion in ependymoma by uploading and analyzing their RNA-Seq samples using the RNA Classification workflow on St. Jude Cloud. The Cancer Discovery paper includes additional use cases that classify 135 pediatric cancer subtypes by gene expression profiling and map mutational signatures across 35 pediatric cancer subtypes. 

How does the St. Jude Cloud work? Raw and curated genomic data, analysis and visualization tools are structured into three inter-connected apps: 

  • Genomics Platform, for accessing data and analysis workflows; 
  • PeCan, the Pediatric Cancer Knowledgebase, for exploring curated data from more than 5,000 pediatric cancer genomes; and 
  • Visualization Community, for exploring published pediatric cancer genomic or epigenomic landscape maps, and for visualizing user data using ProteinPaint or GenomePaint. 

Common use cases, such as assessing the recurrence of a rare genomic variant or the expression status of a gene of interest, are built into these apps, eliminating the need to download data and perform custom analyses. To enable researchers with little to no formal computational training to perform sophisticated genomic analysis, we also developed eight end-to-end analysis workflows designed with a point-and-click interface for uploading input files and graphically visualizing the results. 

“Effective sharing of genomic data and a community effort to elucidate etiology are…critical to developing effective therapeutic strategies,” the Cancer Discovery paper authors wrote. “The complementarity amongst the three apps within the St. Jude Cloud ecosystem enables the optimal use of computational resources so that researchers can focus on innovative analyses leading to new insights.”

The project leverages Microsoft Azure data storage and our open and flexible DNAnexus Portals™ workspace to create a secure environment compliant with all of the major data privacy standards (HIPAA, CLIA, CGP, 21 CFR Parts 22, 58, 493, and European data privacy laws and regulations). 

As the paper authors note, St. Jude Cloud currently hosts genomic data generated primarily by St. Jude studies, but they envision it will serve as a collaborative research platform for the broader pediatric cancer community in the future. 

“User-uploaded data can be analyzed and explored alongside the wealth of curated and raw pediatric genomic data on St. Jude Cloud, and deposition of user data into St. Jude Cloud requires minimal effort. In this regard St. Jude Cloud represents a community resource, framework, and significant contribution to the pediatric genomic sequencing data sharing landscape.”