St. Jude Cloud Provides Model For Cancer Collaboration

When researching a rare disease with many subtypes driven by diverse and distinct genetic alterations, data sharing is key. Samples acquired by a single institute, a single research initiative, or even a single nation may lack sufficient power for genomic discovery and clinical correlative analysis, and the mass of raw data from whole genome sequencing presents challenges. 

That is why we were proud to partner with St. Jude Children’s Research Hospital and Microsoft to create a solution: a cloud-based, data-sharing ecosystem that has proved to be a model for harmonized genetic data and collaboration across the pediatric cancer community.

Since the initial announcement of the partnership in 2018, more than 1.25 petabytes of data have been incorporated into the St. Jude Cloud, including:

  • 12,104 whole genomes;
  • 7,697 whole exomes; and
  • 2,202 transcriptomes.

These data come from more than 10,000 pediatric cancer patients and long-term survivors, and 800 pediatric sickle cell patients.

As reported recently in the journal Cancer Discovery, this makes it the largest publicly available genomic data resource for pediatric cancer, and it has already helped advance research. 

For example, Camille Keenan and colleagues gained new insight into a rare C11orf95 fusion in ependymoma by uploading and analyzing their RNA-Seq samples using the RNA Classification workflow on St. Jude Cloud. The Cancer Discovery paper includes additional use cases that classify 135 pediatric cancer subtypes by gene expression profiling and map mutational signatures across 35 pediatric cancer subtypes. 

How does the St. Jude Cloud work? Raw and curated genomic data, along with analysis and visualization tools, are organized into three interconnected apps:

  • Genomics Platform, for accessing data and analysis workflows; 
  • PeCan, a Pediatric Cancer Knowledgebase for exploring a curated knowledgebase of more than 5,000 pediatric cancer genomes; and 
  • Visualization Community, for exploring published pediatric cancer genomic or epigenomic landscape maps, and for visualizing user data using ProteinPaint or GenomePaint. 

[Figure: St. Jude Cloud Visual Pipeline]

Common use cases, such as assessing the recurrence of a rare genomic variant or the expression status of a gene of interest, are built into these apps, eliminating the need to download data and perform custom analyses. To enable researchers with little to no formal computational training to perform sophisticated genomic analysis, we also developed eight end-to-end analysis workflows designed with a point-and-click interface for uploading input files and graphically visualizing the results. 

“Effective sharing of genomic data and a community effort to elucidate etiology are…critical to developing effective therapeutic strategies,” the Cancer Discovery paper authors wrote. “The complementarity amongst the three apps within the St. Jude Cloud ecosystem enables the optimal use of computational resources so that researchers can focus on innovative analyses leading to new insights.”

The project leverages Microsoft Azure data storage and our open and flexible DNAnexus Portals™ workspace to create a secure environment compliant with all of the major data privacy standards (HIPAA, CLIA, GCP, 21 CFR Parts 22, 58, 493, and European data privacy laws and regulations).

As the paper authors note, St. Jude Cloud currently hosts genomic data generated primarily by St. Jude studies, but they envision it will serve as a collaborative research platform for the broader pediatric cancer community in the future. 

“User-uploaded data can be analyzed and explored alongside the wealth of curated and raw pediatric genomic data on St. Jude Cloud, and deposition of user data into St. Jude Cloud requires minimal effort. In this regard St. Jude Cloud represents a community resource, framework, and significant contribution to the pediatric genomic sequencing data sharing landscape.” 

About DNAnexus

DNAnexus, the leader in biomedical informatics and data management, has created the global network for genomics and other biomedical data, operating in 33 countries across regions including North America, Europe, China, Australia, South America, and Africa. The secure, scalable, and collaborative DNAnexus Platform helps thousands of researchers across a spectrum of industries (biopharmaceutical, bioagricultural, sequencing services, clinical diagnostics, government, and research consortia) accelerate their genomics programs.

The DNAnexus team is made up of experts in computational biology and cloud computing who work with organizations to tackle some of the most exciting opportunities in human health, making it easier—and in many cases feasible—to work with genomic data. With DNAnexus, organizations can stay a step ahead in leveraging genomics to achieve their goals. The future of human health is in genomics. DNAnexus brings it all together.