When researching a rare disease with many subtypes driven by diverse and distinct genetic alterations, data sharing is key. Samples acquired by a single institute, a single research initiative, or even a single nation may lack sufficient power for genomic discovery and clinical correlative analysis, and the mass of raw data from whole genome sequencing presents challenges.
That is why we were proud to partner with St. Jude Children’s Research Hospital and Microsoft to create a solution: a cloud-based data-sharing ecosystem that has proved to be a model for harmonized genetic data and collaboration across the pediatric cancer community.
Since the initial announcement of the partnership in 2018, more than 1.25 petabytes of data have been incorporated into the St. Jude Cloud, including:
- 12,104 whole genomes;
- 7,697 whole exomes; and
- 2,202 transcriptomes.
These data come from more than 10,000 pediatric cancer patients and long-term survivors, and 800 pediatric sickle cell patients.
As reported recently in the journal Cancer Discovery, this makes it the largest publicly available genomic data resource for pediatric cancer, and it has already helped advance research.
For example, Camille Keenan and colleagues gained new insight into a rare C11orf95 fusion in ependymoma by uploading and analyzing their RNA-Seq samples using the RNA Classification workflow on St. Jude Cloud. The Cancer Discovery paper includes additional use cases that classify 135 pediatric cancer subtypes by gene expression profiling and map mutational signatures across 35 pediatric cancer subtypes.
How does the St. Jude Cloud work? Raw and curated genomic data, analysis and visualization tools are structured into three inter-connected apps:
- Genomics Platform, for accessing data and analysis workflows;
- PeCan (Pediatric Cancer Knowledgebase), for exploring curated data from more than 5,000 pediatric cancer genomes; and
- Visualization Community, for exploring published pediatric cancer genomic or epigenomic landscape maps, and for visualizing user data using ProteinPaint or GenomePaint.
Common use cases, such as assessing the recurrence of a rare genomic variant or the expression status of a gene of interest, are built into these apps, eliminating the need to download data and perform custom analyses. To enable researchers with little to no formal computational training to perform sophisticated genomic analysis, we also developed eight end-to-end analysis workflows designed with a point-and-click interface for uploading input files and graphically visualizing the results.
“Effective sharing of genomic data and a community effort to elucidate etiology are…critical to developing effective therapeutic strategies,” the Cancer Discovery paper authors wrote. “The complementarity amongst the three apps within the St. Jude Cloud ecosystem enables the optimal use of computational resources so that researchers can focus on innovative analyses leading to new insights.”
The project leverages Microsoft Azure data storage and our open and flexible DNAnexus Portals™ workspace to create a secure environment compliant with all of the major data privacy standards (HIPAA, CLIA, GCP, 21 CFR Parts 22, 58, 493, and European data privacy laws and regulations).
As the paper authors note, St. Jude Cloud currently hosts genomic data generated primarily by St. Jude studies, but they envision it will serve as a collaborative research platform for the broader pediatric cancer community in the future.
“User-uploaded data can be analyzed and explored alongside the wealth of curated and raw pediatric genomic data on St. Jude Cloud, and deposition of user data into St. Jude Cloud requires minimal effort. In this regard St. Jude Cloud represents a community resource, framework, and significant contribution to the pediatric genomic sequencing data sharing landscape.”