Cancer Genomes Dataset Now Hosted on Amazon Web Services

Today, Amazon Web Services (AWS) and the Ontario Institute for Cancer Research (OICR) made the International Cancer Genome Consortium (ICGC) Pan-Cancer dataset available on Amazon Simple Storage Service (S3). The dataset includes more than 2,400 consistently analyzed whole genomes from over 1,100 unique ICGC donors.

Hundreds of terabytes of genome sequence alignments and variant information are now hosted on AWS and can be used to explore the genomic basis of cancer, accelerate research, and develop more targeted therapies. In addition to the raw data, the dataset contains a total of nearly 4 million identified mutations and known differences between tumor and normal genes.

We are thrilled by this news, which lowers the technical barriers to accessing and working with ICGC data. DNAnexus users who are also authorized ICGC researchers can now access the data quickly and conveniently, without transferring it from far-flung repositories to their local infrastructure. That process previously could take months and required substantial compute resources. The rich set of DNAnexus tools and pipelines, along with the platform's easy tool development environment, can now be applied to this cancer data, all with security, compliance, and practically unlimited scalability.

“The DNAnexus Platform provides an environment where our own data will be able to live with ICGC data, performing sophisticated analysis to extract knowledge and the ability to share with other researchers from around the world in real time,” said Steve Rozen, PhD, Director of the Duke-NUS Center for Computational Biology.

ICGC researchers can apply for cloud data access through the ICGC Data Access Compliance Office (DACO), then use their access token with a DNAnexus app that wraps the ICGC Data Coordination Centre (DCC) storage client. Controlled-access data transferred from S3 can then either be processed ephemerally or stored in a secure DNAnexus project for later use. Learn more at our ICGC_fetcher GitHub repository and DNAnexus project, or schedule a scientific consultation with our team.


With ICGC data now hosted on AWS, researchers at institutions large and small can easily access this large dataset on the DNAnexus Platform, easing the technical and computational barriers to cancer genomics analysis and data sharing. We're committed to furthering the collective understanding of cancer at the genomic level and to moving science forward. Stay tuned for more!