Inside DNAnexus

Product updates, industry insights, opinions and references. From the team powering the Genomics Revolution.

Cancer Genomes Dataset Now Hosted on Amazon Web Services

Today, Amazon Web Services (AWS) and the Ontario Institute for Cancer Research (OICR) made the International Cancer Genome Consortium (ICGC) Pan-Cancer dataset available on Amazon’s Simple Storage Service (S3). The dataset includes more than 2,400 consistently analyzed whole genomes from over 1,100 unique ICGC donors.

Hundreds of terabytes of genome sequence alignments and variant information are now hosted on AWS and can be used to explore the genomic basis of cancer, accelerate research, and develop more targeted therapies. In addition to the raw data, the dataset contains a total of nearly 4 million identified mutations and known differences between tumor and normal genes.

We are thrilled by this news, which lowers the technical barriers to accessing and working with ICGC data. DNAnexus users who are also authorized ICGC researchers will get convenient, fast access to the data without needing to transfer it from far-flung repositories or download it to their local infrastructure, a process that previously could take months and required substantial compute resources. The rich set of DNAnexus tools and pipelines, along with the DNAnexus tool development environment, can now be applied to this cancer data with security, compliance, and practically unlimited scalability.

“The DNAnexus Platform provides an environment where our own data will be able to live with ICGC data, performing sophisticated analysis to extract knowledge and the ability to share with other researchers from around the world in real time,” said Steve Rozen, PhD, Director of Duke-NUS Center for Computational Biology.

ICGC researchers can apply for cloud data access through the ICGC DACO and use their access token with a DNAnexus app wrapping the ICGC DCC storage client. The controlled-access data transferred from S3 can then be either processed ephemerally or stored in a secure DNAnexus project for later use. Learn more at our ICGC_fetcher GitHub repository and DNAnexus project, or schedule a scientific consultation with our team.


With ICGC data now hosted on AWS, researchers at institutions large and small will be able to access this large dataset easily on the DNAnexus Platform, which lowers the technical and computational barriers to cancer genomics analysis and data sharing. We’re committed to furthering the collective understanding of cancer at the genomic level and moving science forward. Stay tuned for more!

About DNAnexus

DNAnexus, the leader in biomedical informatics and data management, has created the global network for genomics and other biomedical data, operating in 33 countries across North America, Europe, China, Australia, South America, and Africa. The secure, scalable, and collaborative DNAnexus Platform helps thousands of researchers across a spectrum of industries (biopharmaceutical, bioagricultural, sequencing services, clinical diagnostics, government, and research consortia) accelerate their genomics programs.

The DNAnexus team is made up of experts in computational biology and cloud computing who work with organizations to tackle some of the most exciting opportunities in human health, making it easier—and in many cases feasible—to work with genomic data. With DNAnexus, organizations can stay a step ahead in leveraging genomics to achieve their goals. The future of human health is in genomics. DNAnexus brings it all together.