The First Publicly Available “$1000 Genome” Test Dataset! - Inside DNAnexus

Written by Angela Anderson | Jul 30, 2014 9:12:22 AM

At DNAnexus we’re always looking for ways to collaborate on projects that are outside the norm, and this latest collaboration is no exception. We’ve teamed up with the Garvan Institute and AllSeq to offer the genomics community open access to the first publicly available datasets generated using Illumina’s HiSeq X Ten sequencing system.

Why are we doing this?

Our goal is to provide sample data that shows scientists what to expect from the HiSeq X Ten. Has the new sequencing technology lived up to Illumina’s promise?

Here’s what went down

AllSeq arranged this data-sharing endeavor as part of its Sequencing Marketplace effort, which aims to educate scientists about different sequencing technologies and match them with providers that offer these technologies.

The Garvan Institute, located in Sydney, Australia, was one of the first three organizations in the world to acquire the Illumina HiSeq X Ten sequencer. To educate the genomics community about the potential of this exciting new technology, the institute made available two whole-genome sequencing datasets of the popular Coriell Cell Repository NA12878 reference sample, which has been extensively analyzed by the Genome in a Bottle Consortium.

Thanks to the Garvan Institute, visitors have access to two different high-quality datasets (NA12878D and NA12878J). Each was sequenced on a single lane of an Illumina HiSeq X patterned flow cell, achieving a yield of over 120 Gb, with more than 87% of bases above Q30, in just 2.8 days. Each dataset meets the minimum coverage and quality guaranteed by Illumina and is indicative of what the Illumina HiSeq X Ten sequencing system can deliver.
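The yield and quality figures above can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only and is not part of the DNAnexus analysis. It assumes Phred+33 quality encoding and an approximate human genome size of 3.1 Gb, and its helper names are hypothetical. It converts a Phred score to an error probability (Q30 corresponds to a 1-in-1,000 chance of a wrong base call), counts the fraction of bases at or above a quality threshold in FASTQ quality strings, and estimates raw mean coverage as yield divided by genome size.

```python
"""Back-of-the-envelope checks for the HiSeq X lane figures quoted above.

Assumptions (not from the original post): FASTQ quality strings use the
Phred+33 encoding, and the haploid human genome is taken as roughly 3.1 Gb.
All function names here are hypothetical illustrations.
"""

HUMAN_GENOME_GB = 3.1  # assumed approximate haploid genome size, in gigabases


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q into an error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_strings, threshold=30, offset=33):
    """Fraction of bases whose Phred score is >= threshold across FASTQ quality lines."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Decode the ASCII character back to a Phred score (Phred+33 assumed)
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


def mean_coverage(yield_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Raw expected mean depth: total sequenced bases divided by genome size."""
    return yield_gb / genome_gb


if __name__ == "__main__":
    print(f"Q30 error probability: {phred_to_error_prob(30):.4f}")
    print(f"Approximate raw coverage from 120 Gb: {mean_coverage(120):.1f}x")
    # Toy quality string: '?' encodes Q30 and '5' encodes Q20 in Phred+33
    print(f"Fraction of bases at or above Q30: {fraction_at_or_above(['????5']):.2f}")
```

Under these assumptions, 120 Gb works out to roughly 38.7x raw coverage per lane. Usable coverage after discarding duplicate and poorly mapped reads will be lower, which is the kind of figure Picard's CollectWgsMetrics reports.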

DNAnexus stepped in to sponsor data storage and download bandwidth. In addition, DNAnexus ran analyses on the two genomes to produce metrics that the scientific community can use as a benchmark for gauging results from the “$1000 genome”.

Visitors can view and download the data without a DNAnexus account via the AllSeq webpage, which links to the original FASTQ files, the analysis results (BAM and VCF files), and quality metrics calculated using off-the-shelf tools such as FastQC and Picard (MarkDuplicates, CollectInsertSizeMetrics, and CollectWgsMetrics). We’ve also provided a web-based genome browser to visualize one of the datasets (NA12878D). You can access and download all of this data until September 30, 2014.

Those with DNAnexus accounts can also access the data via the “HiSeq X Ten Data” featured project, located on the left-hand side of the dashboard. Users can copy any of the files to their own DNAnexus projects for further downstream analysis.

We’d love to hear from you! Tell us what you think about the HiSeq X Ten data: info@dnanexus.com.