Inside DNAnexus

Product updates, industry insights, opinions and references. From the team powering the Genomics Revolution.

The First Publicly Available “$1000 Genome” Test Dataset!

At DNAnexus we’re always looking for ways to collaborate on projects that are outside the norm, and this latest collaboration is no exception. We’ve teamed up with the Garvan Institute and AllSeq to offer the genomics community open access to the first publicly available datasets generated using Illumina’s HiSeq X Ten sequencing system.


Why are we doing this?

Our goal is to provide sample data that will give scientists a glimpse into what to expect from the technological advances of the HiSeq X Ten. Has the new sequencing technology lived up to Illumina’s promise?


Here’s what went down

AllSeq arranged this data-sharing endeavor as a part of its Sequencing Marketplace effort, which aims to educate scientists about different sequencing technologies and match them with providers that offer these technologies.

The Garvan Institute, located in Sydney, Australia, was one of the first three organizations in the world to acquire the Illumina HiSeq X Ten sequencer. In an effort to educate the genomics community about the potential of this exciting new technology, they made available two whole-genome sequencing datasets generated from the popular Coriell Cell Repository NA12878 reference sample, which has been extensively analyzed by the Genome in a Bottle Consortium.

Thanks to the Garvan, visitors have access to two different, high-quality datasets (NA12878D and NA12878J). Each was sequenced on a single lane of an Illumina HiSeq X patterned flow cell, achieving over 120 Gb of yield, with more than 87% of bases at quality above Q30, in just 2.8 days. Each dataset meets the minimum coverage and quality guaranteed by Illumina and is indicative of the potential of the Illumina HiSeq X Ten sequencing system.
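To put those numbers in context, here is a rough back-of-the-envelope sketch in Python. It converts the quoted yield into an approximate raw coverage figure and shows what a Q30 score means as an error probability. The genome size of 3.1 Gb is our assumption for illustration, not a figure from the datasets, and the function names are hypothetical. Raw coverage also overstates usable coverage, since it ignores duplicates and unmapped reads.

```python
def raw_coverage(yield_bases: float, genome_size_bases: float) -> float:
    """Approximate raw fold-coverage: total sequenced bases / genome size."""
    return yield_bases / genome_size_bases


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


YIELD_BASES = 120e9   # "over 120 Gb" per lane, so this is a lower bound
GENOME_SIZE = 3.1e9   # ASSUMPTION: approximate human genome size in bases

coverage = raw_coverage(YIELD_BASES, GENOME_SIZE)
q30_error = phred_to_error(30)

print(f"Raw coverage: about {coverage:.1f}x")
print(f"Q30 error probability: {q30_error:.3f} (1 error in 1,000 bases)")
```

Under these assumptions a single lane works out to roughly 38.7x raw coverage. The Picard metrics provided with the data report the coverage actually achieved after alignment.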

DNAnexus stepped in to sponsor the data storage and the bandwidth for downloading the data. In addition, DNAnexus ran analyses on the two genomes to produce metrics that give the scientific community a benchmark for gauging results from the “$1000 genome”.

Visitors can view and download the data without a DNAnexus account via the AllSeq webpage, which links to the original FASTQ files, the analysis results (BAM and VCF files), and quality metrics calculated using off-the-shelf tools like FastQC and Picard (MarkDuplicates, CollectInsertSizeMetrics, and CollectWgsMetrics). We’ve also provided a web-based genome browser to visualize one dataset (NA12878D). You can access and download all of this data until September 30, 2014.
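If you download the FASTQ files and want a quick sanity check of the Q30 figure yourself, a minimal sketch like the one below may help. It is not how FastQC or Picard compute their metrics. It assumes uncompressed four-line FASTQ records with Phred+33 quality encoding, and it counts bases at Q30 or higher, which is the usual convention. The function name `fraction_q30` and the tiny in-memory example are hypothetical.

```python
import io
from typing import TextIO


def fraction_q30(handle: TextIO, threshold: int = 30, offset: int = 33) -> float:
    """Return the fraction of bases with Phred quality >= threshold.

    ASSUMPTIONS: four-line FASTQ records and Phred+33 quality encoding.
    """
    total = 0
    passing = 0
    for i, line in enumerate(handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for ch in line.rstrip("\r\n"):
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Tiny made-up example: 'I' is Q40, '?' is Q30, '#' is Q2.
example = io.StringIO(
    "@read1\nACGTA\n+\nIIII#\n"
    "@read2\nTTGCA\n+\n?????\n"
)
result = fraction_q30(example)
print(f"Fraction of bases at Q30 or above: {result:.2f}")
```

For a real file you would pass an open file handle instead of the `io.StringIO` object, for example one returned by `gzip.open(path, "rt")` if the FASTQ is gzip-compressed.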

Those with DNAnexus accounts can also access the data via the “HiSeq X Ten Data” featured project, located on the left-hand side of the dashboard. Users can copy any of the files to their own DNAnexus projects for further downstream analysis.

We’d love to hear from you! Tell us what you think about the HiSeq X Ten data.


About DNAnexus

DNAnexus, the leader in biomedical informatics and data management, has created the global network for genomics and other biomedical data, operating in 33 countries across North America, Europe, China, Australia, South America, and Africa. The secure, scalable, and collaborative DNAnexus Platform helps thousands of researchers across a spectrum of industries (biopharmaceutical, bioagricultural, sequencing services, clinical diagnostics, government, and research consortia) accelerate their genomics programs.

The DNAnexus team is made up of experts in computational biology and cloud computing who work with organizations to tackle some of the most exciting opportunities in human health, making it easier—and in many cases feasible—to work with genomic data. With DNAnexus, organizations can stay a step ahead in leveraging genomics to achieve their goals. The future of human health is in genomics. DNAnexus brings it all together.