New 3000 Rice Genomes AWS Public Dataset – Easy Access on DNAnexus Platform

In June we announced that DNAnexus was powering the 3000 Rice Genomes Project (3K RGP), as described in our press release and blog post. The project is a partnership between the Chinese Academy of Agricultural Sciences (CAAS), the International Rice Research Institute (IRRI), and BGI in China, along with numerous collaborators around the world, and it aims to help feed the world's growing population. Rice is a dietary staple for half of the world's population, and it is estimated that rice production must increase by at least 25% by 2030 to keep up with global population growth and demand.

3K RGP partnered with DNAnexus to develop the bioinformatics pipeline that analyzed the sequence data of 3,000 different rice varieties against five published draft genomes. Running the analysis on the DNAnexus Platform let the project tap the scalable computing capacity of AWS to process more than 100 TB of source genomic data on 37,000 concurrent compute cores in just two days, more than 200 times faster than would have been possible on local computing infrastructure. Although they were spread across 10 countries, the 3K RGP investigators were able to access results and collaborate in real time. The analysis has identified hundreds of new genetic markers, each a potential pathway to improving rice production.

Today we are happy to share that AWS has made the genomic analysis data of the 3,000 rice varieties available as an AWS Public Dataset. The data contains over 30 million genetic variants spanning all known and predicted rice genes. By making this dataset public, AWS hopes to accelerate research efforts and breeding programs. Knowing the genetic makeup of a rice variety allows researchers to identify critical genetic markers associated with specific phenotypic traits. With this information, breeders can make better-informed choices when selecting varieties for cross-breeding, speeding the development of rice varieties with higher nutritional content, improved tolerance to climate stress, or greater disease resistance.

DNAnexus has made it easy to access AWS public resources on the platform. Documentation for accessing the AWS 3K RGP public dataset can be found on the DNAnexus wiki. In addition, the 3K RGP analysis tools and pipelines used to produce the results are available on the DNAnexus Platform as a featured project, '3000 Rice Genomes'.