Supported by grants from the Bill and Melinda Gates Foundation and the Chinese Ministry of Science and Technology, the 3000 Rice Genomes Project (3K RGP) is a multi-agency project directed at acquiring the genomic information needed to accelerate rice breeding programs. Led by the Chinese Academy of Agricultural Sciences (CAAS), the International Rice Research Institute (IRRI) and BGI, the project aims to enable the rapid development of higher-yielding, drought-tolerant, pest- or disease-resistant strains of rice with high nutritional value. The urgency and importance of this project would be difficult to overstate, considering that the production of rice must increase by at least 25% by 2030 in order to keep up with global population growth and demand. Without these new strains, food shortages could affect the half of the world’s population that depends on rice as a dietary staple.
A Fast Start And A Temporary Delay
Last year, the 3K RGP completed the sequencing of 3,000 rice genomes and released the project’s initial 13 TB dataset. The team then encountered a bottleneck consisting of two related issues. First, the project’s local computing infrastructure was not capable of managing the massive analysis load required to process the dataset within an acceptable timeframe. Second, the 3K RGP calls for widespread distribution of rice genome sequencing data and analysis results in order to stimulate collaboration among the global rice genome research community. Conventional methods of distribution, such as shipping hard drives between collaborators, are not feasible at this scale.
The Solution That Saved More Than A Year
Working with DNAnexus, the 3K RGP was able to deploy a rapid solution to analyze the 3,000 rice genomes dataset and generate more than 100 TB of useful data, without any of the costs or delays typically involved in purchasing and bringing new infrastructure online. Drawing on 37,000 compute cores working together, the DNAnexus genome informatics platform completed sequence mapping and variant calling in just two days, more than 200 times faster than would have been possible on local computing infrastructure, where the same work would have taken over a year. This cloud-based solution also solved the issue of data distribution by providing immediate access to data and analysis results, and by enabling real-time collaboration among 3K RGP investigators worldwide, who have already discovered hundreds of new genetic markers.
Practical Solutions For The Whole World
Each of these new genetic markers has the potential to be linked to valuable traits that can improve the nutrition, climate tolerance, and disease resistance of new rice varieties. Significantly, the genetic information generated by the 3K RGP can be used to accelerate the development of new strains through highly efficient cross-breeding, a centuries-old technique now informed by “big data,” rather than through direct genetic modification. The result would be robust, high-yielding strains of rice, without the concerns that commercial, political or public opinion stakeholders raise about genetically modified organisms (GMOs).
We are thrilled to be powering the 3000 Rice Genomes Project, a collaboration that is tackling one of the most exciting opportunities to improve human wellbeing with big data and genomics. DNAnexus is proud to be bringing it all together.