No other market segment has felt the profound impact of the cloud more than the life sciences industry. In March, a major roadblock was eliminated when the National Institutes of Health lifted its ban on the use of government datasets (dbGaP, TCGA, etc.) in the cloud and updated its security best practices white paper. In the past, individual researchers would download data hosted from a variety of locations, attempt to integrate their own data, and run analyses on their local hardware; a time-consuming endeavor wrought with headaches. This approach has become unsustainable given that data sizes have grown exponentially as the cost of genome sequencing has been driven down. There is now a collective push within the genomics research community to create a cloud commons, something in which DNAnexus wholeheartedly believes.
Just how much data are we talking about? According to recent research, Earth contains around 5.3 x 1037 DNA base pairs. They add: “By analogy, it would require 1021 computers with the mean storage capacity of the world’s four most powerful supercomputers (Tianhe-2, Titan, Sequoia, and K computer) to store this information.”
Fortunately, no one has been asked to manage the DNA for our entire planet’s biomass, but the research points to a very real challenge. With this rising tide of data comes the need for more computational resources, and the question of whether to build or buy infrastructure comes into play. A recent article in The Platform, takes a fascinating dive into how the genomics community is utilizing life sciences clouds. In her article, author Nicole Hemsoth (@NicoleHemsoth) raises the issue of “what life sciences companies are missing is a management system for dealing with petabytes of data and billions of objects.”
She continues to discuss how as the sophistication of data management, storage, compute, security and compliance features become hardened, bursting into the cloud is the most efficient way to utilize local and cloud resources. While most large-scale genome centers have their own local infrastructure, their workloads tend to occur in spikes. In order to mitigate overprovisioning, genome centers are finding their sweet spot by combining in-house infrastructure with bursting into the cloud. And with the advent of genomics cloud solutions such as DNAnexus, there are ways to seamlessly integrate workloads to the cloud.
Another notable trend we’re seeing is how life science companies like Regeneron Genetics Center are moving to a 100% cloud-based solution. Instead of the heavy investment that comes with managing and maintaining IT infrastructure, companies can invest in intellectual property; shifting their focus to R&D to accelerate medical discovery.
While freeing up bandwidth on building and maintaining local hardware is a big appeal for the cloud, the real reason institutions decide to go with DNAnexus is for its genomics platform’s state-of-the-art compliance and security measures. While it’s true Amazon Web Services offers a lot of built-in features to ensure security and privacy and potentially any skilled engineer can go out and spin up Amazon EC2 instances themselves, when handling personal health information it’s just not that simple. DNAnexus has invested a tremendous amount of resources in creating a genomics platform that complies with ISO 27001 international security standards and provides data provenance to certify all operations can be tracked and reported for up to 6 years.
Just as there isn’t one way to genotype, there isn’t one way to take your data to the cloud. The field is constantly evolving, which means you’re constantly doing variant call bake-offs, working with many different tools to assess whether you are getting the correct variants of interest. The important question to ask is how will you manage all these diverse data and research requirements? Do you want to do it yourself or go with a proven genomics platform that offers a complete set of systems already in place to control and manage your data? DNAnexus can help. Drop us a line when you’re ready.