The Rising Tide of Genomic Data Points to the Cloud

No market segment has felt the impact of the cloud more profoundly than the life sciences industry. In March, a major roadblock was eliminated when the National Institutes of Health lifted its ban on the use of government datasets (dbGaP, TCGA, etc.) in the cloud and updated its security best practices white paper. In the past, individual researchers would download data hosted in a variety of locations, attempt to integrate their own data, and run analyses on their local hardware, a time-consuming endeavor fraught with headaches. This approach has become unsustainable: data sizes have grown exponentially as the cost of genome sequencing has been driven down. There is now a collective push within the genomics research community to create a cloud commons, something in which DNAnexus wholeheartedly believes.

Just how much data are we talking about? According to recent research, Earth contains around 5.3 x 10^37 DNA base pairs. The researchers add: “By analogy, it would require 10^21 computers with the mean storage capacity of the world’s four most powerful supercomputers (Tianhe-2, Titan, Sequoia, and K computer) to store this information.”
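To get a feel for those numbers, here is a back-of-the-envelope sketch in Python. It takes the two figures quoted above (about 5.3 x 10^37 base pairs and 10^21 computers) and works out how much storage each computer would have to hold. The 2-bits-per-base-pair encoding is our own assumption for illustration; the researchers may well have used a different one, so treat the result as an order-of-magnitude estimate only.

```python
BASE_PAIRS_ON_EARTH = 5.3e37   # estimate quoted in the post
COMPUTERS_NEEDED = 1e21        # machine count quoted in the post
BITS_PER_BASE_PAIR = 2         # assumption: A, C, G, T packed into 2 bits


def implied_petabytes_per_computer(
    base_pairs: float = BASE_PAIRS_ON_EARTH,
    computers: float = COMPUTERS_NEEDED,
    bits_per_base_pair: int = BITS_PER_BASE_PAIR,
) -> float:
    """Storage each computer would need to hold, in decimal petabytes."""
    total_bytes = base_pairs * bits_per_base_pair / 8
    return total_bytes / computers / 1e15  # 1 PB = 10^15 bytes


print(f"{implied_petabytes_per_computer():.2f} PB per computer")
```

Under that assumption, each of the 10^21 machines would need to hold on the order of 13 petabytes.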

Fortunately, no one has been asked to manage the DNA for our entire planet’s biomass, but the research points to a very real challenge. With this rising tide of data comes the need for more computational resources, and the question of whether to build or buy infrastructure comes into play. A recent article in The Platform takes a fascinating dive into how the genomics community is utilizing life sciences clouds. In it, author Nicole Hemsoth (@NicoleHemsoth) observes that “what life sciences companies are missing is a management system for dealing with petabytes of data and billions of objects.”

She goes on to discuss how, as data management, storage, compute, security and compliance features become more sophisticated and hardened, bursting into the cloud is the most efficient way to use local and cloud resources together. While most large-scale genome centers have their own local infrastructure, their workloads tend to occur in spikes. To avoid overprovisioning, genome centers are finding their sweet spot by combining in-house infrastructure with bursting into the cloud. And with the advent of genomics cloud solutions such as DNAnexus, there are ways to seamlessly extend workloads to the cloud.
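The bursting idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular genome center or DNAnexus schedules work: the function name, the core-hour units and the weekly demand figures are all hypothetical. Local capacity is filled first, and only the overflow from a spike goes to the cloud.

```python
def split_workload(demand: int, local_capacity: int) -> tuple[int, int]:
    """Split one scheduling window's demand into (local, cloud) core-hours.

    Hypothetical policy: fill in-house capacity first, burst the rest.
    """
    if demand < 0 or local_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    local = min(demand, local_capacity)
    cloud = demand - local
    return local, cloud


# Made-up weekly demand with one spike, against 600 core-hours in-house.
weekly_demand = [400, 450, 1800, 500]
plan = [split_workload(d, local_capacity=600) for d in weekly_demand]
print(plan)  # [(400, 0), (450, 0), (600, 1200), (500, 0)]
```

In this toy example, sizing local hardware for the spike would mean buying three times the capacity needed in a typical week, which is the overprovisioning that bursting is meant to avoid.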

Another notable trend we’re seeing is life science companies like Regeneron Genetics Center moving to a 100% cloud-based solution. Instead of making the heavy investment that comes with managing and maintaining IT infrastructure, companies can invest in intellectual property, shifting their focus to R&D to accelerate medical discovery.

While freeing up the bandwidth otherwise spent building and maintaining local hardware is a big part of the cloud’s appeal, the real reason institutions choose DNAnexus is its genomics platform’s state-of-the-art compliance and security measures. It’s true that Amazon Web Services offers many built-in features to ensure security and privacy, and that potentially any skilled engineer can spin up Amazon EC2 instances themselves, but when handling personal health information it’s just not that simple. DNAnexus has invested a tremendous amount of resources in creating a genomics platform that complies with the ISO 27001 international security standard and provides data provenance, so that all operations can be tracked and reported for up to 6 years.

Just as there isn’t one way to genotype, there isn’t one way to take your data to the cloud. The field is constantly evolving, which means you’re constantly doing variant call bake-offs, working with many different tools to assess whether you are getting the correct variants of interest. The important question to ask is how will you manage all these diverse data and research requirements? Do you want to do it yourself or go with a proven genomics platform that offers a complete set of systems already in place to control and manage your data? DNAnexus can help. Drop us a line when you’re ready.

NIH Security Best Practices Update

DNAnexus has always taken a proactive approach to security and compliance. We’ve worked closely with AWS (Amazon Web Services) to provide our mutual customers best-in-class security for genome informatics and data management in the cloud. These efforts have been acknowledged in the updated publication of the National Institutes of Health Genomic Data Sharing Policies. The updated guidelines make it clear that researchers may use AWS and DNAnexus to store and analyze controlled-access genomic data, including dbGaP and TCGA.

Prior to this policy change, NIH guidance did not allow the use of commercial cloud computing services for work involving controlled-access genomic data, which, though stripped of identifying information, may be unique to individuals. With the new NIH policy, such data can be used in the cloud after investigators obtain project-specific approval for its use.

Applications for such approval must include a data security plan. At DNAnexus, we were pleased to discover that a publication from our white paper library detailing our own security compliance practices was listed as an information resource for individuals seeking a working understanding of the requirements.

DNAnexus platform features such as two-factor authentication, end-to-end encryption, need-based network access control, 24/7 security monitoring and updates, and audit and access logging allow us to satisfy the new requirements and to exceed the security of many local datacenter installations.

DNAnexus is working with AWS and the NIH to establish mechanisms for processing data access requests and vending the access-controlled data to approved requestors, and we expect to provide seamless access to these important datasets in DNAnexus.

The combination of data security and powerful computing made possible by the partnership between DNAnexus and AWS has created the ideal platform for global scientific collaboration in genomics. We are thrilled now to be witnessing the beginnings of a genomic discovery “Commons,” where data are brought together and analyzed by researchers around the world. We will continue to meet and exceed existing security standards, working with our partners to enable new kinds of innovation driven by genomic big data.


A Safe (and Compliant) Haven for Genomic Data in the Cloud

Despite a general comfort with putting personal information on Facebook or LinkedIn or plugging our credit card numbers into websites to book travel, buy birthday presents or rent movies, one of the earliest and most lasting concerns raised about storing genomic data in the cloud has been whether the data are secure.

And rightfully so. Data security isn’t a “nice-to-have” when it comes to personally identifiable DNA sequence data; it’s essential. With genomic sequencing emerging as essential to clinical development and the delivery of both diagnostics and therapies, compliance with regulations that apply to the handling of genetic data and its subsequent integration into other medical data systems is equally critical. As raw data are converted into more meaningful information, they become an asset as valuable and sensitive as any other personal information, currency, or intellectual property.

We’ve taken a very proactive approach to security and compliance at DNAnexus. Just as hospitals put the highest possible premium on security of their data, so too do cloud platform providers — because their entire business rides on utilizing best-in-class measures to assure the security, integrity and availability of their customers’ data. Our platform was developed from the ground up with this in mind and includes a number of features that allow each user to create a secure and compliant environment that will meet their unique needs today and in the future.

More specifically, the DNAnexus platform was developed with the internationally accepted ISO 27002 controls for best practices in information security and includes a number of features to ensure the highest level of data security for both research and clinical use, including:

  • Data integrity:
    • SAS-70 and PCI certified physical security of data centers
    • Data encryption (with full-disk AES-256 for data storage and SSL/TLS for data transport)
    • Third-party security audit
  • Access control:
    • Member administrators control access and retention policies
    • Passwords must be complex and periodically changed
    • Accounts time out when idle, and lock when unused or after too many incorrect login attempts
  • Administrator restrictions:
    • Two-factor authentication required
    • All administrative access is controlled and logged
  • API access restrictions:
    • API key required and limited to a validity period
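
Two of the controls listed above, API keys limited to a validity period and accounts that lock after too many incorrect login attempts, are easy to picture in code. What follows is a minimal Python sketch of those two ideas only. It is not the DNAnexus implementation: the class names, the five-attempt threshold and the 30-day validity period used in the usage note are all assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_FAILED_LOGINS = 5  # assumed threshold; the post does not state one


@dataclass
class ApiKey:
    """Hypothetical API key that is only honored during its validity period."""
    token: str
    expires_at: datetime

    @classmethod
    def issue(cls, now: datetime, valid_for: timedelta) -> "ApiKey":
        # Random, URL-safe token generated from the OS entropy source.
        return cls(token=secrets.token_urlsafe(32), expires_at=now + valid_for)

    def is_valid(self, presented: str, now: datetime) -> bool:
        # Constant-time comparison first, then the validity-period check.
        same = secrets.compare_digest(self.token.encode(), presented.encode())
        return same and now < self.expires_at


@dataclass
class Account:
    """Hypothetical account that locks after too many incorrect logins."""
    failed_logins: int = 0
    locked: bool = False

    def record_login(self, success: bool) -> None:
        if self.locked:
            return  # stays locked until an administrator intervenes
        if success:
            self.failed_logins = 0
        else:
            self.failed_logins += 1
            self.locked = self.failed_logins >= MAX_FAILED_LOGINS
```

As a usage example, a key issued with `ApiKey.issue(datetime.now(timezone.utc), timedelta(days=30))` would be accepted for 30 days and rejected afterwards, even if the token itself is presented correctly.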

To provide additional assurance to our users, we received an independent auditor’s certification of our compliance with ISO 27001 with respect to the management of our information systems.

To comply with clinical requirements relating to data integrity and reproducibility, DNAnexus supports data logging and auditability for 6 years, as well as versioned and reproducible analysis tools and results. Collectively, the security and compliance features implemented in our platform enable compliance with HIPAA, CLIA, Good Clinical Practice (GCP), 21 CFR Parts 11 and 58, 42 CFR Part 493, European data privacy laws and regulations (EU Directive 95/46/EC), and dbGaP Best Practices. For additional details, please review our white papers on our security and compliance practices.

We also work closely with our partners at Amazon Web Services to develop and deploy security strategies that are often far more sophisticated than those used in, or even available to, most premises-based data centers. Whether your data is at rest or in motion as you share it across your project group, you can be sure it’s protected within the DNAnexus platform.

If you are interested in learning more about our security and compliance measures, please visit