NIH Security Best Practices Update

DNAnexus has always taken a proactive approach to security and compliance. We’ve worked closely in partnership with AWS (Amazon Web Services) to provide our mutual customers best-in-class security for genome informatics and data management in the cloud. These efforts have been acknowledged in the updated publication of the National Institutes of Health Genomic Data Sharing Policies. The updated guidelines make it clear that researchers may use AWS and DNAnexus to store and analyze controlled-access genomic data, including dbGaP and TCGA.

Prior to this policy change, NIH guidance would not allow use of commercial cloud computing services for work involving controlled-access genomic data, which, though it has been stripped of identifying information, may be unique to individuals. With the new NIH policy, such data can be used in the cloud after investigators obtain project-specific approval for its use.

Applications for such approval must include a data security plan. At DNAnexus, we were pleased to discover that a publication from our white paper library detailing our own security compliance practices was listed as an information resource for individuals seeking a working understanding of the requirements.

DNAnexus platform features such as two-factor authentication, end-to-end encryption, need-based network access control, 24/7 security monitoring and updates, and audit and access logging allow us to satisfy the new requirements and to exceed the security of many local datacenter installations.

DNAnexus is working with AWS and the NIH to establish mechanisms for processing data access requests and vending the access-controlled data to approved requestors, and we expect to be providing seamless access to these important datasets in DNAnexus.

The combination of data security and powerful computing made possible by the partnership between DNAnexus and AWS has created the ideal platform for global scientific collaboration in genomics. We are thrilled now to be witnessing the beginnings of a genomic discovery “Commons,” where data are brought together and analyzed by researchers around the world.  We will continue to meet and exceed existing security standards, working with our partners to enable new kinds of innovation driven by genomic big data.


100% Cloud-based Genome Center Integrating Large Healthcare Data Flows

photo: The Cancer Genome Atlas

In a previous post, our new CMO, David Shaywitz, talked about his vision for DNAnexus and its role in helping fulfill the promise of genomic medicine:

“DNAnexus represents a natural home for these aspirations, offering a compelling, secure, cloud-based data management platform, an enabling tool for any healthcare organization – academic medical center, healthcare system, biopharma company, payor – who recognizes that getting a handle on large healthcare data flows is rapidly becoming table stakes, and that figuring out how to manage and leverage genomic data is a wise place to start.”

Fast-forward two months…  This week, we announced exciting progress in our efforts to accelerate genomic medicine.  The DNAnexus cloud-based genome informatics and data management platform is powering a number of collaborations between Regeneron Genetics Center (RGC) and its leading healthcare provider partners.

An RGC press release named these new collaborators, which include the Geisinger Health System, Columbia University Medical Center, Clinic for Special Children, and Baylor College of Medicine. The RGC will be using the DNAnexus platform to integrate sequencing data with de-identified clinical records from patient volunteers. To date, the RGC has sequenced samples from more than 10,000 individuals and is currently sequencing more than 50,000 samples per year.

The Geisinger collaboration, which has been described as the largest clinical sequencing project in the U.S., is on track to sequence more than 100,000 patient volunteer samples. This DNAnexus-powered initiative has resulted in the first 100% cloud-based biopharma genome center, and is now operating at scale.

Next-generation sequencing technologies, like Illumina’s HiSeq 2500 or X Ten platform, have reduced the cost and increased the speed of DNA sequencing, outpacing Moore’s Law, to the point where the new bottleneck is genome informatics. To address this issue, companies like Regeneron are adopting cloud-based solutions to handle the massive volume of sequencing data.

DNAnexus provides the technology backbone that enables the sharing and management of data and tools around large volumes of sequencing data between the RGC and its healthcare collaborators. Currently the RGC is processing more than 1,000 exomes per week and sharing the data easily and safely with their collaborators.

In order to improve patient care and ultimately human health, the integration of genomic and phenotypic data needs to happen on a massive scale (something David has recently discussed from the perspective of phenotype). Combining large cohorts of deeply-phenotyped individuals with their genomic data offers a wide range of medical applications, the most obvious being a more personalized approach to medical interventions, such as determining which therapy might work best for a given individual. These data can also be used to aid in the development of new companion diagnostics and in clinical trial participant selection. As an article in GigaOM put it this week: Cloud Computing is Coming for Your DNA, and it Will Lead to Better Drugs and Health Care.

These collaborations are powerful examples of how the DNAnexus platform is enabling an integrated approach between biopharmaceutical companies and their partners to accelerate the research and discovery process. As David said, healthcare industry leaders who prioritize the management of large healthcare data flows will emerge as the pioneers who help us realize the full vision of precision medicine – delivery of the optimal therapy to the right patients at the right time – ideally before they are sick.

DNAnexus Introduces Faster Cloud Options

Spring has arrived at DNAnexus, ushering in important updates! We are excited to announce that, starting May 1, 2014, your analyses on DNAnexus will be faster, thanks to new instance types.

What does that really mean? Here’s an example before we dive into all the details…  A specific exome pipeline (e.g., BWA-MEM, GATK-Lite) now runs in less than 4 hours! Previously, the run would have taken nearly 6 hours.
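To put rough numbers on that example, here is a small Python sketch of the arithmetic. It assumes round figures of 6 hours before and 4 hours after (the post only says "nearly 6" and "less than 4", so the real gain may differ somewhat), and the function names are ours, purely for illustration.

```python
def speedup(old_hours: float, new_hours: float) -> float:
    """How many times faster the new run is compared to the old one."""
    return old_hours / new_hours


def time_saved_fraction(old_hours: float, new_hours: float) -> float:
    """Fraction of the old wall-clock time that is no longer needed."""
    return (old_hours - new_hours) / old_hours


# Assumed round figures taken from the exome pipeline example above.
old, new = 6.0, 4.0
print(f"{speedup(old, new):.2f}x faster, "
      f"{time_saved_fraction(old, new):.0%} less wall-clock time")
```

With these assumed inputs, the run is about 1.5 times faster, which means roughly a third of the wall-clock time is saved.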

New instance types

We believe, and hope you do too, that DNAnexus is the best choice for expanding your genomic analysis infrastructure. Unlike local equipment, which starts collecting dust in your server room from day one while technological advances pile up, the cloud stays at the forefront of computing technology as newer, faster hardware becomes available.

These new hardware options take the form of new instance types (virtual computer configurations) on which your cloud analyses can run. Thanks to the flexibility and reproducibility of the DNAnexus platform, you can start using them right away: simply launch your existing analyses on one of the new instance types (e.g., using the “--instance-type <…>” option of our “dx run” command-line tool) and enjoy a completely effortless hardware upgrade!
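As a rough illustration of what such a launch could look like, the Python sketch below only assembles the command line and prints it; it does not call the platform. The executable name "my_exome_pipeline" and the instance type "mem1_ssd1_x4" are placeholders we made up for this example, and the helper function is hypothetical, so substitute your own analysis and an instance type from the table further down.

```python
import shlex
from typing import List


def build_dx_run_command(executable: str, instance_type: str) -> List[str]:
    """Assemble the argument list for a `dx run` launch on a given instance type.

    This helper is illustrative only: it builds the command, it does not run it.
    """
    return ["dx", "run", executable, "--instance-type", instance_type]


# Both values below are placeholders chosen for this example.
command = build_dx_run_command("my_exome_pipeline", "mem1_ssd1_x4")
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to try anywhere; to actually launch, you would run the printed line in a shell where the dx tool is installed and you are logged in.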

The new instance types are built on high-frequency Intel® processors of the Sandy Bridge and Ivy Bridge microarchitectures, support the Intel® Advanced Vector Extensions (Intel® AVX), and have solid-state drive (SSD) local storage technology for fast I/O performance.

The following table summarizes these new instance types. For a given column (which represents a certain number of cores and local storage capacity), there are up to three different instance types to choose from (with different amounts of memory). Overall these new instance types span a large spectrum, starting at 2 cores, 32 GB SSD, and 3.8 GB RAM, all the way to 32 cores, 640 GB SSD, and 244 GB RAM:

table: summary of new instance types

In an effort to be more informative and transparent, we have also come up with a new, easy-to-remember, and consistent naming scheme:

  • The prefix (mem1, mem2, or mem3) denotes the memory capacity per core;
  • the infix (ssd1) denotes that these instances have solid-state drive technology;
  • the suffix (x2 through x32) denotes the number of cores.
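To make the scheme concrete, here is a minimal Python sketch that splits a name into its three parts. It assumes the parts are joined by underscores, as in "mem1_ssd1_x2"; this post does not spell out a full name, so treat that format, along with the parser and its field names, as illustrative assumptions rather than an official tool.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    memory_class: int  # digit after "mem": the memory-per-core class
    storage: str       # storage infix, e.g. "ssd1"
    cores: int         # number after "x": the number of cores


# Assumed layout: mem<digit(s)>_<storage infix>_x<cores>
_NAME_PATTERN = re.compile(r"^mem(\d+)_([a-z]+\d+)_x(\d+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name into prefix, infix, and suffix values."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized instance type name: {name!r}")
    memory_class, storage, cores = match.groups()
    return InstanceType(int(memory_class), storage, int(cores))


parsed = parse_instance_type("mem3_ssd1_x32")
print(parsed.memory_class, parsed.storage, parsed.cores)
```

Because the three parts are positional, a name tells you at a glance how an instance compares with its neighbors: same suffix means same core count, same prefix means the same memory class per core.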

New names for existing instance types

We liked the convenient new naming scheme so much that we have applied it to existing instance types as well, as shown in the following table.

Compared to the new instance types mentioned earlier, the existing instance types are distinguished by a different storage infix (hdd2), given their regular hard disk drive technology. More information is available on our wiki page, which explains the new naming conventions and includes a detailed list of all instance types.

table: new names for existing instance types

To ease the transition, existing instances can currently be called by either their original name or the new name; the DNAnexus system understands both. However, we encourage you to adopt the new names in a timely manner to avoid any future interruption.

We are very excited to announce these important updates, and we cannot wait to hear your success stories. Drop us a note if you’d like to get in touch with us.