Introducing htsget, a new GA4GH protocol for genomic data delivery

DNAnexus is here in Orlando for the fifth plenary meeting of the Global Alliance for Genomics and Health (GA4GH), the standards-making body advancing interoperability and data sharing for genomic medicine. We’re especially pleased this year to join in launching version 1.0 of htsget, a new protocol for the secure web delivery of large genomic datasets, especially whole-genome sequencing reads, which can exceed 100 gigabytes per person.

Htsget complements the incumbent BAM and CRAM file formats for reads, which GA4GH also stewards, and their ecosystem of tools. It adds a standardized protocol for accessing such data over the web securely, reliably, efficiently, and even in a federated fashion when needed. Retrieval with htsget is now built into the ubiquitous samtools via its underlying htslib library, allowing bioinformaticians to use htsget with most existing tools via a familiar Unix pipe. At the same time, htsget’s streaming parallelism enables scalable ETL into cluster environments like Apache Spark, providing a gradual transition path from incumbent file-based toolchains toward modern “big data” platforms. Lastly, htsget simplifies data access for interactive genome browsers by unifying authentication and removing the need for index files.

On the server side, htsget has been deployed at the Sanger Institute and the European Genome-phenome Archive; DNAnexus operates a multi-cloud htsget server, which we call htsnexus, indexing data within Amazon S3 and Azure Blob storage; and Google Cloud Platform has open-sourced its own implementation. Clients can speak a uniform protocol that abstracts the diverse authentication and storage schemes of these service providers.
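To make the client side of this concrete, here is a minimal Python sketch of the request-and-assemble pattern as we understand it from the htsget specification: the client asks a reads endpoint for a genomic range, receives a JSON "ticket" listing data blocks, and concatenates those blocks in order. The base URL, the read ID, and the helper names (build_reads_url, decode_data_uri, assemble, fetch) are illustrative assumptions, not part of any DNAnexus or htslib API, and the transport is left to a caller-supplied function.

```python
import base64
from urllib.parse import urlencode, unquote_to_bytes


def build_reads_url(base_url, read_id, reference_name=None,
                    start=None, end=None, fmt="BAM"):
    """Build a reads request URL (base_url and read_id are hypothetical)."""
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return "{}/reads/{}?{}".format(base_url.rstrip("/"), read_id,
                                   urlencode(params))


def decode_data_uri(uri):
    """Decode an inline data: URI block into raw bytes."""
    header, _, payload = uri.partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def assemble(ticket, fetch):
    """Concatenate the ticket's blocks in the order the server listed them.

    `fetch(url, headers)` is a caller-supplied function (an assumption of
    this sketch) that returns the bytes at `url`, sending any per-block
    headers such as authorization tokens.
    """
    chunks = []
    for block in ticket["htsget"]["urls"]:
        url = block["url"]
        if url.startswith("data:"):
            chunks.append(decode_data_uri(url))
        else:
            chunks.append(fetch(url, block.get("headers", {})))
    return b"".join(chunks)
```

Because each block carries its own URL and headers, the server can point the client at whichever storage backend holds the data, and the client needs no knowledge of that backend's authentication scheme.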

These groups, and others, have all shaped the htsget specification through the GA4GH’s highly collaborative process. But it started in large part with a contribution from DNAnexus, drawing on our experience optimizing how our systems utilize cloud object stores in the huge genome projects we’ve served, such as CHARGE, 1000 Genomes Project, TCGA, and HiSeq X Series data production. Through htsget and other work streams under the new GA4GH Connect framework announced today, DNAnexus looks forward to further contributing from our experience and network to advance the GA4GH’s essential mission.

For more information about how DNAnexus is working with htsget, please contact us at info@dnanexus.com.

Upgrading to TLS 1.2

As part of our continued efforts to maintain the highest security standards on the DNAnexus Platform, we will end support for TLS 1.0 and TLS 1.1 on October 15th, 2017. We have been communicating this planned change, and most customers and users have already upgraded.

If you still have programs or processes that use TLS 1.0 or 1.1 when interfacing with the DNAnexus Platform, you will need to take a few simple steps to upgrade. Please follow the instructions below to upgrade to TLS 1.2.

What is TLS?
TLS stands for “Transport Layer Security.” It is a protocol that encrypts connections to a remote endpoint and verifies the endpoint's identity, so that you can be confident you are communicating privately with the intended destination. DNAnexus web and API connections use TLS as a key security component, so it is important that your software supports the latest version of TLS.

What do I need to do?
For macOS/OS X Users:
If you are using the DNAnexus Platform SDK (a.k.a. dx-toolkit) with the Python version provided with your operating system, you will need to install an alternative Python version with TLS 1.2 support. We suggest using the Homebrew package manager to install Python version 2.7.13:
Install Homebrew using the instructions at https://brew.sh/.

Once Homebrew is installed, run the following command in your terminal prompt:
brew install python

Once the new Python version is installed, follow the instructions at https://wiki.dnanexus.com/Downloads to download, unpack, and activate the latest dx-toolkit release.
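If you are unsure whether the Python you are running supports TLS 1.2, a quick check along the following lines may help. This is a minimal sketch that relies only on Python's standard ssl module; the helper name tls12_supported is our own, and the check reports what the interpreter was built with rather than testing an actual connection to DNAnexus.

```python
import ssl


def tls12_supported():
    """Report whether this Python's ssl module was built with TLS 1.2."""
    # Newer Pythons expose ssl.HAS_TLSv1_2; older ones (including 2.7)
    # only define the PROTOCOL_TLSv1_2 constant when TLS 1.2 is available.
    fallback = hasattr(ssl, "PROTOCOL_TLSv1_2")
    return bool(getattr(ssl, "HAS_TLSv1_2", fallback))


if __name__ == "__main__":
    # The OpenSSL version string shows which library Python is linked to.
    print(ssl.OPENSSL_VERSION)
    print("TLS 1.2 supported: {}".format(tls12_supported()))
```

Run it with the same Python interpreter that you use for dx-toolkit, since a system Python and a Homebrew Python can be linked against different OpenSSL versions.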

For Internet Explorer Web Browser Users:
If you use Internet Explorer version 10 or earlier, you will need to upgrade your web browser to Internet Explorer 11 or later.

For All Other Users:
If you use a PC, or if your web browser is Internet Explorer 11 or later, Chrome, Firefox, or Safari, we do not expect that your access to DNAnexus will be impacted by this upgrade.

If you are impacted, please make the necessary modifications by October 15th, 2017 in order to maintain continued access to DNAnexus. Please do not hesitate to contact support@dnanexus.com with any questions or concerns.

DNAnexus and Saphetor Collaborate for Seamless Integration of Tertiary Analysis Solution

DNAnexus has teamed up with Saphetor, a leading variant analysis company, to build a sample-in, report-out genomic analysis solution. Saphetor annotates and classifies genetic variants from next-generation sequencing (NGS) data to help clinicians diagnose disease quickly and accurately and make faster, more precise treatment decisions. Saphetor’s technology is now available on the DNAnexus cloud-based platform.

Saphetor has built a powerful genome interpretation engine by integrating more than 30 public and licensed databases containing genotype, phenotype, variant, drug, and clinical trial information. Automated annotation ensures a comprehensive understanding of variant significance and implication for disease. Each variant is annotated with gene and functional position, protein functional impact, population allele frequencies, and pathogenicity prediction scores. Using the DNAnexus-Saphetor integration, researchers globally can conduct secure, whole-genome analysis leveraging Saphetor’s databases containing 33 billion variant annotation points.

Data from the DNAnexus Platform is exported to Saphetor via a secure API, enabling customers to take advantage of this comprehensive analysis solution. Saphetor’s user interface allows customers to interactively browse and filter the annotated and classified variant list in an intuitive fashion. Click on the image below to see a sample of the variant analysis interface.

By combining the scalability and high-performance computing power of the DNAnexus Platform with Saphetor’s variant browsing tools, customers can quickly move from NGS data analysis to interpretation. Researchers can discover which variants have a functional impact on disease, with the aim of accelerating the implementation of precision medicine. We are excited about our collaboration with Saphetor to offer a secure and scalable environment to power an end-to-end analytical solution for genomic biomarker discovery and interpretation.

Interested in trying out Saphetor’s technology on DNAnexus? Get in touch with a member of our Science Team.