Bioinformaticians for Good: DNAnexus Aids in Open Science Collaborations In Fight Against Coronavirus

Sequestered in bedrooms, kitchens, and makeshift office spaces around the world, a virtual army of bioinformatic enthusiasts has banded together to do their bit in the battle against coronavirus.

These data doyens are working behind the scenes to build the tools and infrastructure scientists need to better understand and respond to the virus that is currently ravaging the world. They are helping to overcome current obstacles to research, and they are coming up with brand new solutions to a number of challenges related to drug discovery, development, and delivery.

They are gathering online for “hackathons,” which are essentially grassroots, sprint-like design events in which groups of experts (usually in small teams) compete or collaborate intensively over a number of days to solve technical challenges.

Dozens of hackathons devoted to COVID-19 are being launched worldwide, involving thousands of scientists, including many from DNAnexus.

Principal scientist Ben Busby is helping to coordinate several of the events, a role he is familiar with after helping to organize more than 40 hackathons in his former job as genomics outreach coordinator at NCBI.

Hackathons are great ways to generate new ideas, foster intense learning and facilitate data sharing and collaboration among people who would never have met otherwise, he said.

“And the DNAnexus Platform is a great data integration and collaboration solution, especially since it already has security built in,” Busby said.

COVID-19 Biohackathon

April 5-11, 2020

Hundreds of scientists and bioinformaticians have already signed up for this week-long event, organized by an international team of established biohackers. 

Using video conferencing, e-mail, Slack messaging, wikis, and source code repositories, teams will work together to address more than 20 topics.

Among them is a project to create a coronavirus “pangenome.” This would involve collecting genomic data for SARS-CoV-2 and five other coronaviruses known to infect humans (SARS-CoV, HKU1, NL63, OC43, and 229E), and developing data comparison and visualization programs to help researchers answer questions like: What virulence factors do SARS and CoV have in common? What proteins influence virulence, and are we likely to see a new strain emerge that is even more virulent?

hackseqRNA: COVID-19 Ultra-hackathon

April 4, May 2, May 22-24

In this peer-led hackathon, participants are encouraged to propose projects and organizers will help recruit a team of interdisciplinary RNA biologists, bioinformaticians, statisticians, computer scientists, and developers to turn the idea into a reality.

The hackathon entails one-day coding ‘sprints’ to develop hackseqRNA projects and lay the groundwork for research efforts, followed by a final three-day push to bring each project to completion.

Among the projects to be tackled: using RNA sequencing data to better understand how coronaviruses interact with the immune system, how different drugs work within the system, and whether personalized treatments might be developed based on individual patients’ responses.

Amplifying High Impact Potential Research for Novel Coronavirus

April 4-5

Organized by the non-profit group Research to the People and Stanford University, this hackathon involves several initiatives, from tapping into wearable devices like Fitbit and Apple Watch to track infectious diseases and predict the onset of symptoms, to using computational models to simulate the effectiveness of potential coronavirus therapies.

Busby will be leading a team exploring novel genomics research directions for COVID-19, including the potential for using HLA typing to determine a person’s (or population’s) predisposition to COVID-19 severity. A kind of genetic test used to identify certain variations in a person’s immune system, HLA (short for human leukocyte antigen) typing is currently used to identify which people can safely donate bone marrow, cord blood, or an organ to a person who needs a transplant.

Busby’s team will investigate any correlations between HLA types and an individual’s susceptibility to coronavirus, as well as the severity of their reactions when infected. They may also look at potential epidemiological ramifications based on geographic distributions of HLA types. This research will help to lay the groundwork for the HLA COVID-19 Project.

Seeding New Science

Although the projects may not result in complete, polished solutions, they will help seed new science, which can be just as valuable, Busby said.

“This could be a really good opportunity to think about precision medicine in infectious diseases, something that’s been largely ignored before now,” he added. “Treatments would be way more effective if you knew your immune type. It could also help in the assessment of personal susceptibility and risk.”

Busby also notes that even if the current COVID-19 pandemic subsides, the virus, or others like it, will likely re-emerge.

“Having responses ready when it does re-emerge would be very helpful, but we have to start now,” he said. “Our contribution to the ‘all hands on deck’ coronavirus call is to build an infrastructure so that we will be able to respond in a much better, far more constructive, way.”

Doubling Down on Next Generation Sequencing: How TwinStrand Biosciences and DNAnexus Work Together

In the majority of cancer patients, the first sign that something is wrong is finding an already well-established tumor. But what if we could detect cancer at the first sign that the immune system is unable to correct it, when there are only a few cancer cells present? This is one of the challenging problems that TwinStrand Biosciences hopes to solve with their error correction technology, Duplex Sequencing™.

While current next generation sequencing (NGS) methods positively impact how we conduct research and discovery and stratify patients into disease categories, they aren’t yet sensitive enough to detect a small number of cancer cells among millions of healthy cells in the blood. Nor are they sensitive enough to detect the rare mutations induced by carcinogen exposure. This is where TwinStrand Duplex Sequencing Technology™ comes in.

The many steps involved in a typical NGS workflow (DNA isolation, fragmentation, amplification, etc.) can all introduce technical errors. Likewise, sequencing platforms have idiosyncrasies that can introduce errors of their own. The goal of any NGS run and its associated bioinformatic pipeline is to correct these errors and separate the “noise” from the true signal. TwinStrand takes error detection further with a proprietary error correction technique that examines and compares both strands of a DNA molecule, detecting and masking technical errors with much greater sensitivity.

Typical sources of error in next generation sequencing (NGS)

Duplex Sequencing™ is an ultra-high accuracy sequencing method that overcomes the limitations of standard NGS by independently tracking both strands of individual DNA molecules. The paired sequences are compared to eliminate technical errors affecting only one strand, revealing the ultra-low-frequency biological mutation signal. Using a combination of proprietary biochemistry and cloud-based informatics, Duplex Sequencing greatly increases the resolution of NGS by reducing error rates from about 1-in-100 with standard NGS to below 1-in-10 million.
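
The strand-comparison idea can be illustrated with a toy sketch. This is not TwinStrand's proprietary pipeline: the function name `duplex_consensus` and the choice to mask disagreements with `N` are assumptions made for illustration. The sketch also assumes the two strand-level sequences are already aligned, of equal length, and expressed in the same orientation.

```python
def duplex_consensus(top_strand: str, bottom_strand: str, mask: str = "N") -> str:
    """Keep a base only where both strand sequences agree.

    Hypothetical helper: both inputs are assumed to be aligned, equal-length,
    and already expressed in the same orientation.
    """
    if len(top_strand) != len(bottom_strand):
        raise ValueError("strand sequences must have the same length")
    return "".join(
        a if a == b else mask
        for a, b in zip(top_strand.upper(), bottom_strand.upper())
    )


# The fourth base differs between the strands, so it is treated as a likely
# technical error and masked; a change present on both strands would be kept
# as a candidate true mutation.
top = "ACGTTACG"
bottom = "ACGATACG"
print(duplex_consensus(top, bottom))
```

Masking rather than guessing the base is a conservative choice: a position supported by only one strand contributes no evidence either way.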

Duplex Sequencing can be used on any Illumina® sequencing platform. While early cancer detection and testing for residual disease make for natural applications of the technology, TwinStrand anticipates that the technology will be used to detect the emergence of antibiotic or antiviral resistance, occupational or environmental exposure to carcinogens, fetal abnormalities in non-invasive prenatal testing (NIPT), and in crime scene forensics. 

Indeed, with the wide array of suitable applications, TwinStrand hopes to make their technology as accessible as possible — even to those with little NGS experience or bioinformatics expertise. One of the ways they’ve done this is by implementing their analytic pipeline on the DNAnexus Platform and making the platform available to their customers. When a sequencing run is complete, users can easily upload it to a DNAnexus TwinStrand Portal where the TwinStrand Biosciences analytical pipeline can be run. Then, users can access the results as well as share the data to collaborate with other users. Offering their pipeline on DNAnexus as part of their Duplex Sequencing assays made it easy for TwinStrand to deploy a complete kit-plus-informatics solution from the initial R&D on the pipeline through the final commercialization of the technology.

TwinStrand launched their technology with four new assays at the Advances in Genome Biology and Technology (AGBT) conference on Marco Island, FL, February 23-26. Specifically, they presented the use of their technology to measure residual disease in acute myeloid leukemia (AML MRD) and mutagenesis in human, mouse, and rat models. For more information, contact info@twinstrandbio.com.

DNAnexus Platform Updates: A Year in Review

Now that we’re well into the new year, is everybody remembering to sign important documents with the four-digit year instead of the two-digit year to prevent fraud and accounting disasters?

Now that we have that settled, it’s time to do a little accounting of our own. Were you aware that we made over 20 enhancements and updates to the DNAnexus Platform last year? In case you missed them, take a look at the summary of updates below.

New Documentation Site

We updated our documentation with new content, advanced search capabilities, and upgraded design to improve your experience! View the new documentation center at https://documentation.dnanexus.com/

Controlled Tool Access

We made modifications so that project admins who have clinical GxP requirements can specify a controlled list of tools at the project level that limits which tools can be run by users in that project.

Tools Library Refresh

We redesigned the Tools Library page to easily filter through Apps and Global Workflows for a seamless user experience. You can now launch Global Workflows directly from the library user interface.

New Sentieon Applications

We added 7 new Sentieon applications to the DNAnexus Platform! You can access the full list of tools in your DNAnexus account.

Audit Trail Enhancements

We added a human- and machine-readable daily log of all activities related to the users and projects of your organization. Learn more about how this enhancement fulfills a 21 CFR Part 11 requirement.

Overall New Look to the UI

We updated the overall look of the user interface to be cleaner, flatter, and more modern. Primary actions now appear on the right side of the screen. As a result, you will find it easier and faster to find what you’re looking for. 

Updates to Projects 

We made multiple changes to Projects: 

  • We redesigned the Project List page with an easily filterable list of all your projects. A new “pin” feature allows you to mark your favorite projects and they will remain on top of the list! 
  • Projects now display a line of summary text in the main list. You can add even longer text in the Descriptions section of the Info panel.
  • We added a new Info panel which allows you to quickly inspect any project when you select its row. The info panel can be opened by clicking the “i” icon in the upper right. This panel displays information (metadata, project settings, project size, etc.) which is also available in the Project Settings. Now you can access this information directly from the Project list page. The Info panel also enables you to easily copy the project ID.

Tool Runner

We updated the Tool Runner with a graphical representation of the analysis process. Each app or workflow has three areas that you can configure: Settings, Analysis Inputs, and Stage Settings.

  • Settings include execution name, project location, output folder, and optional advanced features.
  • Analysis Inputs is where you can select appropriate inputs and can toggle to batch mode. Also, you can now view all inputs in one location.
  • Stage Settings contain information about each stage of a workflow, including app version, instance type, and output folder. You can change these as desired.

Portal Configuration

For the portal admin, we simplified portal configuration to make it easier to design a custom experience. Learn more about DNAnexus Portals here or, if you are already a Portal customer, get the details here.

AWS Instance Types

We added new AWS instance types with high memory, better CPU, and more disk space, which result in better performance for some workflows. You can access both old and new instance types at the same time. See the documentation for more information.

HiTRUST Certification

In August of 2019, we earned HiTRUST certification. HiTRUST, short for Health Information Trust Alliance, works with organizations in healthcare, technology, and information technology to develop a standard set of controls that govern the secure storage and exchange of sensitive or regulated information, such as protected health information (PHI).

New Archival Service

We implemented a new file-based archive service, which enables you to perform simplified archive and unarchive of projects, folders, and files without a request to DNAnexus.

JupyterLab Support for GPU Instances

We added JupyterLab support for GPU instances. The DNAnexus JupyterLab environment now has CUDA (Compute Unified Device Architecture) libraries pre-installed when running on GPU-enabled instances. This update supports efficient training of sophisticated deep learning models.
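
A quick way to confirm that a notebook session can see a GPU is sketched below. It is a generic check, not a DNAnexus-specific API: it assumes the NVIDIA driver's `nvidia-smi` tool is on the `PATH` of GPU-enabled instances, and the helper name `gpu_visible` is made up for this example.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling found, so assume a CPU-only instance.
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

On a CPU-only instance the function simply returns `False`, so the same notebook can fall back to CPU code paths without raising an error.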

New Association Browser 

We created an Association Browser to view and visualize results of Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS).

Rich Applications in HTTPS Environment

We made modifications so that you can build Dash, R Shiny, and TensorBoard apps on your data and run them in the HTTPS environment. Running these apps on the DNAnexus Platform means that you can easily connect with your data within the boundaries of a secure environment.

Migration to Python Version 3.0

And finally, we took steps to help you prepare for the sunsetting of Python version 2.x. To learn more, read the FAQ.
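
For readers updating their own scripts, a few of the most common behavior differences between Python 2 and Python 3 are sketched below. The snippet is a generic illustration written for Python 3, not taken from the DNAnexus FAQ, and the function name `py3_examples` is hypothetical.

```python
def py3_examples() -> dict:
    """Collect a few Python 3 behaviors that differ from Python 2."""
    return {
        # "/" is true division in Python 3; Python 2 gave 3 for two ints.
        "true_division": 7 / 2,
        # "//" floors explicitly, matching the old Python 2 integer result.
        "floor_division": 7 // 2,
        # bytes and str are distinct types; decode bytes before using as text.
        "decoded": b"ACGT".decode("ascii"),
        # dict.keys() returns a view rather than a list; sorted() or list()
        # materializes it when a real list is needed.
        "keys": sorted({"b": 1, "a": 2}.keys()),
    }


# print is a function in Python 3, so the parentheses are required.
print(py3_examples())
```

Division and the bytes/text split are worth checking first, since they change results silently instead of raising an error.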

At DNAnexus, we are committed to building a platform that helps you advance scientific innovation and precision medicine. For more information, reach out and contact us.