Contributors: Arkarachai Fungtammasan, Jason Chin, Gigon Bae, Fernanda Foertter, Fritz Sedlazeck, Claudia Fonseca
“Houston, we’ve had a hackathon.”
And this hackathon has yielded four creative bioinformatics solutions to address the complexities of structural variants.
Recently, Nvidia and DNAnexus jointly sponsored the NCBI Structural Variant Hackathon at Baylor College of Medicine on October 11-13. The event was attended by 45 participants from Baylor College of Medicine, UT Southwestern, Rice University, Stanford, and the Broad Institute. Some guests even traveled all the way from Qatar to attend.
What is a Structural Variant?
A structural variant is any segment of DNA greater than 50 base pairs that has been rearranged in some fashion, whether inserted, deleted, duplicated, inverted, or translocated. Structural variants contribute to many diseases, including cancer. Yet compared with single nucleotide variants, our understanding of structural variants lags behind because they are difficult to identify, particularly in short-read sequencing data.
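To make the size threshold concrete, here is a minimal Python sketch that labels a simple REF/ALT allele pair by the length change it causes. The function name `classify_by_size` and the label strings are our own illustrative choices, not part of any hackathon tool, and the sketch only covers insertions and deletions: duplications, inversions, and translocations cannot be recognized from allele lengths alone.

```python
MIN_SV_LENGTH = 50  # size threshold quoted in the text, in base pairs


def classify_by_size(ref: str, alt: str) -> str:
    """Label a simple REF/ALT allele pair by the length change it causes.

    Illustrative only: real callers also use breakpoints, read depth,
    and orientation to find duplications, inversions, and translocations.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        return "SNV" if len(ref) == 1 else "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    if abs(delta) > MIN_SV_LENGTH:
        return f"structural {kind}"
    return f"small {kind}"


if __name__ == "__main__":
    print(classify_by_size("A", "G"))             # single-base change
    print(classify_by_size("A", "A" + "T" * 60))  # 60 bp inserted
    print(classify_by_size("A" + "C" * 10, "A"))  # 10 bp deleted
```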
Leading the Charge.
Ben Busby, Scientific Lead at NCBI, and Fritz Sedlazeck, Assistant Professor from Baylor College of Medicine, led the hackathon as a means to encourage inter-institutional collaboration and thinking to tackle research questions related to structural variants of the genome.
DNAnexus provided cloud computing credit during the hackathon, and both DNAnexus and Nvidia sent scientists to support attendees with cloud computing, GPU-accelerated computing, and bioinformatics pipeline construction. Participants had the opportunity to learn how to build workflows using the DNAnexus Platform graphical user interface or from the command line using Workflow Description Language (WDL). They could also build reproducible prototypes using Jupyter notebooks, a collaborative framework for working in the cloud environment. In addition, they were able to learn how to use graphics processing units, or GPUs, to improve the efficiency of bioinformatics workflows. Incidentally, GPUs were originally designed to support high-quality gaming experiences, but their power is now being harnessed for computationally intensive workloads such as physics simulation and deep learning.
The event also included an inspirational talk from Richard Gibbs, Director, Baylor College of Medicine Human Genome Sequencing Center, on how the hacking mindset is actively transforming our understanding of genomics. From the mapping of the first human genome to the current era of precision medicine, many great scientific ideas have originated from hacking.
The 45 participants split into groups and each group went to work brainstorming ideas that they could work on over the next three days. Ideas were pitched to the larger group and refined based on feedback.
The next three days were devoted to implementing each of the ideas. And this was when the room came alive. With the DNAnexus and Nvidia teams helping groups get started, there was a lot of cross-talk and collaboration between the attendees, many of whom were merging ideas and borrowing from one another’s prototypes. According to Claudia M.B. Carvalho Fonseca, PhD, who was one of the attendees, “It was fascinating to see the synergy between people from different disciplines (computational biology, bioinformatics, molecular biology, etc.) to work toward common goals.” She added: “The combination of good organization, time constraints, and diverse backgrounds boosted creativity and helped each group develop solutions.”
The hackathon yielded the following innovations.
Fast and efficient QC for multi-sample VCF.
This Python package can rapidly evaluate 2,500 sample VCFs in one and a half minutes. Find the package here.
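As a rough illustration of what a QC pass over a multi-sample VCF can look like, the sketch below computes a per-sample call rate (the fraction of records with a non-missing genotype). This is not the hackathon package's method or API; the function name `per_sample_call_rate` and the choice of metric are assumptions made for the example, and it assumes `GT` is the first key in each sample column.

```python
from typing import Dict, Iterable, List


def per_sample_call_rate(vcf_lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of records with a called genotype, per sample.

    Hypothetical QC metric for illustration; assumes GT is the first
    FORMAT key and treats any genotype containing '.' as missing.
    """
    samples: List[str] = []
    called: List[int] = []
    total = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            called = [0] * len(samples)
            continue
        total += 1
        for i, entry in enumerate(fields[9:]):
            genotype = entry.split(":")[0]
            if "." not in genotype:
                called[i] += 1
    return {s: (c / total if total else 0.0) for s, c in zip(samples, called)}


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT\t0/1\t./.",
    "1\t900\t.\tC\t<INS>\t.\tPASS\tSVTYPE=INS\tGT\t1/1\t0/1",
]

if __name__ == "__main__":
    print(per_sample_call_rate(example))
```

Streaming the file line by line keeps memory use flat regardless of how many records the VCF holds, which matters when a cohort has thousands of samples.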
Each group then presented its work to the broader community. Here are some more highlights.
Genome mis-assembly detection using structural variant calling.
This quality control tool for metagenomic assemblies uses dxWDL, a Workflow Description Language (WDL) compiler for the DNAnexus Platform, together with Docker to build workflows and port them across the DNAnexus Platform.
Fast structural variant graph analysis on GPUs.
This applet, called super-minityper, uses a set of cloud-based workflows for constructing structural variant graphs and mapping reads to them. super-minityper is implemented as a DNAnexus cloud workflow/applet using dxWDL. For the minimap2 + seqwish pipeline, super-minityper also provides a WDL file in which minimap2 is replaced by cudamapper from NVIDIA’s Clara Genomics Analysis SDK for faster analysis on GPUs. It also provides a public Docker image (ncbicodeathons/superminityper:dx-wdl-builder-1.0) that makes the DNAnexus dxWDL compiler easy to use.
Note: The DNAnexus Platform currently doesn’t support GPU-enabled virtual machines for workflows from a web UI but this support is planned for a future release.
Another pipeline from the hackathon identifies and validates de novo structural variants in genomic datasets from trios (two parents and their child).
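The core idea behind a de novo call in a trio can be sketched in a few lines of Python: the child carries a variant allele that neither parent has. The helper names below are hypothetical and this is not the hackathon pipeline's code; a real pipeline would also weigh read support and genotype quality before calling anything de novo.

```python
from typing import List


def _alleles(genotype: str) -> List[str]:
    """Split a VCF-style genotype such as '0/1' or '0|1' into alleles."""
    return genotype.replace("|", "/").split("/")


def carries_alt(genotype: str) -> bool:
    """True if at least one called allele is non-reference."""
    return any(a not in ("0", ".") for a in _alleles(genotype))


def is_hom_ref(genotype: str) -> bool:
    """True only if every allele is called and equals the reference."""
    return all(a == "0" for a in _alleles(genotype))


def is_candidate_de_novo(child: str, mother: str, father: str) -> bool:
    """Flag a variant the child carries while both parents are hom-ref.

    Missing parental genotypes are treated as 'not confirmed', so the
    site is not flagged.
    """
    return carries_alt(child) and is_hom_ref(mother) and is_hom_ref(father)


if __name__ == "__main__":
    print(is_candidate_de_novo("0/1", "0/0", "0/0"))  # child-only variant
    print(is_candidate_de_novo("0/1", "0/1", "0/0"))  # inherited from mother
```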
SWIft Genomes in a graph.
This automated pipeline builds graphs quickly using a k-mer approach. Generally, building graphs for genomes or large genomic regions is computationally expensive; however, with a multi-scale approach, this pipeline employs a simple algorithm and tool to build genome graphs for the human major histocompatibility complex (MHC) region within three minutes.
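To give a feel for how k-mers can anchor a genome graph, here is a small Python sketch. It is a simplified illustration under our own assumptions, not the pipeline's actual algorithm: it keeps k-mers that occur exactly once in every input sequence, treats them as shared anchor nodes, and connects anchors that are consecutive along each sequence. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def unique_kmers(seq: str, k: int) -> Dict[str, int]:
    """Map each k-mer that occurs exactly once in seq to its position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def build_anchor_graph(seqs: List[str], k: int) -> Set[Tuple[str, str]]:
    """Return directed edges between consecutive shared anchor k-mers.

    Assumes seqs is non-empty. Anchors are k-mers that are unique within
    every sequence and present in all of them.
    """
    per_seq = [unique_kmers(s, k) for s in seqs]
    shared = set(per_seq[0]).intersection(*per_seq[1:])
    edges: Set[Tuple[str, str]] = set()
    for positions in per_seq:
        # Order the shared anchors by where they fall in this sequence.
        ordered = sorted(shared, key=positions.__getitem__)
        edges.update(zip(ordered, ordered[1:]))
    return edges


if __name__ == "__main__":
    # Two toy haplotypes that differ by one base in the middle.
    print(sorted(build_anchor_graph(["ACGTTGCA", "ACGTAGCA"], k=3)))
```

Sequences that disagree on the order of anchors would add extra edges to the graph, which is where variation such as inversions would show up in a sketch like this.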
The spirit of innovation continued after the hackathon, when a group of attendees visited the Space Center in Houston. There, attendees saw a Saturn V rocket, the same model that carried the Apollo 11 astronauts to the Moon.
The next structural variant hackathon at Baylor College of Medicine will take place on April 19th-21st. For more information or to register, visit: https://www.hgsc.bcm.edu/events/hackathon