One Genome Browser to Rule Them All?

Maria Nattestad

At VIZBI 2018 (Visualizing Biological Data) [https://vizbi.org/2018/], I proposed a topic for a breakout session called "One Genome Browser to Rule Them All". It led to a discussion that I found fascinating and inspiring. I am writing it up here so we all have a place to continue the discussion and explore where to go next.

The basic idea is that there should be a genome browser that the whole community of researchers can use and build on, creating new track types as plugins that can be shared with other scientists, without the complexity of modifying the core framework.

We are at a point where web technologies have matured enough to allow more modularity and reuse of code on the web, much as Java enabled Cytoscape to build a very useful library of community-created plugins. This kind of large-scale collaboration to build an ideal genome browser is, I believe, at least an interesting thought experiment as the bioinformatics field matures and we are joined by many of the world's biologists, who rely increasingly on genome and transcriptome sequencing for their innovative research.

So here is roughly what we discussed during this session, with some blue sky wishful thinking:

Built for the web

We agreed that a JavaScript genome browser would be preferable to a desktop application, given how much of genomics has moved onto the web: large databases of resources, cloud platforms, and many academic web applications that make genomics more accessible. If the genome browser is built for the web, each of these services could embed it for better interactive exploration of the data. Among the genome browsers built for the web, some run entirely on the front end (IGV.js, pileup.js, Biodalliance), while others also need some back-end, server-side code or configuration (JBrowse, NGB). It is significantly easier to embed the front-end browsers into a web application, whether that is a small academic project or a large platform like Galaxy or DNAnexus. If a back end is the only way to improve performance, perhaps it could be an optional feature. I'm curious to hear about people's experiences: when does having a back end improve performance, and could that part be separated out cleanly?

Future-proof

Importantly, this genome browser should also make fewer assumptions about how data will be visualized, to future-proof it against new technologies and ideas that need new types of visualization. One limitation that is hard to overcome in current genome browsers is the difficulty of plotting features that connect to multiple loci, which has become increasingly common in recent years for showing long reads (PacBio, Nanopore), linked reads (10X), and long-range variants including gene fusions, to name a few. Current genome browsers can sometimes show multiple loci next to each other, but their tracks cannot cut across and connect to more than one of these loci. See these examples of connecting across multiple loci from Ribbon and gGnome:

[Figure: gGnome and Ribbon browsers]
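The core geometric requirement for multi-locus tracks is that a single feature can map to screen positions in more than one displayed region. The hypothetical sketch below lays out loci as side-by-side panels and computes the endpoints of a line connecting two loci, for example the two halves of a split read or the partners of a gene fusion. The `Panel` layout and the function names are assumptions for illustration, not taken from any existing browser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Panel:
    """One displayed locus, drawn side by side with the others (hypothetical layout)."""
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    x0: float     # left edge of the panel on screen, in pixels
    width: float  # panel width in pixels


def to_screen_x(panels: List[Panel], chrom: str, pos: int) -> Optional[float]:
    """Map a genomic position to an x pixel in whichever panel contains it."""
    for p in panels:
        if p.chrom == chrom and p.start <= pos < p.end:
            return p.x0 + (pos - p.start) / (p.end - p.start) * p.width
    return None  # the position is not visible in any panel


def connector(panels: List[Panel], a: Tuple[str, int], b: Tuple[str, int]) -> Optional[Tuple[float, float]]:
    """x endpoints of a line joining two loci, or None if either end is off screen."""
    xa, xb = to_screen_x(panels, *a), to_screen_x(panels, *b)
    if xa is None or xb is None:
        return None
    return (xa, xb)
```

With `chr1:1000-2000` in a 400-pixel panel at x = 0 and `chr5:50000-51000` in a 400-pixel panel at x = 420, a fusion joining `chr1:1500` to `chr5:50250` would be drawn from x = 200 to x = 520, crossing the boundary between the two panels.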

Ann Loraine, who is part of the IGB team, explained that the Integrated Genome Browser (IGB) is an older genome browser that predates IGV. The IGB team had recently attempted to add multi-locus functionality but found that such a fundamental change was no longer possible within IGB's architecture. All software developers have to make assumptions when starting a project, but it's important to examine those assumptions and consider every feature we might eventually want to include. That is why future-proofing is such an important topic for this discussion.

To highlight another dimension of future-proofing, Valerie Schneider of the NCBI pointed out that a flood of new genomes, especially from the Vertebrate Genomes Project, is one of the challenges she sees on the horizon for the NCBI's own genome browser. Valerie also mentioned that enabling the genome browser to connect to data from government and consortium resources would be very important.

Christian Stolte of the New York Genome Center showed a mini-browser within MetroNome that visually connects protein domains to their genomic coordinates along a gene. That example brought up an interesting point that non-genomic coordinates such as transcript and protein domains could also become first-class citizens. So then I guess we are no longer just talking about a genome browser but actually a multiome browser!
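Treating protein coordinates as first-class means translating them into genomic coordinates through the gene's exon structure. Below is a minimal sketch, assuming the coding portions of the exons are already known as 0-based half-open intervals in genomic order, and that the coding sequence starts at the first base of the first segment on its strand. The function name is hypothetical, and this is not how MetroNome is implemented.

```python
from typing import List, Tuple


def protein_to_genomic(cds_segments: List[Tuple[int, int]], aa_pos: int, strand: str = "+") -> int:
    """Genomic coordinate of the first base of codon `aa_pos` (1-based residue number).

    cds_segments: coding part of each exon as (start, end), 0-based half-open,
    listed in ascending genomic order regardless of strand.
    """
    offset = 3 * (aa_pos - 1)  # nucleotides into the coding sequence
    # On the minus strand, translation starts at the highest genomic coordinate.
    segments = cds_segments if strand == "+" else list(reversed(cds_segments))
    for start, end in segments:
        length = end - start
        if offset < length:
            return start + offset if strand == "+" else end - 1 - offset
        offset -= length  # skip this exon and the intron after it
    raise ValueError("amino acid position lies beyond the coding sequence")
```

A protein domain spanning residues a to b then maps to the genomic span between the codons for a and b, which may cross one or more introns; that is exactly the connection a multiome browser would need to draw.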

Notice that all of these use cases we discussed for future-proofing are already needed today and supported by niche mini-browsers. If you know of other current and predicted future use cases that are not already covered by existing genome browsers or by the ideas presented here, please add them in the comments, so we can all broaden our horizons and expand our vision for the ideal genome browser.

A library of community-contributed plugins

Before I built each of my own visualization tools, I tried to add the functionality I wanted to the genome browser I was using at the time, the desktop version of IGV. First, I needed to show two loci at the same time, with connecting lines between them for long-range variants. IGV could show two regions at once, but they were independent and I couldn't draw lines across them, so I built a visualization tool called SplitThreader to show long-range variants as connecting lines between two loci. Later, I needed to show alignments of long reads in a way that reveals where each alignment sits along the read's own length in addition to where it sits on the reference, so I built Ribbon. The funny thing is that after building the parts of the visualization that are really novel, you still have to add several parts that normal genome browsers already handle just fine, like drawing genes, variants, and other features from a BED file. I would end up spending over 50% of my time implementing features that already exist elsewhere, and pretty shallow versions of them too.
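The extra dimension Ribbon adds is the alignment's position along the read itself. One common way to recover it from a SAM/BAM record is to count the clipped bases at the start of the CIGAR string and flip the interval for reverse-strand alignments, since CIGAR strings are written relative to the reference strand. The sketch below illustrates that idea; it is not Ribbon's actual code, and the function name is hypothetical.

```python
import re
from typing import Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_interval(cigar: str, is_reverse: bool) -> Tuple[int, int, int]:
    """Where an alignment sits along the read: (read_start, read_end, read_length).

    Coordinates are 0-based half-open in the read's original (sequenced) orientation.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) are counted so clipped bases keep their place in the full read.
    read_length = sum(n for n, op in ops if op in "MIS=XH")
    # Leading soft or hard clips are read bases that come before the aligned part.
    lead = 0
    for n, op in ops:
        if op in "SH":
            lead += n
        else:
            break
    aligned = sum(n for n, op in ops if op in "MI=X")
    start, end = lead, lead + aligned
    if is_reverse:
        # Flip from reference-strand order back to the read's own orientation.
        start, end = read_length - end, read_length - start
    return start, end, read_length
```

For a 3,000-base read aligned as `100S500M20I380M2000H`, the forward-strand alignment covers read bases 100 to 1000; on the reverse strand the same CIGAR corresponds to read bases 2000 to 2900.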

Since starting my data visualization role at DNAnexus I also have a whole new appreciation for modularity because the platform is built to be able to run virtually any kind of analysis on any kind of data applicable in genomics. I dream of having a genome browser that could be equally flexible and modular, that anyone could contribute new types of visualization to, and that could be integrated and used anywhere from consortium database explorers to digital paper figures. If anyone can build an app on DNAnexus or Galaxy and publish it for others to use, then why is it not the same for adding a special track to a genome browser?

Conclusions

I would like to mention that the genome browsers out there right now are fantastic. When thinking about this ideal genome browser, there are several ideas that come directly from existing tools: IGV.js is very lightweight and does an excellent job of being embeddable anywhere, while JBrowse has a library of community-contributed plugins that I find very encouraging for collaboration in this field.

Genome browsers have a lot in common, but each one also has unique strengths. When I need to build a novel visualization for a particular data type, concept, or application, I would love to be able to just build a track into an already powerful and full-featured genome browser, instead of building the novel feature into its own tool and then re-implementing gene, BED, and variant tracks. The users of my tools would surely rather have the full power of an IGV or a JBrowse at their fingertips even while looking at their long-read alignments in a Ribbon browser.

Now I would love to hear what you think!

Are there any genome browser teams out there who have been thinking about modularity, plugins, future-proofing, and making a lightweight yet powerful genome browser for the web?

Dot: An Interactive Dot Plot Viewer for Comparative Genomics

Author: Maria Nattestad, Scientific Visualization Lead

Introduction

Next week, DNAnexus will be at the Plant and Animal Genome conference (PAG) in San Diego (booth 431). As part of an ongoing effort to expand our visualization capabilities, we will present an open-source tool called Dot that helps scientists visualize genome-genome alignments through a rich, interactive dot plot.

In addition to its scientific contribution, Dot encourages community development of new visualization tools by providing a template that can be used for new visualization tools in other areas of bioinformatics. This would allow bioinformaticians to focus on the bioinformatics and visualization without needing to master web programming intricacies such as reading data from local and remote servers, which is all handled by Dot’s modular and reusable inner workings.

Importance of Dot Plots

Constructing a genome assembly is fundamental to studying the biology of a species. In recent years, advances in long-read sequencing and scaffolding technologies have led to unprecedented quality and quantity of genome assemblies. Better reference genomes contribute to better gene annotations, evolutionary understanding, and biotech opportunities.

Comparing new assemblies to existing genomes of related species is crucial to understanding differences between organisms across the tree of life. Genome assemblies are never perfect and always have to be evaluated critically by comparing against other assemblies or reference genomes, whether of the same or a closely related species. Comparative genomics is also how assemblies of two species’ genomes can be compared and contrasted to look for features that represent functional differences or inform the study of their evolution.

The classic method for visualizing genome-genome alignments is the dot plot, which provides an excellent overview of alignments from the perspective of both genomes. Dot plots place the reference genome on one axis and the query genome (the one aligned against the reference) on the other axis. Each alignment between the two genomes is placed according to its coordinates in both genomes. Whereas genome browsers (such as IGV and the UCSC Genome Browser) plot data in one dimension on one genome, dot plots use two dimensions to show alignments in two genomes' coordinate spaces simultaneously. This is necessary when representing large genome alignment data, where the query coordinates of an alignment matter just as much as its reference coordinates.
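Placing alignments on a whole-genome dot plot usually means laying each genome's sequences end to end along its axis and offsetting every coordinate by the total length of the sequences before it. The sketch below shows that layout under those assumptions; the function names are illustrative and not taken from Dot's source.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def axis_offsets(seq_lengths: List[Tuple[str, int]]) -> Dict[str, int]:
    """Lay sequences end to end along one axis; return each sequence's start offset."""
    names = [name for name, _ in seq_lengths]
    # Each sequence starts where the cumulative length of the previous ones ends.
    starts = [0] + list(accumulate(length for _, length in seq_lengths))[:-1]
    return dict(zip(names, starts))


def alignment_segment(ref_offsets: Dict[str, int], query_offsets: Dict[str, int],
                      ref: str, ref_start: int, ref_end: int,
                      query: str, query_start: int, query_end: int):
    """Line segment for one alignment: x from the reference, y from the query.

    A reverse-strand alignment has query_start > query_end, so its segment
    runs downhill, which is how inversions show up on a dot plot.
    """
    x0, x1 = ref_offsets[ref] + ref_start, ref_offsets[ref] + ref_end
    y0, y1 = query_offsets[query] + query_start, query_offsets[query] + query_end
    return (x0, y0), (x1, y1)
```

For example, with `chr1` (1,000 bp) followed by `chr2` on the x-axis, an alignment at `chr2:100-300` is drawn from x = 1100 to x = 1300.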

However, dot plots have barely changed in the past decade and are still generated from the command line as static images, limiting detailed investigation. We decided to tackle this problem as an open-source science project at DNAnexus.

Introducing Dot

Here we present Dot, an interactive dot plot viewer that allows genome scientists to visualize genome-genome alignments in order to evaluate new assemblies and perform exploratory comparative genomics.

Dot supports the output of MUMmer's nucmer aligner, the most commonly used software for aligning genome assemblies. A short script called DotPrep.py converts the delta file to a more streamlined coordinates file, plus an index that lets Dot load additional alignments for specific regions on demand.
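For readers unfamiliar with the input, a nucmer delta file starts with two header lines (the input FASTA paths and the program name), then groups alignments under `>reference query reference_length query_length` lines. Each alignment line gives the reference start and end, the query start and end, and error counts, followed by indel positions terminated by `0`. The hypothetical parser below extracts just the coordinates, which is the core of what a coordinates file needs; it is a sketch and not the logic of DotPrep.py itself.

```python
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    ref: str
    query: str
    ref_start: int
    ref_end: int
    query_start: int  # greater than query_end for reverse-strand alignments
    query_end: int
    errors: int


def parse_delta(lines: Iterable[str]) -> Iterator[Alignment]:
    """Yield alignment coordinates from a nucmer .delta file, skipping indel deltas."""
    it = iter(lines)
    next(it)  # paths to the reference and query FASTA files
    next(it)  # program name, e.g. NUCMER
    ref = query = None
    for raw in it:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            # New reference/query pair: ">ref query ref_len query_len"
            ref, query = fields[0][1:], fields[1]
        elif len(fields) == 7:
            # Alignment header: rs re qs qe errors sim_errors stop_codons
            rs, re_, qs, qe, errors = map(int, fields[:5])
            yield Alignment(ref, query, rs, re_, qs, qe, errors)
        # Single-integer lines are indel positions or the terminating 0; ignore them.
```

Keeping only these seven fields per alignment, and dropping the per-indel deltas, is one plausible reason a streamlined coordinates file is faster for a browser to load than the raw delta file.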

Interactivity and features

Dot adds a number of useful features on top of the classic dot plot concept. The index lets Dot quickly plot an overview of the 1000 longest alignments. From there, users can zoom in on particular regions and load all the alignments for regions of interest.
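The overview-then-detail pattern can be sketched as an index that keeps the N longest alignments for the initial view and groups all alignments for on-demand loading. The sketch below assumes grouping by query sequence and measures length on the reference; both choices, the tuple layout, and the function names are assumptions, and the actual index written by DotPrep.py may be organized differently.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (ref, ref_start, ref_end, query, query_start, query_end)
Aln = Tuple[str, int, int, str, int, int]


def build_index(alignments: List[Aln], overview_size: int = 1000):
    """Split alignments into an overview set and a per-query-sequence lookup table."""
    def ref_length(a: Aln) -> int:
        return abs(a[2] - a[1])

    # nlargest avoids sorting the full list when only the top N are needed.
    overview = heapq.nlargest(overview_size, alignments, key=ref_length)
    by_query: Dict[str, List[Aln]] = defaultdict(list)
    for a in alignments:
        by_query[a[3]].append(a)
    return overview, by_query


def load_region(by_query: Dict[str, List[Aln]], query: str) -> List[Aln]:
    """Fetch every alignment for one query sequence when the user zooms in on it."""
    return by_query.get(query, [])
```

The overview stays small no matter how fragmented the assembly is, while the full detail for a region is only read when someone actually looks at it.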

In addition to showing alignments, Dot allows scientists to load annotations for either or both genomes to show additional context (e.g. understanding how sequence differences map to gene differences). Annotation tracks are a common feature of one-dimensional genome browsers, but to translate this concept to the two-dimensional dot plot, we enable annotation tracks on both axes. This is a major benefit of Dot that makes it possible to compare gene annotations visually alongside the alignments of the DNA sequences.
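With annotation tracks on both axes, each alignment can be related to the features it covers in both genomes. Here is a minimal sketch, assuming BED-style features and ignoring the off-by-one difference between nucmer's 1-based coordinates and BED's 0-based ones; the names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    seq: str
    start: int  # 0-based, half-open, as in BED
    end: int
    name: str


def overlapping(features: List[Feature], seq: str, a: int, b: int) -> List[str]:
    """Names of features on `seq` that overlap the interval between a and b."""
    lo, hi = min(a, b), max(a, b)  # query coordinates may be reversed
    return [f.name for f in features if f.seq == seq and f.start < hi and f.end > lo]


def annotate_alignment(ref_features: List[Feature], query_features: List[Feature],
                       ref: str, ref_start: int, ref_end: int,
                       query: str, query_start: int, query_end: int) -> Tuple[List[str], List[str]]:
    """Features hit by one alignment on the reference axis and on the query axis."""
    return (overlapping(ref_features, ref, ref_start, ref_end),
            overlapping(query_features, query, query_start, query_end))
```

Comparing the two lists side by side is the programmatic counterpart of glancing along both axes of the plot.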

Moreover, users can jump to the same region of the reference genome in the UCSC Genome Browser to quickly see additional context for a region of interest. This allows scientists to explore how known repetitive elements in the reference genome could potentially affect assembly quality in specific regions.
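Linking out to UCSC only requires building a URL to the browser's `hgTracks` page with a genome assembly (`db`) and a `position`. The sketch below assumes the reference corresponds to a UCSC assembly such as hg38 and that coordinates are already 1-based, as the position box expects; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def ucsc_link(db: str, chrom: str, start: int, end: int) -> str:
    """Link that opens the UCSC Genome Browser at a reference region."""
    params = {"db": db, "position": f"{chrom}:{start}-{end}"}
    # urlencode percent-encodes the colon in the position string.
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)
```

For example, `ucsc_link("hg38", "chr1", 1000, 2000)` produces a link with `db=hg38` and an encoded `position` of `chr1:1000-2000`.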

Details for developers

By leveraging D3 and canvas in JavaScript, Dot combines the benefits of interactivity with scalability, enabling scientists to explore large genomes. The UI in the right-side panel is built using the open-source SuperUI.js [https://github.com/marianattestad/superui] plugin, and input handling and basic page navigation are set up through a VisToolTemplate [https://github.com/MariaNattestad/VisToolTemplate] plugin we developed to help others create new visualization tools more easily. We encourage developers to use and build on Dot and these open-source projects to create their own visualization tools. Dot is very modular and can be used as a template to build new visualization tools. The template handles complex but necessary components like reading input data files from various sources, letting developers focus only on the visualization itself.
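One of the chores the template handles is reading input from different sources. A minimal sketch of a single entry point that accepts either a local path or an http(s) URL is shown here in Python using only the standard library; the template itself runs in JavaScript in the browser, and this function is a hypothetical illustration rather than part of VisToolTemplate.

```python
from pathlib import Path
from typing import Iterator
from urllib.parse import urlparse
from urllib.request import urlopen


def read_lines(source: str) -> Iterator[str]:
    """Yield text lines from a local path or an http(s) URL through one interface."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        # Remote file: stream the response and decode it line by line.
        with urlopen(source) as response:
            for raw in response:
                yield raw.decode("utf-8").rstrip("\n")
    else:
        # Anything else is treated as a local file path.
        with Path(source).open(encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Because every parser consumes the same iterator of lines, the visualization code never needs to know where the data came from.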

Dot is open source

Dot is free to use online at [https://dnanexus.github.io/dot/] and open source at [https://github.com/dnanexus/dot]. For DNAnexus users, there is a package available among the featured projects with (1) an applet for running MUMmer's nucmer aligner that includes DotPrep.py, (2) a shortcut to Dot to send files from DNAnexus quickly, and (3) example data and results.