From Manhattan Plot to BigTop: DNAnexus Makes Data Visualization a (Virtual) Reality

BigTop Team Members

Drowning in a deluge of data? What if there was a better way to grasp the information — literally?

Enter BigTop, an innovative virtual reality platform that allows you to interact with your data in a totally new way. Thousands of data points, three dimensions, unlimited possibilities.

It takes this:

2D Manhattan Plot

And turns it into this:

VR Manhattan Plot

BigTop is the brainchild of Senior Program Manager and resident microbiome expert Sam Westreich, Principal Software Engineer Christopher Meyer, and former Data Visualization Team Lead Maria Nattestad.

An avid gamer, Meyer had been searching for a way to take scientific data visualization into the virtual world, and Westreich was similarly interested in VR, 3D programming, and building better tools for the scientific community.

Scientists have been comfortably plodding along with scatter plots for decades. But the medium is limited to two dimensions, and is insufficient to handle the vast amounts and the depth of data being generated in the era of genomics and precision medicine.

The Manhattan plots often used to visualize genome-wide association study (GWAS) data can allow you to map out associations between disease risk and chromosome locations, for instance. But what if you want to throw in another variable, such as age, race or mutation frequency?

“Scatter plots provide a good way to get an overall glimpse, but you can’t really drill down into the data,” Westreich said.

BigTop widens the field of possibilities — by literally widening the field.

“Computer screens are relatively small. Large amounts of data won’t always fit,” Meyer said. “One of the luxuries of virtual reality is the space to set up an entire world. A lot more data can be presented that way.”

BigTop takes three different qualities of each data point, plotting them in three different dimensions. If you find a point of interest, you don’t have to remember the coordinates and type them into a browser to investigate further — you can just turn around and select the point. That action brings up even more information culled from the originating dataset or other records.  
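To make the idea of three qualities per point concrete, here is a minimal sketch of how one GWAS record could be turned into a 3D coordinate. It is not BigTop's actual code: the `Variant` fields, the helper names, and the choice of allele frequency as the third axis are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one GWAS data point (field names are assumptions)."""
    chromosome: int
    position: int            # base-pair position within the chromosome
    p_value: float           # association p-value
    allele_frequency: float  # illustrative third variable


def cumulative_offsets(chrom_lengths):
    """Map each chromosome to the genome-wide offset where it starts."""
    offsets, running = {}, 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = running
        running += chrom_lengths[chrom]
    return offsets


def to_xyz(variant, offsets):
    """Return (x, y, z): genome position, -log10(p), and the third variable."""
    x = offsets[variant.chromosome] + variant.position
    y = -math.log10(variant.p_value)  # the usual Manhattan plot height
    z = variant.allele_frequency      # the extra dimension a 2D plot lacks
    return (x, y, z)
```

The first two coordinates are what a conventional Manhattan plot already shows; the third is where a VR scene has room for one more variable.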

The program not only enhances the data analysis process for professional users but also makes the data more accessible to a wider audience, who may find the physical interaction more intuitive and easier to understand.

“It presents data in a way that we are better equipped to deal with,” Westreich said. “The dataset is not just a file that sits somewhere. It’s turned into a bright, vibrant, multicolored thing that people are curious to explore. You can literally walk through the data. You no longer have to try to conceptualize it in your head.”

Frontier Fridays

BigTop was born one Friday afternoon in March 2018 during a session of Science Frontiers, time allocated to DNAnexus science and engineering teams to get creative on projects not directly related to their day-to-day customer work. Every week, about a dozen people participate.

Nattestad and Meyer came up with the concept and Westreich was soon brought in to “begin sticking toothpicks into paper plates” to develop a prototype. Soon, Meyer was lugging his huge gaming computer and Oculus headset into work each Friday as they delved into the brave new world. They experimented with different platforms — one of which was abruptly taken off the market — and settled on A-Frame.  

“It’s a relatively new environment, so there’s not years’ worth of documentation and user questions and answers,” Westreich said. “It’s been challenging, but it’s also exciting. We are really getting in on the ground floor and contributing to the evolution of this completely new media.”

BigTop works with a variety of VR headsets. With the headset on, you can walk around to explore the data, using your right-hand controller to project a laser that can be used to select points: simply aim the laser at a point and pull the trigger on the controller to get info on it. Your left controller will present a virtual hand; this currently does nothing, but you can use it to wave or give a thumbs-up to your data, as the team notes in their QuickStart guide.
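Selecting a point with a laser is a form of ray picking. The sketch below shows the general technique in plain Python, assuming points are (x, y, z) tuples and that a point counts as hit when the ray passes within a small radius of it. It is not A-Frame's raycaster or BigTop's implementation, and the function name and default radius are hypothetical.

```python
import math


def pick_point(origin, direction, points, radius=0.05):
    """Return the index of the nearest point the ray passes within `radius` of, or None."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        return None  # a zero-length direction does not define a ray
    unit = tuple(d / norm for d in direction)

    best_index, best_t = None, math.inf
    for i, point in enumerate(points):
        rel = tuple(pc - oc for pc, oc in zip(point, origin))
        # Distance along the ray to the place where it passes closest to the point.
        t = sum(r * u for r, u in zip(rel, unit))
        if t < 0:
            continue  # the point is behind the controller
        closest = tuple(oc + t * u for oc, u in zip(origin, unit))
        miss = math.dist(point, closest)  # how far the ray misses the point
        if miss <= radius and t < best_t:
            best_index, best_t = i, t
    return best_index
```

Choosing the smallest distance along the ray means that when two points line up, the one closer to the user is selected.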

The program can also be played like a video game using a phone, computer screen, keyboard and mouse, making it accessible to those who may not have more advanced equipment.

After the team demonstrated the program to colleagues at DNAnexus, the company invested in VR equipment, and Nattestad and Westreich have taken it on the road to several conferences, where BigTop has been a big hit.

“It’s a lot of fun and attracts a lot of attention. And it shows how we, as a company, are experimenting with new modalities and thinking out of the box,” Westreich said.

BigTop is currently available for free on GitHub, where it comes pre-loaded with a GWAS dataset from the GIANT consortium with associations between SNPs and height, and a second dataset on breast cancer. The program can also be used to visualize other datasets; for instance, Westreich presented an interactive rice genome at this year’s Plant and Animal Genome Conference.

After experiencing BigTop at the last DNAnexus Connect user group meeting, customers have been eager to get the program integrated into their DNAnexus packages. Westreich and Meyer said the program is already optimized for use as a DNAnexus file viewer. Users have to put their data in the right format themselves, which may not be a straightforward process depending on the data.

But Meyer thinks the payoff will be worth it.

“Virtual reality is still very new. This sort of biological data exploration has not been available before, and scientists are not nearly as cutting edge as you might think when it comes to data visualization,” Meyer said. “Although they may be slow to adopt new methods, when they do start and realize the potential, they embrace it.”

Helping Scientists Discover the Hidden Jewels Within UK Biobank Data

Angela Anderson




Did you know that a loss-of-function mutation in the MEPE gene, which codes for proteins that regulate bone mineralization, results in two-fold increased odds of osteoporosis and 1.5-fold increased risk of fractures? Neither did scientists, until they delved deeply into data released by the UK Biobank.

The UK Biobank, which has developed a biospecimen collection paired with unparalleled health data and phenotype information from more than 500,000 individuals, is making its massive genotype/phenotype dataset available to approved researchers worldwide as part of a unique open access project aimed at accelerating medical research and drug discovery. Earlier this month, it released the first batch of whole-exome sequencing results, providing data from 50,000 people. These exomes were sequenced by the Regeneron Genetics Center, which will soon complete sequencing of the entire 500,000-person cohort with the financial support of a “pre-competitive” consortium of leading biopharma companies.

Contained inside these data is a treasure trove of information — but one in which the jewels are hidden among billions of grains of sand.

How would Dana, a senior scientific researcher in the early development cardiovascular program at a pharma company, use it to determine whether there is a correlation between mutations in the PKP2 gene and a phenotype trait, such as red hair?

Luckily, the new DNAnexus Cohort Browser for UK Biobank makes phenome-wide association studies (PheWAS) and other queries easy to carry out. Developed to help researchers like Dana navigate thousands of phenotypic fields and millions of genetic variants, the browser can mine extremely large datasets in a matter of seconds.

How does it work? Dana would simply enter her phenotypic (red hair) and genetic (PKP2, familial links) requirements into a built-in browser whose point-and-click interface lets her quickly filter, browse, and visualize the integrated phenotypic and genomic information.
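Conceptually, a query like Dana's is a filter over participant records that combines a phenotype condition with a genetic one. The sketch below illustrates that idea only. It is not the Cohort Browser's implementation or API, and the `Participant` structure, field names, and `build_cohort` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical de-identified record (structure assumed for illustration)."""
    participant_id: str
    phenotypes: dict                                 # e.g. {"hair_colour": "red"}
    variant_genes: set = field(default_factory=set)  # genes in which a variant is carried


def build_cohort(participants, phenotype_field, phenotype_value, gene):
    """Keep participants who match the phenotype and carry a variant in `gene`."""
    return [
        p for p in participants
        if p.phenotypes.get(phenotype_field) == phenotype_value
        and gene in p.variant_genes
    ]
```

Comparing how common the phenotype is among carriers and non-carriers of the variant would be the next step in testing for an association.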

Running on our DNAnexus Apollo Platform, the Cohort Browser for UK Biobank was designed to enable scientists with all levels of bioinformatic expertise to rapidly test multiple hypotheses and gain insight into mechanisms of action, biomarkers, and targets. Working with clinical data also requires specialized capabilities to maintain privacy, including compliance with ISO 27001, GDPR, and GxP, among others.

As part of their effort to explore the UK Biobank data, the Regeneron Genetics Center (RGC) used DNAnexus to run bioinformatics pipelines and deliver the results back to pharmaceutical companies in the UK Biobank Exome Sequencing Consortium. As part of this data delivery, the RGC successfully deployed the cohort browser on a collection of thousands of phenotypic fields extracted from the UK Biobank and millions of genetic variants computed through their scientific pipeline.

Initial analysis of the UK Biobank data has already led to many discoveries.

In addition to the MEPE mutation finding, RGC researchers identified a handful of other significant novel loss-of-function associations, including one that confers a nearly five-fold increased odds of varicose veins in certain carriers.

Among the nearly four million single nucleotide and indel coding variants observed by the researchers were many variants in the so-called “ACMG59” genes, a set of 59 genes proposed by the American College of Medical Genetics and Genomics to be associated with highly penetrant disease phenotypes.

Overall, 2% of the sequenced individuals carried a flagged variant in one of the ACMG59 genes. Variants in cancer-associated genes were the most prevalent, followed by variants associated with familial hypercholesterolemia and cardiac dysfunction disorders.

Importantly, the Cohort Browser for UK Biobank allows researchers to check their hypotheses against real-world data from the de-identified patients’ records. For instance, an individual with a pathogenic missense variant could be found to have a history of benign colon neoplasms, diverticular disease of the intestine, colonic polyps, and intestinal obstruction.

These discoveries are a great illustration of how the extensive health data available for UK Biobank participants will be a valuable resource to assess disease risk at both the individual and the population level.

The crowdsourcing spirit of the initiative is what makes it stand apart. When the Manchester-based biobank enrolled its first volunteer 13 years ago, principal investigator Rory Collins wanted to democratize the data and maximize its scientific pay-off: “By making data available to 100 people around the world, we can get a lot more research done than if I sit here and do one study a year with the data,” the University of Oxford epidemiologist told Science.

Earlier releases of genotyping data in 2015 and 2017 have already resulted in more than 600 papers across 1,400 projects from 7,000 researchers. Additional tranches of exome and whole genome data will similarly be released over the next two years.

The hope is that the easy-to-use DNAnexus Apollo Platform and its Cohort Browser for UK Biobank will help even more researchers navigate the complexities of generating and delivering the combined phenotype and genetic data.

“This is just the beginning,” said Aris Baras, MD, Senior Vice President and Head of the Regeneron Genetics Center. “There is so much actionable information in this resource that can be utilized by scientific minds around the globe. We are hard at work mining the data for novel findings that will accelerate science, innovative new medicines and improved patient care, and are excited to have others join us in this important quest.”

Researchers interested in applying for access to UK Biobank data should visit

For more information about the UK Biobank Cohort Browser, visit

Bio-IT World 2019: Explore Millions of Variants in the UK Biobank Data

We are heading back to our second home in Boston to attend the annual Bio-IT World conference! Our team is excited to join fellow researchers, clinicians, pharmaceutical and IT professionals to discuss the future of precision medicine. This year we are bringing our enhanced DNAnexus Apollo™ platform for UK Biobank, which features a new, innovative clinico-genomic cohort browser that enables users to explore and analyze millions of genomic and clinical data points in a matter of seconds.

We are excited to see that our trusted partner on this project, the Regeneron Genetics Center, has been recognized as a 2019 finalist for the Inaugural Innovative Practices Awards. Regeneron’s work performing exome sequencing and analysis for all 500,000 samples for the UK Biobank is a true achievement. We were honored to partner with them to create an innovative cohort browser for UK Biobank data, allowing researchers to browse through 3,000 phenotypic fields and 15,000,000 genomic variants across 100,000 samples and build cohorts.

Stop by the DNAnexus booth #310 to learn more about the brand new Apollo for UKB! Check out our conference activities below. Can’t make it to any of our events? Request a meeting.

DNAnexus Booth Activities

Demo: Explore Millions of Genomic & Clinical Data Points with DNAnexus Apollo for UK Biobank

  • Wednesday, April 17: 10:00am-11:00am, 1:00pm-2:00pm
  • Thursday, April 18: 10:00am-11:00am

Mining large-scale datasets for actionable insights is a huge computational effort. The UK Biobank’s first release of its exome dataset presents an exciting new opportunity for genomics research, with 50,000 exomes and combined phenotypes for users to dig into.

DNAnexus Apollo is based on a scalable cloud-based platform. Whether you’re working with the UK Biobank or your own protected data, DNAnexus Apollo streamlines consumption of multi-omics and clinical records from complex datasets.

Customer Talk/Poster

AI Assisted Rapid Clinical Whole Genome Sequencing for Critical Care
Bioinformatics Track

  • April 18, 2:30pm-3:00pm
  • Ray Veeraraghavan, PhD, Director of IT & Informatics, Rady Children’s Institute for Genomic Medicine

DNAnexus Apollo Lounge: Cocktail Networking Event

Join us during Bio-IT World for a special Greek-style networking event!

Eat, drink, and have fun meeting fellow scientists and other translational research industry experts.

Space is limited. Contact us for your personal invitation.