Helping Scientists Discover the Hidden Jewels Within UK Biobank Data

Angela Anderson

Did you know that a loss-of-function mutation in the MEPE gene, which encodes a protein that regulates bone mineralization, is associated with two-fold higher odds of osteoporosis and a 1.5-fold higher risk of fractures? Neither did scientists, until they delved deeply into data released by the UK Biobank.

The UK Biobank, which has developed a biospecimen collection paired with unparalleled health data and phenotype information from more than 500,000 individuals, is making its massive genotype/phenotype dataset available to approved researchers worldwide, as part of a unique open-access project aimed at accelerating medical research and drug discovery. Earlier this month, it released the first batch of whole-exome sequencing results, providing data from 50,000 people. These exomes were sequenced by the Regeneron Genetics Center, which will soon complete sequencing of the entire 500,000-person cohort with the financial support of a “pre-competitive” consortium of leading biopharma companies.

These data contain a treasure trove of information, but one in which the jewels are hidden among billions of grains of sand.

How would Dana, a senior scientific researcher in the early development cardiovascular program at a pharma company, use these data to determine whether mutations in the PKP2 gene correlate with a phenotypic trait, such as red hair?

Luckily, the new DNAnexus Cohort Browser for UK Biobank makes phenome-wide association studies (PheWAS) and other queries easy to carry out. Developed to help researchers like Dana navigate thousands of phenotypic fields and millions of genetic variants, the browser can mine extremely large datasets in a matter of seconds.

How does it work? Dana would simply enter her phenotypic (red hair) and genetic (PKP2, familial links) criteria into the browser, whose powerful point-and-click interface lets her quickly filter, browse, and visualize the integrated phenotypic and genomic information.
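For readers curious about the statistics behind a query like Dana's, the sketch below shows one way the underlying comparison could be worked out by hand. It is not the Cohort Browser's implementation or API. The `Participant` record, its field names, and the tiny synthetic cohort are all hypothetical, invented purely for illustration. The code builds a 2x2 table of PKP2 carrier status against red hair, then computes an odds ratio and a two-sided Fisher's exact test.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Participant:
    # All fields are hypothetical stand-ins, not real UK Biobank field names.
    participant_id: str
    has_pkp2_variant: bool
    has_red_hair: bool


def contingency_table(cohort):
    """Count (a, b, c, d) = carriers with trait, carriers without,
    non-carriers with trait, non-carriers without."""
    a = sum(p.has_pkp2_variant and p.has_red_hair for p in cohort)
    b = sum(p.has_pkp2_variant and not p.has_red_hair for p in cohort)
    c = sum(not p.has_pkp2_variant and p.has_red_hair for p in cohort)
    d = sum(not p.has_pkp2_variant and not p.has_red_hair for p in cohort)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds of the trait in carriers divided by odds in non-carriers."""
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: total probability of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers having the trait.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Synthetic cohort of 100 made-up participants (not real data).
cohort = (
    [Participant(f"c{i}", True, True) for i in range(3)]
    + [Participant(f"d{i}", True, False) for i in range(7)]
    + [Participant(f"e{i}", False, True) for i in range(5)]
    + [Participant(f"f{i}", False, False) for i in range(85)]
)

table = contingency_table(cohort)
print("2x2 table:", table)
print("odds ratio:", round(odds_ratio(*table), 2))
print("Fisher p-value:", fisher_exact_two_sided(*table))
```

At biobank scale a statistics library would be the practical choice. The pure-Python version is only meant to make the arithmetic visible.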

Running on our DNAnexus Apollo Platform, the Cohort Browser for UK Biobank was designed to let scientists at all levels of bioinformatic expertise rapidly test multiple hypotheses and gain insight into mechanisms of action, biomarkers, and targets. Working with clinical data also requires specialized capabilities to maintain privacy, including ISO 27001 certification and compliance with GDPR and GxP, among others.

As part of its effort to explore the UK Biobank data, the Regeneron Genetics Center (RGC) used DNAnexus to run bioinformatics pipelines and deliver the results back to pharmaceutical companies in the UK Biobank Exome Sequencing Consortium. For this data delivery, the RGC successfully deployed the Cohort Browser on a collection of thousands of phenotypic fields extracted from the UK Biobank and millions of genetic variants computed through its scientific pipeline.

Initial analysis of the UK Biobank data has already led to many discoveries.

In addition to the MEPE mutation finding, RGC researchers identified a handful of other significant novel loss-of-function associations, including one that confers a nearly five-fold increased odds of varicose veins in certain carriers.

Among the nearly four million single nucleotide and indel coding variants observed by the researchers were many variants in the so-called “ACMG59” genes: 59 genes proposed by the American College of Medical Genetics and Genomics to be associated with highly penetrant disease phenotypes.

Overall, 2% of the sequenced individuals carried a flagged variant in one of the ACMG59 genes. Variants in cancer-associated genes were the most prevalent, followed by variants associated with familial hypercholesterolemia and cardiac dysfunction disorders.

Importantly, the Cohort Browser for UK Biobank allows researchers to check their hypotheses against real-world data from de-identified participant records. For instance, an individual with a pathogenic missense variant could be found to have a history of benign colon neoplasms, diverticular disease of the intestine, colonic polyps, and intestinal obstruction.

These discoveries are a great illustration of how the extensive health data available for UK Biobank participants will be a valuable resource to assess disease risk at both the individual and the population level.

The crowdsourcing spirit of the initiative is what makes it stand apart. When the Manchester-based biobank enrolled its first volunteer 13 years ago, principal investigator Rory Collins wanted to democratize the data and maximize its scientific pay-off: “By making data available to 100 people around the world, we can get a lot more research done than if I sit here and do one study a year with the data,” the University of Oxford epidemiologist told Science.

Earlier releases of genotyping data in 2015 and 2017 have already resulted in more than 600 papers across 1,400 projects from 7,000 researchers. Additional tranches of exome and whole-genome data will similarly be released over the next two years.

The hope is that the easy-to-use DNAnexus Apollo Platform and its Cohort Browser for UK Biobank will help even more researchers navigate the complexities of generating, delivering, and exploring combined phenotypic and genetic data.

“This is just the beginning,” said Aris Baras, MD, Senior Vice President and Head of the Regeneron Genetics Center. “There is so much actionable information in this resource that can be utilized by scientific minds around the globe. We are hard at work mining the data for novel findings that will accelerate science, innovative new medicines and improved patient care, and are excited to have others join us in this important quest.”

Researchers interested in applying for access to UK Biobank data should visit http://www.ukbiobank.ac.uk/register-apply/.

For more information about the UK Biobank Cohort Browser, visit http://go.dnanexus.com/apollo_ukb.