UK Biobank RAP Researcher Spotlight: June 2023

The Monthly Researcher Spotlight is our new section highlighting the exciting work of the UK Biobank Research Analysis Platform user community. If you would like to be featured, email community@dnanexus.com.

This was simultaneously published in the June 2023 UK Biobank RAP Newsletter. You can sign up for future installments here.

This month's Spotlight features the SAIGE Research Group, a collective of international researchers whose tools for genome-wide association studies (GWAS) and related analyses help the research community derive deeper insights from large-scale datasets.

June 2023 Researcher Spotlight Headshots

SAIGE Research Group (from left to right):

Wei Zhou, Instructor, Massachusetts General Hospital, Broad Institute. Wenjian Bi, Assistant Professor of Medical Genetics, Peking University. Seunggeun (Shawn) Lee, Associate Professor of Data Science, Seoul National University.

What are the focus and discovery highlights of your research?

Our primary focus has been on developing effective methods and tools for analyzing biobank data, with the aim of identifying the genetic basis of complex diseases. Biobanks, with their large sample sizes, diverse phenotypic information, and extensive genetic data, are invaluable resources. However, analyzing them poses significant challenges because of their large data size, imbalanced phenotypes, and related samples. To address these challenges, we have devised several methods: GWAS of continuous and binary traits (SAIGE), rare-variant tests (SAIGE-GENE+), tests for categorical phenotypes (POLMM and POLMM-GENE), and tests for survival phenotypes (SPACox). Our approach integrates scalable mixed-effect model computation with an accurate p-value calculation method. We have successfully applied these methods to the UK Biobank data and have shared the analysis results through PheWeb. Through these efforts, we aim to help accelerate the discovery of novel insights in the field.

What are some of the key questions that you are looking to answer using UK Biobank data?

One of the questions we are interested in is identifying novel genetic variants associated with complex diseases and traits. Association analysis is the first step of biobank data analysis: it identifies novel variants and genes associated with diseases. Its results also feed downstream analyses, including building genome-based risk prediction models (i.e., polygenic risk scores, PRS), supporting a mechanistic understanding of diseases, and estimating the causal effects of biomarkers and drugs on phenotypes. Of particular interest to us are rare functional variants obtained from recent whole-exome sequencing (WES) and whole-genome sequencing (WGS). These variants offer more direct evidence of the influence of genes on phenotypes, thus providing valuable insights. Additionally, we are interested in exploring the expanded landscape of omics phenotypes, including proteomics and metabolomics.

How has the UK Biobank Research Analysis Platform (UKB-RAP) helped you perform your research?

UKB-RAP helps our research in several ways. As developers of methods and tools, we benefit most from the common platform shared by all UK Biobank researchers. It lets us test and evaluate our methods, including their computational cost, within a single environment, and makes it relatively easy for other researchers to apply our tools. In terms of data analysis, UKB-RAP eliminates the need to download and store locally the massive UK Biobank dataset, which spans several petabytes. This significant advantage alleviates a multitude of challenges and complexities. Additionally, UKB-RAP enables scalable analysis, reducing the need to invest in local server infrastructure.

Any tools or tutorials that you have developed that would be useful for the UKB-RAP community?

We would like to introduce two software packages.

SAIGE is an R package for GWAS and rare-variant tests of continuous and binary phenotypes based on generalized linear mixed models. The package has previously been used for phenome-wide analyses of common and rare variants in the genotyping and imputation data, as well as the recent WES and WGS data, in the UK Biobank. The documentation of the SAIGE package can be found here. For the UKB-RAP community, we have developed a pipeline for running SAIGE on UKB-RAP to analyze the UK Biobank WES data.

GRAB is an R package developed for GWAS and rare-variant tests of various types of complex traits, including time-to-event traits and ordinal categorical traits. Regardless of the trait type, the GRAB package uses the same functions for fitting the null model and conducting genome-wide testing. Users only need to update traitType (e.g., ordinal, time-to-event) and method (e.g., POLMM, SPACox). For more details, please visit here.
