An Introduction to Population Genomics Studies with DNAnexus Apollo

Population Genomics

While the amount and variety of genomics data available to scientists are greater now than ever before, large-scale bioinformatics studies can still seem daunting to researchers interested in exploring the applications of population genomics. Reducing the barrier to entry through software tools and open data access is essential to accelerate the path from genomic discovery to tailored patient care. In this light, we provide an introduction to a few fundamental types of genomics-based studies.

Researchers leverage population-scale genomics in their studies to better understand how genetics contributes to an individual’s health. These large-scale studies start with the creation of cohorts: groups of individuals who share a particular set of genotypes, phenotypes, or a combination of the two that the researcher is interested in studying. Traditionally, bioinformaticians create cohorts on the command-line interface (CLI), using a variety of genotype- and phenotype-specific tools to sort through datasets and parse hundreds of traits. DNAnexus Apollo enables researchers to work either within a JupyterLab/CLI environment or in an intuitive user interface. In the Cohort Browser, they can quickly explore a multitude of phenotypes and genomic traits, filter for desired traits, and visualize results using built-in charts such as Manhattan plots.
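For readers who think in code, the idea of a cohort can be reduced to a filter over participant records. The snippet below is only a minimal sketch of that idea, not the Cohort Browser's implementation or API: the `Participant` fields, the `build_cohort` helper, and the four records are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    participant_id: str
    age: int
    has_diagnosis: bool   # hypothetical phenotype flag
    snp_genotype: str     # genotype at one hypothetical SNP, e.g. "AG"


def build_cohort(participants, min_age, diagnosis, allele):
    """Keep participants who pass both the phenotype and the genotype filter."""
    return [
        p for p in participants
        if p.age >= min_age
        and p.has_diagnosis == diagnosis
        and allele in p.snp_genotype
    ]


# Made-up records, for illustration only.
participants = [
    Participant("P001", 62, True, "AG"),
    Participant("P002", 45, True, "GG"),
    Participant("P003", 58, False, "AA"),
    Participant("P004", 71, True, "AA"),
]

cohort = build_cohort(participants, min_age=50, diagnosis=True, allele="A")
print([p.participant_id for p in cohort])  # ['P001', 'P004']
```

Real datasets involve hundreds of traits and far more participants, which is where a purpose-built browser or query engine becomes necessary.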

Figure 1. DNAnexus Apollo Cohort Browser
Figure 2. DNAnexus Apollo Association Browser 

Once a cohort has been identified, two common analysis methods are genome- and phenome-wide association studies. A genome-wide association study (GWAS) is used to find associations between single-nucleotide polymorphisms (SNPs) and a certain trait or disease. Phenome-wide association studies (PheWAS) operate on a similar premise, and test genetic variants for associations with a set of phenotypes. Both types of studies survey correlations between phenotypes and SNPs, providing information that researchers can use to uncover genetic risk factors for a variety of diseases and health conditions. The resulting insights into disease associations provide a better understanding of underlying disease origins, and can aid in the discovery and validation of new drug targets and preventative strategies.
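To make the idea of a single SNP-trait association concrete, here is a minimal sketch of one of the simplest possible tests: a Pearson chi-square test on a 2x2 table of allele counts in cases versus controls. This is a textbook illustration only. Production tools account for covariates, relatedness, and population structure, and this sketch is not how any particular GWAS package works. The function name and the counts are made up for the example.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: the alternate allele is more common in cases than in controls.
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A GWAS repeats a test of this kind across many SNPs for one trait, while a PheWAS repeats it across many phenotypes for a given variant, so both require correction for multiple testing.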

SAIGE and PLATO are two common software tools that can be used to execute GWAS analysis. There are publicly available resources for how to conduct a GWAS using SAIGE, including this repository created by the Neale lab that details the results of a GWAS conducted on the UK Biobank dataset. Many of these guides provide specific details on how quality control was conducted and which statistical analyses were used, so that those new to these types of studies can replicate their findings. PLATO also offers features for those interested in analyzing phenotypes, and enables users to run either a GWAS or a PheWAS using a single unified tool. Both tools are available on DNAnexus, wrapped in its scalable application framework, allowing researchers to quickly create cohorts and analyze them in the same place.

Both of these types of studies process large quantities of data when run on large populations, a challenge for most homegrown informatics systems. DNAnexus Apollo offers Jupyter notebooks that enable researchers to easily analyze large-scale data with actions like annotation of GWAS results, a critical step for making bioinformatics data more actionable for all researchers. Apollo is purpose-built to handle the scale and type of computations that population genomics studies use, allowing researchers to work with genomics and complex multi-omics data in exciting new ways.

More and more tools are becoming available that enable researchers to conduct innovative and more involved analyses as well. Using the results from a GWAS, researchers can compare their variant sets with other databases online, and incorporate additional omics data types, such as RNA-seq, to better understand how variants affect gene expression. Or, by utilizing machine learning, researchers can conduct fine-tuned analyses by combining GWAS and functional data to bring clarity to previously noisy results.

Population genomics studies hold the key to unlocking the potential of precision medicine. With tools like DNAnexus Apollo, researchers can more quickly put large and complex datasets to work in identifying biological mechanisms of disease, discovering and applying biomarkers, and pursuing omics-guided therapeutic target discovery.

Interested in learning more? Check out the recorded webinar with Ben Busby, Scientific Director, Research Platforms Outreach, and see how Apollo unlocks population-scale omics datasets for accelerated discovery.

MIODx: A Breakthrough Leader in the precisionFDA COVID-19 Precision Immunology App-a-thon

MIODx Precision Immunology

The results are in for precisionFDA’s COVID-19 Precision Immunology App-a-thon, and we are thrilled to announce that our partner MIODx demonstrated strong leadership in their field by taking home medals in multiple categories!

Motivation Behind the COVID-19 Precision Immunology App-a-thon 

Despite the accelerated scientific pace of COVID-19 research, treatments, and vaccine development, critical questions remain unanswered, including how genetic variability in an individual’s immune system affects their response to SARS-CoV-2 infection. To effectively combat the widespread transmission of COVID-19 and save lives, a better understanding of its pathophysiology is needed to enable effective diagnosis, prognosis, and treatment strategies.

Through the precisionFDA COVID-19 Precision Immunology App-a-thon, the Food and Drug Administration (FDA) called upon the scientific community to develop innovative and user-friendly tools to explore the relationship between personalized immune repertoires and COVID-19 disease outcomes. Participants were evaluated by judges in six categories (Table 1), and MIODx took home awards in five of the six categories for their ClonoMap™ Immune Profiler and Immune Insight tools.

Table 1. Evaluation Categories

The ClonoMap™ Immune Profiler and Immune Insight tools were featured due to their high-quality visualizations, excellent documentation, ability to be an out-of-the-box solution for exploratory analysis, and potential impact on the field of precision immunology. See all the results of the app-a-thon here.

MIODx’s Impact on COVID-19 Biomarker Discovery and Disease Severity Prediction

Excelling in the eyes of the judges was just one highlight for the MIODx team. Their analytics and visualization tools, ClonoMap™ Immune Profiler and Immune Insight, were used to generate hypotheses and find preliminary evidence of T cell receptor (TCR) repertoire genes as biomarkers of significance for COVID-19 clinical outcomes.

ClonoMap™ Immune Profiler is an automated TCR repertoire analysis pipeline that takes FASTQ files as input for downstream analyses. In the app-a-thon, it was used to investigate whether there are biomarkers in the TCR repertoire data that are predictive of disease severity.

Among the key findings, Immune Profiler highlighted specific T cell receptor beta variable (TRBV) genes and CDR3 clonotypes that occur at different frequencies in healthy individuals than in patients who recovered from COVID-19. These results provide candidate TCRs for further investigation with respect to COVID-19 disease severity.
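As a rough illustration of what comparing gene frequencies between samples means, the sketch below computes each V gene's share of reads in a sample and the difference in usage between two samples. It is not the ClonoMap™ pipeline or its method. The helper names are hypothetical and the read counts are invented for the example.

```python
from collections import Counter


def gene_usage(clonotypes):
    """Map each V gene to its share of total reads in one (non-empty) sample.

    `clonotypes` is an iterable of (v_gene, read_count) pairs.
    """
    counts = Counter()
    for v_gene, read_count in clonotypes:
        counts[v_gene] += read_count
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def usage_difference(sample_a, sample_b):
    """Per-gene usage in sample_a minus usage in sample_b."""
    usage_a, usage_b = gene_usage(sample_a), gene_usage(sample_b)
    genes = sorted(set(usage_a) | set(usage_b))
    return {g: usage_a.get(g, 0.0) - usage_b.get(g, 0.0) for g in genes}


# Invented read counts, for illustration only.
healthy = [("TRBV19", 30), ("TRBV7-9", 50), ("TRBV19", 20)]
recovered = [("TRBV19", 50), ("TRBV7-9", 25), ("TRBV20-1", 25)]

for gene, diff in usage_difference(healthy, recovered).items():
    print(f"{gene}: {diff:+.2f}")
```

In practice such differences would be assessed across many individuals with an appropriate statistical test before any gene is treated as a candidate biomarker.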

One particularly interesting TRBV gene, TRBV19, was found at high frequency in all samples (healthy and COVID-19). Using their Immune Insight literature search tool, the MIODx team determined that this TCR is specific to influenza virus antigens. MIODx is now investigating the link between this public clonotype and COVID-19 disease progression.

Overcoming TCR Analysis Computational Challenges 

Despite the enormous potential of TCR analysis in precision medicine, big data and computational challenges remain in sequencing and analyzing T cells at scale. To that end, MIODx has launched ClonoMap™ Immune Profiler on DNAnexus Titan™, which provides a cloud environment that enables seamless sharing of projects, data, and pipelines with team members who have approved access. Researchers can upload their data, integrate new data sources, conduct analyses with the ClonoMap™ pipeline, and visualize and share results with collaborators – all within an intuitive user interface.

The DNAnexus Titan™ Platform allows MIODx to maintain multiple versions of their pipeline. They can give customers access to the production version, while also testing updates and making changes to the pipeline’s research version. DNAnexus enables MIODx’s customers to securely bring their own data online within their shared project and immediately apply the MIODx ClonoMap™ pipeline to generate results.

See ClonoMap Immune Profiler in Action 

Ankita Das, Head of Product at MIODx, will be joining us for a precision immunology webinar, where she will present the clinical relevance of sequencing the TCR repertoire, methodologies for analyzing and interpreting TCR data, and an in-depth look into how their tools generated preliminary evidence of biomarkers in the TCR repertoire that are predictive of COVID-19 disease severity.

Save your seat here. 

Date: May 27th, 10am PT / 1pm ET

Responding to the Evolving World of COVID-19 Through Global Collaboration

COVID-19 Research Global Collaboration

Will the next wave of COVID-19 surveillance come via wastewater? What challenges lie ahead as we try to detect variants in a constantly evolving viral population, and adjust our therapeutics accordingly? How could bioinformatics help?

These were some of the questions raised during a recent roundtable of genomics experts brought together by DNAnexus to explore ways in which enhanced scientific collaboration tools could contribute to pathogen surveillance and global pandemic response. 

The right tool from the toolbox

The public has become all too familiar with the term ‘variants’ thanks to evolving versions of the SARS-CoV-2 virus that is plaguing the world. But identifying low frequency variants in genomes has always been a challenge. From PCR testing to short- and long-read whole genome and transcriptome sequencing, different detection methods have different benefits, from speed and cost, to accuracy and depth of coverage. 

As scientists set their sights on monitoring urban wastewater supplies to track pathogens in the population, they will have to grapple with samples that contain a ‘grab bag’ of microbes, viruses, bacteria, and lots of other fun stuff, said Todd Treangen. 

Treangen, Assistant Professor of Computer Science at Rice University, warned that environmental metagenomics presents many challenges. Distinguishing between actual low-prevalence genetic variants and sequencing errors is high among them. Reassembling viral genomes from the limited information you’re likely to recover is also extremely difficult, he said.
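One common way to frame the variant-versus-error question is as a binomial tail probability: if every read at a position could be wrong independently with some fixed error rate, how likely is it to see at least the observed number of variant reads by chance? The sketch below is a simplified illustration under exactly those assumptions, not a method attributed to the roundtable participants. The function names, the 0.1% error rate, and the significance cutoff are hypothetical choices for the example.

```python
import math


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Works in log space so that large read depths do not overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return total


def looks_like_real_variant(alt_reads, depth, error_rate=0.001, alpha=1e-6):
    """Flag a site when errors alone are unlikely to explain the variant reads.

    error_rate and alpha are illustrative assumptions, not recommended values.
    """
    return binomial_tail(alt_reads, depth, error_rate) < alpha


print(looks_like_real_variant(alt_reads=2, depth=1000))   # False
print(looks_like_real_variant(alt_reads=20, depth=1000))  # True
```

Real sequencing errors are neither independent nor uniform across positions, which is part of why the problem remains hard in mixed environmental samples.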

Fritz Sedlazeck, Assistant Professor of Bioinformatics & Data Analysis at Baylor College of Medicine, said quick, easy and affordable tests to detect variants of clinical interest would be required in order for such surveillance to be practically implemented. DNAnexus’s Ben Busby suggested that the creation of simple pipelines for metagenomic assembly might help the effort.

Richard Copin, a senior staff scientist at Regeneron, said deep sequencing still has a role, especially in the development of new therapeutics. It is important to be able to estimate the reservoir of diversity of variants in the virus among different pockets of the population, in case minor variants become dominant, he said.

“What is the selective pressure that would lead to these minor variants taking over the entire virus population, so that our vaccines or treatments would be totally useless?” Copin asked.

Sharing from the start

Another big challenge during the current pandemic has been data access, Copin noted. 

“Sharing sequencing data among scientists, which sounds like something that’s obvious, has been a big challenge, especially in the U.S.,” he said. “It’s something I’ve been shocked about.”

Treangen said the coordinated response may have been a bit late out of the gate, but initiatives like the open-science research consortium COV-IRT (the COVID-19 International Research Team) have helped facilitate collaboration to accelerate research and drug development. The group of nearly 200 scientists from 11 countries and 75 institutions recently marked its one-year anniversary with a scientific symposium. Treangen, a founding member, said he is optimistic that such collaboration will help ensure the global response to future pandemics will be timely, well-coordinated and, hopefully, proactive.

DNAnexus has been a partner in that effort, providing IT infrastructure as well as assistance accessing and analyzing data. 

Copin said DNAnexus’s data sharing platforms and tools like the UK Biobank Cohort Browser will be critical for the success of global team science efforts.

Busby, who has helped with several COVID-related hackathons, said he was keen to continue contributing to pathogen surveillance by developing virus detection toolkits and enhanced phenotype/genotype management systems, and he welcomed collaboration with other scientists.