Deciphering the TCR Repertoire

Between the 4th and 19th centuries AD, knowledge of how to read and write Egyptian hieroglyphs was essentially nonexistent. Not until 1799, when French officers found the Rosetta Stone, was there any hope of translating this complex script. With the help of the stone, scholars were able to crack the code of hieroglyphics and gain a deeper understanding of ancient Egyptian life.

Just as the Rosetta Stone was critical to understanding ancient Egyptian mysteries, the immune repertoire is key to unlocking insights about an individual’s immune response to their environment. Thankfully, advances in next-generation sequencing and computational biology have made translating the immune repertoire a modern reality.

What is the immune repertoire and why is it important? 

Specialized cells of the immune system, such as macrophages, T-cells, and B-cells, work together to identify threats in the body, and activate a coordinated and complex immune response. An important part of the immune response occurs when a T-cell detects a threatening target and receptors on the T-cell bind to the enemy cell, or antigen. The T-cells then duplicate many times in a process called clonal expansion, and remain in the body to quickly respond when the same antigen appears again.  

Each unique T-cell receptor recognizes only a single antigen, and the full range of receptors is encoded by a fixed number of gene segments. Thus, the T-cell receptor (TCR) repertoire holds the key to understanding the diversity of the immune system and how it responds to disease-causing antigens. By sequencing the TCR repertoire and learning the genetic code of the cells, researchers can begin to understand which antigens those particular T-cells have targeted, build a profile of the diseases an individual has encountered, and determine whether a particular vaccine or immunotherapy drug may be effective. This information can be used to develop diagnostic tests, create therapeutic products, and predict responses to immunotherapies.
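To make the idea of "reading" a repertoire a little more concrete, here is a minimal Python sketch of one common first step: tallying how often each receptor sequence appears and summarizing how skewed the repertoire is toward a few expanded clones. This is an illustration only, not how ClonoMap works. The receptor strings are made-up placeholders, and the clonality measure (one minus normalized Shannon entropy) is an assumption chosen for the example.

```python
from collections import Counter
import math


def clonotype_frequencies(receptor_sequences):
    """Return {sequence: fraction of all reads} for a list of receptor sequences."""
    if not receptor_sequences:
        raise ValueError("need at least one sequence")
    counts = Counter(receptor_sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def shannon_entropy(frequencies):
    """Shannon entropy, in bits, of a clonotype frequency distribution."""
    return -sum(p * math.log2(p) for p in frequencies.values() if p > 0)


def clonality(frequencies):
    """1 - normalized entropy: near 0 for an even repertoire, near 1 when one clone dominates."""
    n = len(frequencies)
    if n <= 1:
        return 1.0  # a single clonotype is maximally clonal
    return 1.0 - shannon_entropy(frequencies) / math.log2(n)


# Hypothetical reads: one expanded clone and two smaller ones.
reads = ["CASSLGQYF"] * 6 + ["CASSPGTEAFF"] * 2 + ["CSARDRGNTIYF"] * 2
freqs = clonotype_frequencies(reads)
print(freqs)
print(round(clonality(freqs), 3))
```

A repertoire dominated by a few clonally expanded T-cells scores closer to 1, while an evenly spread repertoire scores closer to 0.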

This week, immune profiling company MIODx launched ClonoMap™ Immune Profiler, an analysis and biomarker discovery platform that allows researchers to probe the immune system and better understand how an individual’s T-cells can make them susceptible to disease or responsive to certain therapies, and how that can change over time.

A healthy individual’s immune system consists of a vast TCR repertoire, on the order of 10^9 cells, so the ability to scale up computational resources is crucial. Powered by the scalable and flexible DNAnexus Titan Platform, ClonoMap™ Immune Profiler enables researchers to conduct powerful analysis by incorporating large datasets from multiple and longitudinal studies, integrating metadata, and processing multiple versions of analysis in parallel to generate and test hypotheses on the relevance of TCR samples as biomarkers.

The MIODx team is at the cutting edge of TCR repertoire research. By empowering researchers with the analytical tools to study the genetic code of T-cells, gain an understanding of the antigens that the T-cells target, and study how the immune system changes over the course of an individual’s lifetime, MIODx is laying the groundwork to translate the TCR repertoire into actionable insights to inform human health. 

Read more about MIODx and their ClonoMap™ Immune Profiler here.

DipAsm: A Method for Generating More Accurate Phased Assemblies

Researchers from academia and industry, including Li Lab (Dana-Farber Cancer Institute), Church Lab (Wyss Institute, Harvard University), DNAnexus Research Lab, and others, have developed a new genome assembly approach, dubbed DipAsm, for generating chromosome-scale phased contigs using long reads and long-range conformation data. The method, described in Nature Biotechnology, can generate results within a day and outperforms other approaches in terms of contiguity and completeness of the phased assemblies. As shown in the paper, when the method was applied to four public datasets, it produced haplotype-resolved assemblies with contig NG50 of up to 25 Mb and phased almost all heterozygous sites with 98-99 percent accuracy.
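For readers unfamiliar with the NG50 metric: it is the contig length at which the contigs of that length or longer, added up from longest to shortest, cover at least half of the expected genome size. The Python sketch below shows the calculation on toy numbers. The contig lengths and genome size are invented for illustration and are not taken from the paper.

```python
def ng50(contig_lengths, genome_size):
    """Contig length L such that contigs of length >= L cover at least
    half of genome_size. Returns 0 if the assembly never reaches half."""
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy assembly: five contigs, expected genome size of 200 units.
toy_contigs = [50, 40, 30, 20, 10]
print(ng50(toy_contigs, genome_size=200))  # cumulative 50, 90, 120 -> 30
```

Unlike N50, which uses the total assembly length as its denominator, NG50 is measured against the expected genome size, so an incomplete assembly cannot score well simply by being small.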

Being able to generate accurate chromosome-scale haplotype-resolved assemblies is crucial for capturing the heterozygous variation present in human genomes and understanding allele-specific methylation and gene expression in research and clinical applications. The accuracy of DipAsm’s assemblies makes it a valuable tool for exploring highly polymorphic parts of the genome such as the Human Leukocyte Antigen region and the Killer-cell Immunoglobulin-like Receptor region. “Our phased assemblies can reconstruct most of these regions with two contigs for each haplotype,” first author Shilpa Garg, a postdoctoral researcher in the Li Lab, explained in the paper. The authors also demonstrate how the method enables the identification of both SNPs and structural variants with greater sensitivity and specificity than some current methods.

Full details of how DipAsm works are provided in the paper. But briefly, it reconstructs the haplotypes present in diploid individuals using PacBio’s long High-Fidelity reads and Hi-C data as input. DipAsm works with data from an unphased Peregrine assembly scaffolded by 3D-DNA or HiRise. It uses DeepVariant to call small variants, phases them using WhatsHap and HapCUT2, partitions the reads, and then assembles each partition independently using the Peregrine assembly toolkit.
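The read-partitioning step is the conceptual heart of the approach: once heterozygous variants are phased, each read can be assigned to a haplotype based on the alleles it carries. The Python sketch below is a deliberately simplified toy version of that idea, not DipAsm's actual implementation. The data structures, the function names, and the simple majority-vote rule are all assumptions made for illustration.

```python
def assign_haplotype(read_alleles, phased_variants):
    """Vote on which haplotype a read belongs to.

    read_alleles:    {position: base observed in the read}
    phased_variants: {position: (haplotype_1_base, haplotype_2_base)}
    Returns 1 or 2, or None if the read is uninformative or tied.
    """
    votes = {1: 0, 2: 0}
    for position, base in read_alleles.items():
        if position not in phased_variants:
            continue  # the read covers a site that was not phased
        hap1_base, hap2_base = phased_variants[position]
        if base == hap1_base:
            votes[1] += 1
        elif base == hap2_base:
            votes[2] += 1
    if votes[1] == votes[2]:
        return None
    return 1 if votes[1] > votes[2] else 2


def partition_reads(reads, phased_variants):
    """Group read names by assigned haplotype (1, 2, or None for unassigned)."""
    partitions = {1: [], 2: [], None: []}
    for name, alleles in reads.items():
        partitions[assign_haplotype(alleles, phased_variants)].append(name)
    return partitions


# Hypothetical phased heterozygous sites and reads.
phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = {
    "read_1": {100: "A", 250: "C"},   # matches haplotype 1 twice
    "read_2": {250: "T", 400: "A"},   # matches haplotype 2 twice
    "read_3": {900: "C"},             # covers no phased site
}
print(partition_reads(reads, phased))
```

Each partition would then be assembled on its own, which is how the two haplotypes end up as separate sets of contigs.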

DipAsm Phased Assemblies Pipeline

To demonstrate the method’s accuracy, the researchers applied it to data from four human genomes: PGP1 from the Personal Genome Project, HG002 and NA12878 from the Genome in a Bottle (GIAB) dataset, and HG00733 from the HGSVC project. Full details of the assembly statistics, including specifics about the assemblies used for comparison with the results of the DipAsm pipeline, are provided in Table 1 in the paper.

From the GIAB HG002 sample, the researchers generated a phased de novo assembly of 5.95 gigabases that incorporated data from both parental haplotypes. Compared to results from trio binning-based assemblies, the DipAsm assembly achieved better contiguity, and disagreed with less than 0.5% of phased heterozygous SNPs.

To evaluate the consensus accuracy of the DipAsm assembly, the researchers used dipcall to align the phased contigs of the HG002 that they created against the human reference genome. Next, they called SNPs and insertions and deletions from the alignment and compared these calls to the GIAB truth dataset. Out of the 2.36Gb confident regions in GIAB, the DipAsm assembly generated over 5,700 false SNP alleles (about 0.19% of called SNPs), and over 65,000 false insertion and deletion alleles (over 11% of called indels). It “achieves a consensus accuracy comparable to the Arrow-polished TrioCanu assembly,” according to the researchers.

Comparing the assembly to the GIAB truth data demonstrates DipAsm’s phasing power. “During assembly, failing to partition reads in heterozygous regions leads to the loss of heterozygotes,” the team explains. “On this metric, our Hi-C based assemblies only miss 0.4% of heterozygous SNPs. Those results are about 8 times better than those gleaned from a trio binning-based assembly, which is less powerful potentially because it is unable to phase a heterozygote when all individuals in a trio are heterozygous at the same site,” the researchers noted in the paper. Furthermore, “trio binning breaks short reads into k-mers, which also reduces power in comparison to mapping full-length paired-end Hi-C reads in our pipeline.”

Haplotypes through MHC Region Chart
This plot shows six haplotype-resolved contigs from three individuals through the MHC region and their high degree of divergence from the reference genome. Visit the paper to see the full figure.

In terms of long indels (>50bp), the DipAsm assembly-based call set showed over 93% sensitivity and 92% precision compared to the GIAB structural variant truth dataset. In comparison, trio binning-based call sets had about 3% lower sensitivity for indels and small variants. The researchers also identified various structural variants in the DipAsm haplotype assemblies including microsatellites, simple repeats, and short interspersed nuclear elements.
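As a reminder of what these two numbers measure, sensitivity is the share of true variants that the call set recovered, and precision is the share of calls that were correct. Here is a small Python sketch of both. The counts are hypothetical, chosen only to land near the reported percentages, and do not come from the paper.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truth-set variants that were called (also known as recall)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("truth set is empty")
    return true_positives / total


def precision(true_positives, false_positives):
    """Fraction of called variants that are present in the truth set."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("call set is empty")
    return true_positives / total


# Hypothetical counts for illustration only.
tp, fn, fp = 930, 70, 80
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"precision   = {precision(tp, fp):.1%}")
```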

Other results reported in the paper describe findings from comparing phased SNP calls from the DipAsm version of the HG00733 assembly to calls from the Human Genome Structural Variation Consortium. The results showed that the DipAsm assembly had a slightly lower phasing error rate and phased more heterozygous SNPs. The team also used DipAsm to assemble the NA12878 and PGP1 genomes, and those results showed that “we can achieve chromosome-long phasing albeit the shorter read length of NA12878 and the lower read coverage of PGP1,” the researchers wrote. Comparisons of these assemblies to those in the GIAB truth set indicated that DipAsm’s NA12878 assembly offers better consensus accuracy.
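One common way to quantify phasing errors is the switch error rate: walking along consecutive heterozygous sites, how often does the predicted phase flip relative to the truth? The Python sketch below illustrates that calculation. Whether the paper's "phasing error rate" is defined exactly this way is an assumption here, and the example data is made up.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous site pairs where the predicted
    phase flips relative to the truth.

    truth, predicted: equal-length lists of 0/1 haplotype labels for the
    same ordered heterozygous sites.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must cover the same sites")
    if len(truth) < 2:
        return 0.0  # no adjacent pairs to compare
    agrees = [t == p for t, p in zip(truth, predicted)]
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(truth) - 1)


# Hypothetical phase labels at five consecutive heterozygous sites.
truth_phase = [0, 0, 0, 0, 0]
predicted_phase = [0, 0, 1, 1, 1]  # one flip between the 2nd and 3rd site
print(switch_error_rate(truth_phase, predicted_phase))
```

Note that a prediction with every label inverted has no switches at all, because the two haplotype labels are arbitrary; only changes in agreement along the chromosome count as errors.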

NEW: Monitor your Diagnostics Pipeline with the Case Management Portal

If you’re running NGS data analysis pipelines for diagnostic purposes, then you know how important it is to have up-to-date insights into both the wet lab and bioinformatics stages of your pipeline. These insights empower teams to identify pipeline bottlenecks and deliver accurate results in a more timely manner. However, this type of visibility into pipeline progress is often difficult to achieve across various stakeholders, including executives, lab bioinformaticians, and clinicians.  

To solve these challenges, we’re excited to introduce the Case Management Portal – a powerful new solution built for sequencing service providers and diagnostic companies to monitor data analysis throughout the bioinformatics process, and automate critical workflow steps. 

How Does it Work? 

The Case Management Portal enables customizable monitoring throughout all steps in your bioinformatics pipeline. 

DNAnexus can help customize the dashboard for executives and team leads to show key metrics to review, identify pipeline bottlenecks, and understand the overall efficiency of the production pipeline.  

The dashboard also provides an overview of the diagnostics pipeline for bioinformaticians and lab technicians, broken down by each app or stage. It can show how many samples have completed analysis, and how many are being actively processed on DNAnexus. With additional configuration, the portal can connect to LIMS to pull in additional metadata from the wet lab stages to reveal how many samples are ready to be analyzed in DNAnexus. The dashboard also includes a search feature, so users can easily find a specific sample of interest. 
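To give a feel for the kind of summary such a dashboard presents, here is a small Python sketch that counts samples by pipeline stage and status and looks up a single sample. It is purely illustrative and does not use or represent the Case Management Portal or any DNAnexus API. The record fields, stage names, and sample IDs are hypothetical.

```python
from collections import Counter


def samples_per_stage(samples):
    """Count samples for each (stage, status) pair."""
    return Counter((s["stage"], s["status"]) for s in samples)


def find_sample(samples, sample_id):
    """Return the record for one sample ID, or None if it is not found."""
    return next((s for s in samples if s["id"] == sample_id), None)


# Hypothetical sample records, as a tracking system might expose them.
samples = [
    {"id": "S001", "stage": "alignment", "status": "done"},
    {"id": "S002", "stage": "alignment", "status": "running"},
    {"id": "S003", "stage": "variant_calling", "status": "running"},
    {"id": "S004", "stage": "variant_calling", "status": "done"},
    {"id": "S005", "stage": "wet_lab", "status": "ready"},
]

for (stage, status), count in sorted(samples_per_stage(samples).items()):
    print(f"{stage:16} {status:8} {count}")
print(find_sample(samples, "S003"))
```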

When samples have completed the bioinformatics pipeline, clinical experts can use the dashboard to review results, add comments or make adjustments if necessary, and approve them before they are sent to a physician.

Case Management Portal
Users may review metrics and determine the number of samples at each pipeline stage from the main page of the Case Management Portal.

Get Started Using the Case Management Portal 
Start monitoring your pipelines today. If you are a DNAnexus customer, reach out to your Account Manager for additional details. If you are new to DNAnexus, please send us a note at, and we’ll chat about your goals.