DNAnexus R&D Report: Benchmarking Germline Variant Calling Pipelines

Variant calling is a staple of genome analysis. Advances in computational methods and algorithms have produced a plethora of programs and pipelines for calling variants in both whole-genome and whole-exome sequence data. With so many options to choose from, picking a pipeline for an analysis project can be challenging.

DNAnexus researchers are helping to narrow that search by benchmarking the performance of popular variant calling pipelines, determining the factors that can affect performance, and matching appropriate pipelines to specific scientific needs. In a recent report, they describe the results of comparing six well-known pipelines. The report includes detailed data on algorithm versions and compute instance types, as well as on runtimes and the accuracy of each pipeline’s variant calls. The team used variant truth sets from the Genome in a Bottle (GIAB) Consortium as the reference for the analysis. These truth sets offer high-confidence SNPs, INDELs, and homozygous reference regions generated using data from seven human genomes.

To test the pipelines, the research team used whole-exome and whole-genome sequence reads from NIST: five whole-exome and seven whole-genome samples. For the variant truth set, the researchers used the high-confidence regions from version 3.3.2 of the NIST dataset. Four of the pipelines were off-the-shelf applications from DNAnexus’ tools library, which offers access to containerized software tools and their dependencies for easy implementation in compute environments. The team packaged two additional apps for the comparison tests. The researchers processed each sample from raw reads through to variant calls using default pipeline parameters and settings. They assessed the accuracy of small germline variant calls using standards set by the Global Alliance for Genomics and Health (GA4GH). Each of the six pipelines was run on a single AWS instance.

Label	Software	Version	AWS Instance Type⁺	Variant Calling Algorithm	DNAnexus App
gatk4	BWA-MEM + GATK	0.7.17-r1188 + gatk-4.1.4.1	c5d.18xlarge	GATK HaplotypeCaller	Upon request
parabricks_deepvariant	Parabricks Pipelines DeepVariant	v3.0.0_2	g4dn.12xlarge	DeepVariant	pbdeepvariant
parabricks_germline	Parabricks Pipelines Germline	v3.0.0_1	g4dn.12xlarge	GATK HaplotypeCaller	pbgermline
sentieon_dnascope	Sentieon (DNAscope)	sentieon_release_201911	c5d.18xlarge	DNAscope	sentieon_fastq_to_vcf
sentieon_haplotyper	Sentieon (Haplotyper)	sentieon_release_201911	c5d.18xlarge	GATK HaplotypeCaller	sentieon_fastq_to_vcf
strelka2	Strelka2	2.9.10	c5d.18xlarge	Strelka2	strelka2_germline

Table 1: Germline variant calling software. ⁺ We intentionally selected instance types with similar AWS hourly rates.
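Because the instance types have similar hourly rates, runtime is a rough proxy for compute cost. The short Python sketch below shows the arithmetic. The pipeline labels, runtimes, and hourly rates are hypothetical placeholders, not figures from the report or actual AWS prices, and the function name is an assumption made for illustration.

```python
def run_cost_usd(runtime_hours: float, hourly_rate_usd: float) -> float:
    """Estimate the compute cost of one pipeline run on a single instance."""
    return runtime_hours * hourly_rate_usd


# Hypothetical (label, runtime in hours, hourly rate in USD) tuples.
# These values are placeholders for illustration only.
runs = [
    ("pipeline_a", 4.0, 3.50),
    ("pipeline_b", 1.5, 3.90),
]

costs = {label: run_cost_usd(hours, rate) for label, hours, rate in runs}

# Print the runs from cheapest to most expensive.
for label, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{label}: ${cost:.2f}")
```

When hourly rates are close, as in this sketch, the faster run is also the cheaper one even at a slightly higher rate, which is why matching rates across instance types makes runtime comparisons easier to interpret.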

In general, the pipelines performed comparably. They called SNPs and INDELs in both the whole-exome and whole-genome samples with over 99% recall and precision. The variation that does exist between pipelines is likely due to limitations of the sample data collection methods and to the reference build that was used. The DNAnexus researchers called variants against both the GRCh38 and hs37d5 builds. The GRCh38 build is more complete, but some of its genomic regions contain many repetitive sequences that pose problems for pipelines and result in more false negative and false positive calls. Lastly, as noted above, this test used default pipeline parameters. In practice, researchers adjust these parameters depending on the genomic regions or loci they are studying, and such changes influence a pipeline’s precision and recall.
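For readers less familiar with these metrics, the following minimal Python sketch shows how precision, recall, and F1 are derived from true positive, false positive, and false negative counts. The counts are made-up illustrative values, not results from the report, and the function names are hypothetical. GA4GH-style comparisons can also count true positives separately for the truth and query call sets, a detail this sketch leaves out.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that match the truth set."""
    called = tp + fp
    return tp / called if called else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truth-set variants that the pipeline called."""
    expected = tp + fn
    return tp / expected if expected else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for one sample (illustration only, not from the report).
tp, fp, fn = 9950, 30, 50

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.4f} recall={r:.4f} F1={f1_score(p, r):.4f}")
```

A false positive lowers precision and a false negative lowers recall, so the extra miscalls in repetitive regions described above would show up in both numbers.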

The runtimes were also comparable, with most pipelines running notably faster than GATK4. In practice, runtimes will vary depending on the type of hardware, the parameters, and the efficiency of the algorithms used. The results reported here will also likely change as developers update their software and as reference datasets continue to evolve and improve. GIAB has already updated at least one of the truth datasets used in this study. DNAnexus researchers plan to publish a follow-up report that will include updated information and analysis.

As with many things, the best pipeline for an analysis project comes down to research needs and available resources. The pipelines used in this report represent the current state of the art and provide a useful starting point for making decisions about variant calling infrastructure. If you have a particular use case you would like to discuss, please reach out. We would be happy to talk with you.