In silico Evaluation of Diagnostic Assays
SARS-CoV-2 continues to accumulate mutations as it persists in the human population. It is essential to continuously evaluate PCR-based diagnostic assays as new SARS-CoV-2 genome sequences become available. To help address this challenge, LANL is computationally screening published PCR-based assays and LANL-designed assays against the increasing number of SARS-CoV-2 sequences being deposited in GenBank. A total of SARS-CoV-2 genomes have been screened with assays as of . An accession table of predicted assay failures can be downloaded here.
Top ranked assays
Assay table
Name | Recall (%) | Perfect match | 1 mismatch | 2 mismatches | Failure |
---|---|---|---|---|---|
Full-length (i.e., 29 kb or larger) SARS-CoV-2 genomes are retrieved from GenBank daily. There are currently xxx full-length genomes.
Multiplex-compatible LANL TaqMan PCR assays are designed iteratively: at each step, the assay selected is the one with the highest recall (i.e., true positive rate or sensitivity) that does not "interfere" with a previously designed assay in the same multiplex pool. In this context, "interference" includes primer and/or probe heterodimer formation and amplicon overlap. The three LANL assays target the spike protein, RNA-dependent RNA polymerase, and nsp10. In addition to the LANL assays, existing assays from USA CDC, China CDC, Charité in Berlin, Hong Kong University, and others [3] [4] [5] [6] [7] [8] [9] [10] are screened against the xxx SARS-CoV-2 genomes. To facilitate comparison between assays, recall is calculated for each assay using the full-length SARS-CoV-2 genomes.
A predicted false negative is defined as an assay/target combination that has either (a) one or more oligo/target pairwise alignments with 3 or more mismatches, (b) one or more predicted oligo/target melting temperatures below 40 °C, or (c) one or more mismatches in the last two 3' positions of a primer that are reported by Li et al. (2004) to inhibit detection by increasing the detection Ct by 2 or more. Melting temperatures are calculated using standard nearest-neighbor thermodynamic parameters [11] [12] [13] [14], as implemented by ThermonucleotideBLAST [15]. This tool uses free energy and melting temperature to predict whether binding occurs between the assay oligonucleotides (primers and probe) and the target sequence.
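
The three false-negative criteria can be illustrated with a short sketch. This is not the ThermonucleotideBLAST implementation. The `OligoAlignment` record, its field names, and the example values are hypothetical, and the melting temperature is assumed to have been computed elsewhere. Which 3' mismatches count as inhibitory under Li et al. (2004) is not modeled here; the sketch only takes a precomputed count of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OligoAlignment:
    """One oligo/target pairwise alignment (hypothetical record layout)."""
    oligo_name: str
    is_primer: bool                 # False for the probe
    mismatches: int                 # total mismatches in the alignment
    tm_celsius: float               # predicted melting temperature (computed elsewhere)
    inhibitory_3p_mismatches: int   # inhibitory mismatches in the last two 3' positions


MISMATCH_FAILURE_THRESHOLD = 3   # criterion (a): 3 or more mismatches
MIN_TM_CELSIUS = 40.0            # criterion (b): melting temperature below 40 °C


def failure_reasons(alignments: List[OligoAlignment]) -> List[str]:
    """Return the criteria ("a", "b", "c") that one assay/target pair triggers."""
    reasons = []
    if any(a.mismatches >= MISMATCH_FAILURE_THRESHOLD for a in alignments):
        reasons.append("a")
    if any(a.tm_celsius < MIN_TM_CELSIUS for a in alignments):
        reasons.append("b")
    # Criterion (c) applies to primers only, not to the probe.
    if any(a.is_primer and a.inhibitory_3p_mismatches >= 1 for a in alignments):
        reasons.append("c")
    return reasons


def is_predicted_false_negative(alignments: List[OligoAlignment]) -> bool:
    """An assay/target pair fails if any single criterion is met."""
    return bool(failure_reasons(alignments))


# Made-up example: the forward primer carries one inhibitory 3' mismatch.
example = [
    OligoAlignment("forward", True, 1, 58.5, 1),
    OligoAlignment("reverse", True, 0, 61.0, 0),
    OligoAlignment("probe", False, 0, 66.2, 0),
]
print(failure_reasons(example))
```

Returning the list of triggered criteria, rather than only a boolean, makes it possible to report why a given genome is predicted to escape detection.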
The match quality results for each assay/genome are presented above. The heatmap shows the largest number of mismatches between any of the PCR primers or probe and each of the target genomes. The phylogenetic tree is created using PhaME [16]. Identical SARS-CoV-2 sequences and heatmap patterns are clustered and represented as collapsed branches in the tree (labels are in grey, with the number of genomes and regional descriptions obtained from the strain names). A final total of xxx unique genomic lineages are further collapsed and the final tree has xxx leaf nodes. The above visualization is rendered using a custom PhyD3 phylogenetic tree viewer [17]. To facilitate comparison between assays, recall (true positive rate or sensitivity) is calculated for each assay. The filtered genomes mentioned above are used to determine the number of mismatches to each oligonucleotide. The oligonucleotide with the most mismatches determines the failure status of the assay: genomes with 3 or more mismatches count as false negatives, and genomes with 2 or fewer mismatches count as true positives.
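
The mismatch-based recall described above can be sketched as follows. This is a minimal illustration, not the web app's code. The function names, the input layout (genome accession mapped to per-oligo mismatch counts), and the example data are all hypothetical. The bins mirror the columns of the assay table.

```python
from collections import Counter
from typing import Dict

FAILURE_THRESHOLD = 3  # 3 or more mismatches counts as a false negative


def assay_bin(oligo_mismatches: Dict[str, int]) -> str:
    """Bin one genome by the oligo with the most mismatches."""
    worst = max(oligo_mismatches.values())
    if worst >= FAILURE_THRESHOLD:
        return "failure"
    return {0: "perfect match", 1: "1 mismatch", 2: "2 mismatch"}[worst]


def summarize(genomes: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Tally the bins and compute recall (%) = TP / (TP + FN) * 100."""
    counts = Counter(assay_bin(m) for m in genomes.values())
    total = len(genomes)
    true_positives = total - counts["failure"]
    return {
        "recall": 100.0 * true_positives / total,
        "perfect match": counts["perfect match"],
        "1 mismatch": counts["1 mismatch"],
        "2 mismatch": counts["2 mismatch"],
        "failure": counts["failure"],
    }


# Made-up mismatch counts for four genomes and one assay.
example_genomes = {
    "genome_A": {"forward": 0, "reverse": 0, "probe": 0},
    "genome_B": {"forward": 1, "reverse": 0, "probe": 0},
    "genome_C": {"forward": 0, "reverse": 2, "probe": 1},
    "genome_D": {"forward": 3, "reverse": 0, "probe": 0},
}
print(summarize(example_genomes))
```

Because every screened genome is a genuine SARS-CoV-2 target, each genome is either a true positive or a false negative, so recall reduces to the fraction of genomes that are not in the failure bin.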
This research was supported by LANL (20200732ER), by DTRA (CB10152 and CB10623) and by the DOE Office of Science (KP160101), through the National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to COVID-19, with funding provided by the Coronavirus CARES Act. The source code of the web-app is available at GitHub.
Contributors: Po-E Li (po-e@lanl.gov), Adan Myers y Gutierrez (adanm@lanl.gov), Jason Gans (jgans@lanl.gov), Patrick Chain (pchain@lanl.gov)