CoMIT: a bioinformatic pipeline for risk-based prediction of COVID-19 test inclusivity

Walker, Diane M.; Smith, Wendy A.; Gale, Lia; Wolff, Jacob T.; Healy, Connor P.; Van Hollebeke, Hannah F.; Stephenson, Ashlie; Kim, Marianne

doi:10.1186/s12859-025-06046-y

Research
Open access
Published: 12 February 2025

BMC Bioinformatics volume 26, Article number: 51 (2025)

Abstract

Background

The global Coronavirus Disease 2019 (COVID-19) pandemic highlighted the need to quickly diagnose infections to identify and prevent viral spread in the population. In response to the pandemic, BioFire Defense leveraged its PCR-based “lab-in-a-pouch” technology for expedited development of the BioFire® COVID-19 Test, a novel in vitro diagnostic detecting SARS-CoV-2 nucleic acid in human samples. Following clearance of an in vitro diagnostic device, regulatory bodies such as the U.S. Food and Drug Administration (FDA) require regular post market surveillance to monitor test performance against viral lineages circulating in the field, using predictive in silico inclusivity evaluations. Exponential increases in the number of sequences deposited in bioinformatic repositories such as GISAID, during the pandemic, impeded progress in meeting these post market requirements. In response, BioFire Defense developed a new bioinformatic tool to overcome scalability problems and the loss of accuracy encountered with the standard inclusivity method.

Results

The Coronavirus Monitoring for Inclusivity Tool (CoMIT) uses the Variant Sorter Algorithm to sidestep multiple sequence alignments, a significant barrier inherent in the standard inclusivity method. The implementation of CoMIT and its Variant Sorter Algorithm are described. Automated summary tables and visualizations from a typical inclusivity evaluation are presented. We report our approach to filter and display relevant information in the pipeline outputs using risk factors tied to test performance.

Conclusions

BioFire Defense has developed CoMIT, an automated bioinformatic pipeline for efficient processing and reporting of variant inclusivity from the GISAID EpiCoV™ repository. This tool ensures continuous and comprehensive post market evaluations of BioFire COVID-19 Test performance even from datasets large enough to impede standard inclusivity analyses. CoMIT’s low computational space complexity and modular code allow this tool to be generalized for inclusivity monitoring of multianalyte or single analyte tests with complex assay designs and/or highly variable targets. CoMIT’s databasing capabilities and metadata handling hold the potential for new investigations to improve readiness for future outbreaks.

Background

The Coronavirus Disease 2019 (COVID-19) pandemic brought challenges never faced in the modern era, necessitating accelerated timelines and processes to address urgent public health needs. The global outbreak spurred the rapid development of diagnostic tests capable of detecting SARS-CoV-2 (the etiological agent of COVID-19) in human samples [1,2,3,4,5,6]. In response, BioFire Defense leveraged its existing BioFire® FilmArray® PCR-based technology to develop a test specific for the identification of SARS-CoV-2 nucleic acid from patient samples. On March 24, 2020, BioFire Defense received initial Emergency Use Authorization (EUA) from the U.S. Food and Drug Administration (FDA) for the BioFire® COVID-19 Test. On November 1, 2021, 510(k) clearance was granted for the BioFire® COVID-19 Test 2, which has identical chemistry to the EUA version, making it the first single-analyte, PCR-based COVID-19 in vitro diagnostic (IVD) device to receive FDA clearance.

Over the course of the pandemic, the SARS-CoV-2 genome evolved rapidly, resulting in a burgeoning population of genomic variants. The Global Initiative on Sharing All Influenza Data (GISAID) EpiCoV™ database was quickly organized as the public sequence repository for compiling viral genomes from human cases [7, 8] and many countries undertook large scale sequencing efforts. The number of deposited sequences was beginning to accelerate when the BioFire® COVID-19 Test EUA was granted, just 13 days after the World Health Organization (WHO) declared the COVID-19 outbreak a global pandemic.

As more sequence data became available, the need to frequently assess SARS-CoV-2 variants and their potential impacts to test performance was quickly apparent [9]. Such evaluations of test inclusivity must consider a combination of genomic factors and lineage-associated clinical phenotypes (such as increased transmissibility) in order to make informed decisions regarding when corrective actions or mitigations may be needed. For example, mutations falling within primer binding regions can reduce template-primer affinity and extensibility to retard or prevent PCR amplification [9,10,11,12]. This risk may increase in the case of highly transmissible variants. Lineages harboring these mutations may quickly gain prominence in the population and escape detection in clinical specimens, especially with low viral titers. FDA and other regulatory bodies require regular viral sequence monitoring of authorized and cleared products to ensure these tests continue to identify positive cases, including emergent strains circulating both in the US and globally [13].

Challenges in the in-silico inclusivity evaluation process

In silico inclusivity evaluations use publicly available sequence data to approximate risks to detection in the field. The process includes building a multiple sequence alignment (MSA) of intended sequence targets (inclusive sequences) and comparing nucleotide changes across primer binding regions against a reference sequence. Figure 1 reports high-level steps for standard in silico inclusivity evaluations of SARS-CoV-2 variants, similar to inclusivity processes reported in the literature [9]. Online tools such as the Basic Local Alignment Search Tool (BLAST) can also be used, where amplicons or primers are queried against the National Center for Biotechnology Information (NCBI) sequence library [14], although limitations exist with this method [15]. At BioFire Defense, datasets for inclusive sequences are generally small and their evaluations managed using the MSA-based approach outlined in Fig. 1.

As global outbreaks and subsequent sequence submissions caused a surge in publicly available SARS-CoV-2 sequence data, concerns grew for monitoring test performance. The combination of increasingly large data sets and the accelerated demand for new analyses exposed limitations in the standard in silico inclusivity evaluation process. Computational bottlenecks in the MSA step resulted in increased failures and reduced accuracy of large alignments, slowing the evaluation progress and complicating data interpretation. Figure 2 shows the number of complete, high coverage, human-host SARS-CoV-2 sequences collected from January 2020 through August 2022 and submitted to the GISAID EpiCoV™ database through 2 November 2022. The exponential growth of sequence datasets combined with the demands of responding to FDA monitoring requirements and customer inquiries necessitated development of a more scalable and reliable evaluation approach. Here, we describe a fully automated and adaptive bioinformatic pipeline for comprehensive monitoring and reporting BioFire COVID-19 Test performance using a predictive, risk-based strategy.

Implementation

The variant sorter algorithm

CoMIT is written as an R package. The code executes a pipeline that initially builds an empty database housing data for an evaluation (an existing database can also be updated). The pipeline takes GISAID EpiCoV™ sequence submissions (FASTA) and associated metadata (TSV) files as the input. Sequences are processed by the Variant Sorter Algorithm and the resulting data are added to the database. The BioFire COVID-19 Test uses seven nested and multiplexed SARS-CoV-2 target regions, or assays, requiring inclusivity surveillance of 30 individual primers. A diagram of the Variant Sorter Algorithm – the main portion of the CoMIT pipeline – with its inputs, high-level operations, and database file output is shown in Fig. 3.
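The input step described above can be sketched in a few lines. CoMIT itself is an R package; the Python below is only an illustration, and the function names, the `strain` key column, and the assumption that FASTA headers match that column exactly are hypothetical rather than taken from CoMIT or from the GISAID file specification.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def join_with_metadata(fasta_handle, tsv_handle, key_column="strain"):
    """Pair each sequence with its metadata row.

    key_column is a hypothetical column name; a real metadata file may
    use a different identifier column.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    metadata = {row[key_column]: row for row in reader}
    joined, unmatched = [], []
    for header, sequence in read_fasta(fasta_handle):
        row = metadata.get(header)
        if row is None:
            unmatched.append(header)  # sequence without a metadata record
        else:
            joined.append({"id": header, "sequence": sequence, **row})
    return joined, unmatched


if __name__ == "__main__":
    fasta = io.StringIO(">seqA\nACGT\nacgt\n>seqB\nTTTT\n")
    tsv = io.StringIO("strain\tlineage\nseqA\tL1\n")
    print(join_with_metadata(fasta, tsv))
```

Keeping unmatched headers in a separate list makes it easy to report sequences that arrive without metadata instead of silently dropping them.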

The Variant Sorter Algorithm uses an iterative string-matching comparison to identify primer variants (i.e., mutations exclusively found within primer binding regions of the test). For each sequence in the submission set, a small search space is defined around the presumptive location of the primer binding region. Novel primer variants identified by the Variant Sorter Algorithm are assigned a unique identification number and their mutation characteristics (e.g., primer affected, position, type) are captured in the database. These steps are repeated for each primer binding region. This process ensures that only a small proportion of sequences require alignment for classification. A recent inclusivity run showed 0.2% of sequences required alignment (50/21947 sequences). Pairwise alignments are performed using the DECIPHER R package [16]. Sequences processed by the Variant Sorter Algorithm are stored in a relational database generated for the run (alternatively, an existing database can be specified to which new data is appended during the run). The database holds 11 tables storing sequence, assay, and primer variant data. A database schema is provided as an additional file (Additional File 1).
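The string-matching idea can be illustrated with a minimal Python sketch. This is not the CoMIT implementation (which is written in R and uses DECIPHER for pairwise alignment): the class and function names, the flank width, and the in-memory catalogue standing in for the database tables are all hypothetical. The sketch searches a small window around the expected primer location for the reference site, then for previously catalogued variant sites, and only marks the sequence for alignment when neither is found.

```python
from dataclasses import dataclass, field


@dataclass
class VariantCatalogue:
    """Hypothetical in-memory stand-in for the primer variant tables."""
    reference_site: str
    known: dict = field(default_factory=dict)  # variant site -> variant id

    def register(self, site):
        """Return the id of a variant site, assigning a new id if novel."""
        if site not in self.known:
            self.known[site] = len(self.known) + 1
        return self.known[site]


def sort_sequence(genome, expected_start, catalogue, flank=10):
    """Classify one genome for one primer binding region.

    Returns ("reference", None), ("known_variant", id) or
    ("needs_alignment", None). The alignment itself is not performed here.
    """
    site_len = len(catalogue.reference_site)
    lo = max(0, expected_start - flank)
    hi = min(len(genome), expected_start + site_len + flank)
    window = genome[lo:hi]  # small search space around the expected site

    if catalogue.reference_site in window:
        return ("reference", None)
    for site, variant_id in catalogue.known.items():
        if site in window:
            return ("known_variant", variant_id)
    return ("needs_alignment", None)


if __name__ == "__main__":
    catalogue = VariantCatalogue(reference_site="ACGTTGCA")
    variant_genome = "T" * 12 + "ACGTTGCT" + "G" * 12
    print(sort_sequence(variant_genome, 12, catalogue))  # not yet catalogued
    catalogue.register("ACGTTGCT")  # e.g. after an alignment resolved the site
    print(sort_sequence(variant_genome, 12, catalogue))
```

Once a variant site has been registered, later sequences carrying the same site are classified by substring search alone, which is why the fraction of sequences needing alignment can stay small.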

Structured query language database processing and visualization code

After algorithm processing, other CoMIT package functions can be run on a database to generate summary tables and visualizations. Risk criteria based on primer variant prevalence, mutation severity, co-occurrence, and variant lineage type help identify primer variants predicted to be the highest risk to inclusivity. These criteria are applied in different ways to filter, highlight, and stratify data and can be modified, as needed. Figure 4 shows a flowchart for a typical in silico inclusivity analysis using the CoMIT pipeline.
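As an illustration of the kind of query that could sit behind a lineage-stratified summary table, the sketch below uses Python's built-in `sqlite3` module. The two-table schema, the table and column names, and the sample rows are invented for this example; the actual CoMIT schema is the one given in Additional File 1.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sequences (seq_id TEXT PRIMARY KEY, lineage TEXT);
        CREATE TABLE sequence_variants (seq_id TEXT, assay TEXT, variant_id INTEGER);
        """
    )
    con.executemany(
        "INSERT INTO sequences VALUES (?, ?)",
        [("s1", "LineageB"), ("s2", "LineageB"), ("s3", "LineageA"), ("s4", "LineageA")],
    )
    con.executemany(
        "INSERT INTO sequence_variants VALUES (?, ?, ?)",
        [("s1", "2a", 1), ("s3", "2a", 1), ("s3", "2c", 2)],
    )
    return con


def mutated_by_lineage(con, assay):
    """Return (lineage, total, mutated, rate) rows for one assay."""
    rows = con.execute(
        """
        SELECT s.lineage,
               COUNT(DISTINCT s.seq_id) AS total,
               COUNT(DISTINCT v.seq_id) AS mutated
        FROM sequences AS s
        LEFT JOIN sequence_variants AS v
               ON v.seq_id = s.seq_id AND v.assay = ?
        GROUP BY s.lineage
        ORDER BY s.lineage
        """,
        (assay,),
    ).fetchall()
    return [(lineage, total, mutated, mutated / total) for lineage, total, mutated in rows]


if __name__ == "__main__":
    print(mutated_by_lineage(build_demo_db(), "2a"))
```

The `LEFT JOIN` keeps lineages with no mutated sequences in the result, so a zero count is reported rather than the lineage being omitted.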

Risk-based reporting

Key factors for evaluating risk are described in this section, including mutational severity, co-occurrence, prevalence, and variant lineage. Figure 5 provides a summary of these considerations. Sequences harboring mutations to any test primers are identified in the evaluation, and characteristics of mutations (such as mutation position along the primer-spanning region) are leveraged to predict impacts at the individual assay level. Mismatches falling within the last five bases of the 3’ end of a primer binding region are more likely to interfere with amplification [11, 19,20,21,22]. Therefore, sequences carrying 3’ end mutations are labeled as a severity risk in the evaluation (Fig. 5, orange circle).
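A minimal Python sketch of the 3' end severity flag follows. It is an illustration only: the function names are hypothetical, only substitutions are handled (no indels), and the binding site is assumed to be supplied in the same 5' to 3' orientation as the primer. The window width is left as a parameter, with five bases as the default to match the text above.

```python
def mismatch_positions(primer, site):
    """Return 0-based positions, counted from the primer 5' end, that differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length (no indels)")
    return [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]


def is_severity_risk(primer, site, three_prime_window=5):
    """True if any mismatch falls within the last N bases of the 3' end."""
    cutoff = len(primer) - three_prime_window
    return any(pos >= cutoff for pos in mismatch_positions(primer, site))


if __name__ == "__main__":
    primer = "ACGTACGTAC"
    print(is_severity_risk(primer, "ACGTACGTAT"))  # mismatch at the 3' terminal base
    print(is_severity_risk(primer, "TCGTACGTAC"))  # mismatch at the 5' terminal base
```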

When considering the risk of complete test failure, inclusivity evaluations may be further complicated by complex test designs. The BioFire COVID-19 Test 2 leverages a nested, multiplex PCR approach, targeting seven independent regions of the SARS-CoV-2 genome. Detection of the expected amplicon from only one region is required to successfully elicit a SARS-CoV-2 detected result. Any sequences with mismatches to all or multiple assay primers are identified as a co-occurrence risk in the assessments (Fig. 5, grey circle).

Genetic evolution of SARS-CoV-2 variants resulting in increased pathogenicity of the virus in human hosts can have significant public health impacts [23]. The Centers for Disease Control and Prevention and WHO evaluate and classify emerging variants based on potential or known impacts to effectiveness of medical treatments, severity of disease, and transmissibility [18, 24]. Variant lineages associated with official designations given by US and global health organizations (e.g., Variants of Concern) are considered a prevalence risk (Fig. 5, yellow circle). The prevalence risk is also assessed for unclassified variant lineages when represented at a significant frequency in the sequence dataset [13].

Sequences characterized by an overlap of any two risk factors (Fig. 5, regions indicated by 1–3) would be considered high risk, whereas sequences characterized by all risk factors (Fig. 5, area indicated by 4) are of the greatest concern due to the potential negative impacts on diagnostic accuracy. Sequences carrying primer spanning mutations flagged as high risk in these predictive evaluations are escalated for wet benchtop testing and/or thermodynamic modeling analysis [15, 25, 26].
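The overlap logic of the three risk factors can be written down as a small Python sketch. The function name and the labels for sequences with zero or one risk factor are assumptions made for this illustration; only the "high risk" (any two factors) and "greatest concern" (all three) tiers come from the text.

```python
def risk_tier(severity, cooccurrence, prevalence):
    """Map three boolean risk factors to a tier label."""
    flagged = sum(bool(x) for x in (severity, cooccurrence, prevalence))
    if flagged == 3:
        return "greatest concern"   # all three factors overlap
    if flagged == 2:
        return "high risk"          # any two factors overlap
    if flagged == 1:
        return "single risk factor"  # label is an assumption
    return "no risk factor flagged"  # label is an assumption


if __name__ == "__main__":
    print(risk_tier(severity=True, cooccurrence=True, prevalence=False))
```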

Results

Five automated visualizations were developed to summarize processed sequence data and enable clear and concise reporting of results. Example outputs for a candidate evaluation are shown as figures and tables (Tables 1, 2 and 3, Figs. 6 and 7) and as additional files (Additional Files 2–5). Each output in the pipeline features two or more risk indicators (i.e., co-occurrence, prevalence, lineage, and growth). All outputs (except Fig. 7) can be filtered based on mutational severity (i.e., when a primer-spanning mutation is positioned within 10 base pairs of the 3’ end).

Table 1 Sequence Dataset Representation for a Typical in silico Inclusivity Evaluation

Database breakdown table

The Database Breakdown Table provides a summary of collection date, variant identity (Pangolin lineage and WHO label), sequence frequencies and frequency changes of variants included in the analysis. These data are taken from the GISAID metadata associated with each sequence analyzed and can be used to clearly summarize the dataset included in the evaluation.

Table 1 shows an example Database Breakdown table for a typical in silico inclusivity evaluation. Date columns refer to sample collection dates, which should include sequences from patient samples collected in the most recent three-month period. Sequence frequencies are reported for the entire dataset (All Sequences) and stratified by Pangolin lineage [27] and WHO label (i.e., Variants of Concern, Variants of Interest, Variants Under Monitoring); these annotations are updated for every evaluation using a variant mapping file sourced from publicly available information [17, 18]. Frequency changes are compared between the one-month sequence data (newest) and a superset of the most recent three months. Growth is represented as yellow shading when delta frequencies increase or decrease between three- and one-month sequence datasets. Variant lineages with notable frequency changes in the most recent month (i.e., greater than or equal to five percent change) are shaded in this example. Delta frequency thresholds can be modified, as needed.
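The growth shading rule can be sketched as follows. This is an illustrative Python example, not CoMIT code: the function names are hypothetical, and the five percent threshold is interpreted here as a difference of five percentage points between the one-month and three-month lineage frequencies, which is an assumption about how the delta is computed.

```python
from collections import Counter


def lineage_frequencies(lineages):
    """Return percent frequency per lineage for a list of lineage labels."""
    counts = Counter(lineages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lineage: 100.0 * n / total for lineage, n in counts.items()}


def growth_flags(three_month, one_month, threshold_pct=5.0):
    """Return {lineage: (delta, shaded)} comparing one-month to three-month data.

    delta is the one-month frequency minus the three-month frequency, in
    percentage points; shaded is True when |delta| meets the threshold.
    """
    f3 = lineage_frequencies(three_month)
    f1 = lineage_frequencies(one_month)
    flags = {}
    for lineage in sorted(set(f3) | set(f1)):
        delta = f1.get(lineage, 0.0) - f3.get(lineage, 0.0)
        flags[lineage] = (delta, abs(delta) >= threshold_pct)
    return flags


if __name__ == "__main__":
    three_month = ["A"] * 60 + ["B"] * 40
    one_month = ["A"] * 10 + ["B"] * 10  # newest month, a subset of the above
    print(growth_flags(three_month, one_month))
```

Because the threshold is a keyword argument, the same function covers the case where the delta frequency threshold is adjusted for a given evaluation.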

Identifiable mutations in each assay region

As shown in Fig. 3, the CoMIT tool first bins data according to previously identified mutations in the assay primer regions. These mutations are summarized in a table such as Table 2. The number and frequency of sequences in each lineage are recorded as an indicator of prevalence within the dataset, in this example over a 3-month period. Within these lineages, the frequencies of sequences with observed mutations are recorded in each assay column (e.g., assay 2a, 2c, etc.). This gives visibility to assays that may have reduced sensitivity with emerging lineage variation. The COVID-19 Test comprises seven assays (2a, 2c-2g); the Co-occurring Mutated Sequences column indicates the frequency of sequences within each lineage that contain, in this example, mutations in 5 or 6 of the assays on the Test and that could therefore be at risk of missed or late detection.

Table 2 Summary table of sequences containing identifiable mutations across multiple assays

Table 2 provides an example summary table for sequences containing identifiable primer spanning mutations across COVID-19 Test assays. The Sequences by Lineage column shows the most recent three-month period sequence frequencies (count and rate) stratified by lineage exactly matching the Database Breakdown table. The remaining columns report lineage-stratified frequencies of sequences harboring an assay-specific primer variant (i.e., a primer-spanning mutation or set of mutations) (Mutated Sequences by Assay) and sequences with co-occurring primer variants across multiple assays (Co-occurring Mutated Sequences). The bottom row shows sequence frequencies of primer variants by assay (Summary: All Sequences by Assays). Sequence frequencies below one percent are shaded in blue; frequencies equal to or greater than five percent are shaded yellow. A summary table filtering for 3’ end mutations can be generated to represent high-risk mutations (Additional File 2). A version of this table showing an expanded section for Co-occurring Mutated Sequences is also available as an additional file (Additional File 3).

Table 3 provides a detailed breakdown of sequences with mutations under multiple assay primers. Specifically, it shows the number of assays affected by a mutation under those primers, organized by lineage. The columns show the number of assays affected, increasing from left to right. Sequence frequencies (counts and rates) are reported by lineage (Sequences by Lineage) and by increasing co-occurrence risk based on the total number of assays impacted (# Assays Affected, columns 0 through ≥ 6). The Summary: All Sequences by Assays section shows sequence frequencies based on co-occurrence risk. Blue shading indicates sequence frequencies below one percent; frequencies equal to or greater than five percent are shaded yellow. A version of this table with filtering for 3’ end mutations is provided (Additional File 4).
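The binning behind a co-occurrence table of this kind can be sketched in Python as below. The function names, the input record shape, and the assay labels in the example are hypothetical; the cap at six reproduces the "≥ 6" column, and the shading thresholds follow the one percent and five percent rules stated above.

```python
from collections import Counter, defaultdict


def assays_affected_table(records, cap=6):
    """Count sequences per lineage by number of assays affected.

    records: iterable of (lineage, set_of_affected_assays).
    Counts at or above `cap` are pooled into a single column keyed by `cap`.
    """
    table = defaultdict(Counter)
    for lineage, assays in records:
        table[lineage][min(len(assays), cap)] += 1
    return {lineage: dict(counts) for lineage, counts in table.items()}


def shade(freq_pct, low=1.0, high=5.0):
    """Return the shading colour for a percent frequency, or None."""
    if freq_pct < low:
        return "blue"
    if freq_pct >= high:
        return "yellow"
    return None


if __name__ == "__main__":
    records = [
        ("L1", set()),
        ("L1", {"A"}),
        ("L1", {"A", "B"}),
        ("L2", {"A", "B", "C", "D", "E", "F", "G"}),
    ]
    print(assays_affected_table(records))
    print(shade(0.5), shade(3.0), shade(5.0))
```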

Table 3 Summary table of sequences containing identifiable co-occurring mutations

Figure 6 visualizes the lineages with mutations under multiple sets of assay primers, along with their frequencies (counts and rate). All primer-spanning mutations are reported. Purple shading indicates the impact of primer variant combinations on each individual assay and combinations when they are compounded across assays (indicating co-occurrence risks). Sequence counts and percent frequency for each combination are shown at the bottom of the figure. This figure can be filtered for 3’ end mutations only (Additional File 5).

Trending of primer variants

Post-market surveillance not only tracks newly emerging sequence variants but also trends their frequencies. This monitoring helps assess the risk of missed detection based on prevalence. Variant prevalence becomes one of the risk criteria used to assess whether the diagnostic test is still functional in an evolving outbreak.

Figure 7 summarizes characteristics of individual primer variants at or above 0.1% frequency in the sequence dataset and compares them with the previous 3-month period (note: these datasets represent nonoverlapping time periods totaling six months). The figure consists of three sections: a histogram showing assay location of primer variants and their percent frequencies in the current dataset (top section), a table detailing the primer variant characteristics and trending based on a comparison with the previous 3-month period (middle section), and a stacked bar graph displaying the primer variant distribution across lineages (bottom section). Trending symbols indicate a frequency change of at least 0.1% relative to the previous 3-month period. An equivalent symbol represents a delta frequency less than 0.1%. The delta frequency thresholds defining inclusion criteria and trending symbols can be adjusted, as needed.
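The inclusion and trending rules can be sketched in Python as follows. The function names and the "up", "down", and "flat" labels are placeholders chosen for this example (the figure uses symbols), and the delta is rounded before comparison so that values such as 0.3 minus 0.2 are not misclassified by floating-point error.

```python
def trend_label(current_pct, previous_pct, threshold_pct=0.1):
    """Classify the change in percent frequency between two periods."""
    delta = round(current_pct - previous_pct, 6)  # guard against float noise
    if delta >= threshold_pct:
        return "up"
    if delta <= -threshold_pct:
        return "down"
    return "flat"  # delta frequency below the threshold


def variants_to_report(current, previous, min_pct=0.1, threshold_pct=0.1):
    """Return {variant_id: (current_pct, trend)} for variants at or above min_pct.

    current and previous map variant ids to percent frequencies in two
    nonoverlapping periods; variants absent from `previous` count as 0%.
    """
    return {
        vid: (pct, trend_label(pct, previous.get(vid, 0.0), threshold_pct))
        for vid, pct in current.items()
        if pct >= min_pct
    }


if __name__ == "__main__":
    current = {1: 0.50, 2: 0.25, 3: 0.05}
    previous = {1: 0.20, 2: 0.20, 3: 0.40}
    print(variants_to_report(current, previous))
```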

Taken together, the automated outputs of the CoMIT pipeline provide summary tables and visualizations with risk-based features and modifiable thresholds for added flexibility in reporting evaluation results.

Discussion

CoMIT was developed specifically for in silico inclusivity evaluations of the BioFire COVID-19 Test, a single analyte, PCR-based IVD designed for use with BioFire® FilmArray® Systems. Evaluating SARS-CoV-2 genomes as they evolve through human infection is required by regulatory bodies to ensure reliable detection of COVID-19 cases in the US and globally [28]. The standard inclusivity approach includes a sequence alignment step, which presented a computational bottleneck with the increasing volume and rate of sequence data needing to be analyzed. The Variant Sorter Algorithm identifies and catalogues primer variants using iterative string matching and binning functions, an efficient process to sidestep the predominance of MSAs in the standard approach. The bioinformatic analysis and visualization pipeline handles large volumes of sequence data with automated results reporting and databasing capabilities for regular comprehensive post market in silico inclusivity monitoring. CoMIT’s low computational space complexity requires minimal memory, allowing it to be run on a personal computer.

In silico inclusivity monitoring serves many purposes and its results inform different audiences: online to customers in the BioFire COVID-19 Test Reactivity Technical Note [29], to regulators in FDA submissions, and companywide as required for internal trending purposes. For added flexibility and clarity in reporting, the pipeline applies risk-based parameters to summary tables and visualizations, as primer variants with these characteristics pose the greatest risks to overall test performance. For example, figures and tables can be filtered to report only primer variants likely to disrupt the PCR reaction (i.e., 3’ end mutations). Co-occurrence, lineage, frequency, and delta frequency (i.e., growth) are also featured prominently in the outputs. The visualizations leverage auto-generated shading, data stratification, and symbols to identify prevalence risks in currently circulating variants using both lineage associations and growth characteristics of unclassified sequence populations. These risk criteria can be adjusted as needed to align with FDA or other post market requirements. High risk sequences can be flagged for wet benchtop testing to empirically confirm any predicted performance impacts.

Despite being built for in silico inclusivity testing of COVID-19 tests, CoMIT has been developed as an accessible and user-friendly R package. Researchers can easily download and utilize CoMIT to query and test the inclusivity of their own primer sequences against any organism in the GISAID database. Details on accessing and downloading the CoMIT R package can be found in the Availability and Requirements section.

CoMIT’s Variant Sorter Algorithm has been adapted at BioFire Defense for inclusivity monitoring of different pathogens. We developed a modified version of CoMIT to evaluate in silico inclusivity of the Lassa virus for the BioFire® Global Fever Special Pathogens Panel (an IVD cleared by the FDA). The Lassa assays have complex designs because of the genetic diversity of the Lassa virus species which can be as high as 24.6% between lineages [30]. The pipeline and algorithm are currently being expanded for processing sequences for multianalyte panels.

CoMIT is primarily designed to assess the potential impact of sequence variants on the efficacy of a diagnostic test which can be used to issue a warning when a new variant is likely to escape detection. However, CoMIT could also be leveraged in epidemiology to monitor viral evolution, provide early detection of emerging variants, and inform outbreak response. CoMIT’s databasing capability allows for analyses of sequence datasets to gain new information. GISAID metadata contains details on sequence submissions, including variant information (i.e., Nextstrain clade, variant and Pangolin lineage), case data (i.e., location of exposure, demographics, reporting hospital/laboratory), and amino acid change constellation summaries. Tracking changes in viral properties such as rate and location of spread, disease severity, and variant lineage could aid in forecasting where new outbreaks may occur and when appropriate countermeasures may be needed.

The CoMIT pipeline is currently limited to evaluating only primer binding sites. Successful detection on the BioFire® FilmArray® Systems depends on post-run melting temperature (Tm) analyses, and variants with mutations that fundamentally change characteristics of the amplified region (such as large indels in the amplicon region) could result in a missed detection. We are developing an amplicon tracking feature for monitoring changes to the inner amplicon region. Our inner amplicon tracker will leverage thermodynamic models to predict when sequence changes, such as indels or accumulation of single nucleotide polymorphisms, impact Tm. This new feature highlights the adaptability of CoMIT for improved predictions.

Conclusion

CoMIT leverages publicly available SARS-CoV-2 sequence data and metadata in the GISAID EpiCoV™ repository to predict BioFire COVID-19 Test performance in the field. The pipeline can process large datasets with low computational space complexity and leverages adjustable, risk-based summarization features for easily digestible reports of highly complex testing targets. Its flexible database design and improved metadata handling provide opportunities for new epidemiological investigations of both emerging and archived case data, with the potential to improve readiness for future outbreaks as an early warning system for new variants.

Availability and Requirements

Project name: The Coronavirus Monitoring for Inclusivity (CoMIT) Pipeline Project

Project home page: https://bitbucket.org/biofiredefense/comit/src/main/

Operating system: Platform independent

Programming language: R

Other requirements: Package dependencies

License: CC BY-NC 4.0

Any restrictions to use by non-academics: Yes

Availability of data and materials

The in-silico inclusivity evaluation presented in this report is based on 24,911 SARS-CoV-2 sequences and associated metadata available from January 1, 2023, up to March 31, 2023, via gisaid.org/EPI_SET_230509yp. A Supplemental Table describing the sequence dataset is provided as an additional file (Additional File 6). The CoMIT R Package and instructions for its use are available at https://bitbucket.org/biofiredefense/comit/src/main/.

Abbreviations

COVID-19: Coronavirus disease 2019
CoMIT: Coronavirus Monitoring for Inclusivity Tool
EUA: Emergency Use Authorization
FDA: U.S. Food and Drug Administration
IVD: In vitro diagnostic
GISAID: Global Initiative on Sharing All Influenza Data
WHO: World Health Organization
MSA: Multiple sequence alignment
Tm: Melting temperature

References

Gao J, Quan L. Current status of diagnostic testing for SARS-CoV-2 infection and future developments: a review. Med Sci Monit Int Med J Exp Clin Res. 2020;17(26):e928552.
Nguyen NNT, McCarthy C, Lantigua D, Camci-Unal G. Development of diagnostic tests for detection of SARS-CoV-2. Diagnostics. 2020;10(11):905.
Ravi N, Cortade DL, Ng E, Wang SX. Diagnostics for SARS-CoV-2 detection: a comprehensive review of the FDA-EUA COVID-19 testing landscape. Biosens Bioelectron. 2020;1(165):112454.
Jayamohan H, Lambert CJ, Sant HJ, Jafek A, Patel D, Feng H, et al. SARS-CoV-2 pandemic: a review of molecular diagnostic tools including sample collection and commercial response with associated advantages and limitations. Anal Bioanal Chem. 2021;413(1):49–71.
Jalandra R, Yadav AK, Verma D, Dalal N, Sharma M, Singh R, et al. Strategies and perspectives to develop SARS-CoV-2 detection methods and diagnostics. Biomed Pharmacother. 2020;1(129):110446.
Mitchell SL, St George K, Rhoads DD, Butler-Wu SM, Dharmarha V, McNult P, Miller MB. Understanding, verifying, and implementing emergency use authorization molecular diagnostics for the detection of SARS-CoV-2 RNA. J Clin Microbiol. 2020;58(8):e00796.
GISAID Initiative [Internet]. [cited 2022 Dec 20]. Available from: https://www.epicov.org/epi3/frontend#56093
Khare S, Gurry C, Freitas L, Schultz MB, Bach G, Diallo A, et al. GISAID’s Role in pandemic response. China CDC Wkly. 2021;3(49):1049–51.
Khan KA, Cheung P. Presence of mismatches between diagnostic PCR assays and coronavirus SARS-CoV-2 genome. R Soc Open Sci. 2020;7(6):200636.
Cha RS, Thilly WG. Specificity, efficiency, and fidelity of PCR. Genome Res. 1993;3(3):S18-29.
Bru D, Martin-Laurent F, Philippot L. Quantification of the detrimental effect of a single primer-template mismatch by real-time PCR using the 16S rRNA gene as an example. Appl Environ Microbiol. 2008;74(5):1660–3.
Rejali NA, Moric E, Wittwer CT. The effect of single mismatches on primer extension. Clin Chem. 2018;64(5):801–9.
Policy for Evaluating Impact of Viral Mutations on COVID-19 Tests (Revised) - Guidance for Test Developers and Food and Drug Administration Staff.
BLAST: Basic Local Alignment Search Tool [Internet]. [cited 2023 Feb 24]. Available from: https://blast.ncbi.nlm.nih.gov/Blast.cgi
SantaLucia J. Physical Principles and Visual-OMP Software for Optimal PCR Design. In: Yuryev A, editor. PCR Primer Design [Internet]. Totowa, NJ: Humana Press; 2007 [cited 2022 Sep 20]. p. 3–33. (Walker JM, editor. Methods in Molecular Biology™; vol. 402).
Wright ES. DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics. 2015;16(1):322.
Pango designation [Internet]. CoV-lineages; 2023 [cited 2023 Feb 7]. Available from: https://github.com/cov-lineages/pango-designation/blob/106720cbb83f1cd10a55ab537f84967d8b6c2e7a/lineage_notes.txt
Tracking SARS-CoV-2 variants [Internet]. [cited 2022 Nov 8]. Available from: https://www.who.int/activities/tracking-SARS-CoV-2-variants
Rychlik W. Priming efficiency in PCR. Biotechniques. 1995;18(1):84–6.
Wu JH, Hong PY, Liu WT. Quantitative effects of position and type of single mismatch on single base primer extension. J Microbiol Methods. 2009;77(3):267–75.
Stadhouders R, Pas SD, Anber J, Voermans J, Mes THM, Schutten M. The effect of primer-template mismatches on the detection and quantification of nucleic acids using the 5′ nuclease assay. J Mol Diagn. 2010;12(1):109–17.
Kim M, Smith WA, Van Hollebeke H. Personal communication.
Aleem A, Akbar Samad AB, Slenker AK. Emerging Variants of SARS-CoV-2 And Novel Therapeutics Against Coronavirus (COVID-19). In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2022 [cited 2023 Feb 22]. Available from: http://www.ncbi.nlm.nih.gov/books/NBK570580/
CDC. Centers for Disease Control and Prevention. 2020 [cited 2023 Feb 22]. Coronavirus Disease 2019 (COVID-19). Available from: https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-classifications.html
Mann T, Humbert R, Dorschner M, Stamatoyannopoulos J, Noble WS. A thermodynamic approach to PCR primer design. Nucleic Acids Res. 2009;37(13):e95–e95.
Howson ELA, Orton RJ, Mioulet V, Lembo T, King DP, Fowler VL. GoPrime: development of an in silico framework to predict the performance of real-time PCR primers and probes using foot-and-mouth disease virus as a model. Pathogens. 2020;9(4):303.
Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403–7.
Center for Devices and Radiological Health. U.S. Food and Drug Administration. FDA; 2021 [cited 2022 Nov 14]. Policy for Evaluating Impact of Viral Mutations on COVID-19 Tests. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/policy-evaluating-impact-viral-mutations-covid-19-tests
BioFire® COVID-19 Test [Internet]. BioFire Defense. [cited 2022 Nov 13]. Available from: https://www.biofiredefense.com/covid-19test/
Bowen MD, Rollin PE, Ksiazek TG, Hustad HL, Bausch DG, Demby AH, et al. Genetic diversity among Lassa virus strains. J Virol. 2000;74(15):6992–7004.

Acknowledgements

We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based. We thank members of the BioFire Defense Regulatory Affairs and Research and Development departments, including Kristin Casper, Dave Rabiger, and Jason Nielson, for their thoughtful reviews of the manuscript. We also thank Scott Glaittli for his technical expertise on the CoMIT software package.

Funding

This project was funded internally by BioFire Defense, LLC.

Author information

Diane M. Walker and Wendy A. Smith have contributed equally to this work.

Authors and Affiliations

BioFire Defense, LLC, 79 West 4500 South, Suite 14, Salt Lake City, Utah, 84107, USA
Diane M. Walker, Wendy A. Smith, Lia Gale, Jacob T. Wolff, Connor P. Healy, Hannah F. Van Hollebeke, Ashlie Stephenson & Marianne Kim

Contributions

Tool conceptualization: D.W., L.G., H.F.VH., W.A.S., and M.K.; software development: D.W., L.G., A.S., J.W., and C.H.; data curation: D.W., L.G., and A.S.; evaluation methodology: D.W., L.G., H.F.VH., W.A.S., and M.K.; visualizations: L.G., A.S., D.W., H.F.VH., and W.A.S.; project administration: D.W. and M.K.; database validation: L.G. and H.F.VH.; benchmarking: A.S.; code reviews: D.W., L.G., A.S., J.W., and C.H.; literature review: L.G., A.S., D.W., and H.F.VH.; writing – original draft: D.W.; writing – review and editing: D.W., L.G., H.F.VH., A.S., W.A.S., J.W., C.H., and M.K.; supervision: M.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Marianne Kim.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

W.A. Smith, J.T. Wolff, C.P. Healy, H.F. Van Hollebeke, and M. Kim are current employees of BioFire Defense, LLC; D. Walker, L. Gale, and A. Stephenson are former employees of BioFire Defense, LLC; D. Walker, W.A. Smith, and M. Kim are shareholders of BioFire Defense, LLC.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Walker, D.M., Smith, W.A., Gale, L. et al. CoMIT: a bioinformatic pipeline for risk-based prediction of COVID-19 test inclusivity. BMC Bioinformatics 26, 51 (2025). https://doi.org/10.1186/s12859-025-06046-y

Received: 07 January 2025
Accepted: 10 January 2025
Published: 12 February 2025
DOI: https://doi.org/10.1186/s12859-025-06046-y
