Table 4 AGATHA-2015 model performance (ROC AUC) evaluated on different data sources with the same cut-off date (where possible)

From: Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique

| Semantic Pair | Text mining | Benchmark | Non-cross-ref DBs | Dataset size |
| --- | --- | --- | --- | --- |
| Gene or Genome \(\leftrightarrow\) Gene or Genome | 0.858 | 0.612 | 0.530 | 42625 |
| Gene or Genome \(\leftrightarrow\) Organic Chemical | 0.910 | 0.733 | 0.575 | 27060 |
| Organic Chemical \(\leftrightarrow\) Organic Chemical | 0.905 | 0.922 | 0.679 | 15081 |
| Amino Acid, Peptide, or Protein \(\leftrightarrow\) Gene or Genome | 0.897 | 0.695 | 0.591 | 14542 |
| Gene or Genome \(\leftrightarrow\) Pharmacologic Substance | 0.901 | 0.702 | 0.592 | 7843 |
| Disease or Syndrome \(\leftrightarrow\) Organic Chemical | 0.900 | 0.856 | 0.660 | 7612 |
| Amino Acid, Peptide, or Protein \(\leftrightarrow\) Organic Chemical | 0.906 | 0.820 | 0.564 | 6072 |
| Organic Chemical \(\leftrightarrow\) Pharmacologic Substance | 0.898 | 0.890 | 0.616 | 4070 |
| Disease or Syndrome \(\leftrightarrow\) Disease or Syndrome | 0.853 | 0.854 | 0.666 | 2893 |
| Disease or Syndrome \(\leftrightarrow\) Gene or Genome | 0.847 | 0.690 | 0.575 | 2442 |
| Non-stratified ROC AUC | 0.886 | 0.715 | 0.579 | 130240 |

  1. Database records lacking literature cross-references (column 3, Non-cross-ref DBs) were randomly selected because no temporal information is available for them.
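
The table reports one ROC AUC per semantic pair plus a non-stratified value over all 130240 pairs. The sketch below illustrates the difference between the two kinds of score on toy data. It is not the authors' evaluation code: the function names `roc_auc` and `stratified_and_pooled_auc`, the record layout, and the example scores are all assumptions made for illustration, and the non-stratified row is assumed here to be computed by pooling every pair before scoring.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    n_pos = 0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
                n_pos += 1
        i = j
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both positive and negative examples")
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def stratified_and_pooled_auc(records):
    """records: (semantic_pair, label, score) tuples (hypothetical layout)."""
    records = list(records)
    by_pair = defaultdict(list)
    for pair, label, score in records:
        by_pair[pair].append((label, score))
    # One AUC per semantic pair (the stratified rows).
    per_pair = {
        pair: roc_auc([l for l, _ in rows], [s for _, s in rows])
        for pair, rows in by_pair.items()
    }
    # One AUC over all records together (assumed meaning of "non-stratified").
    pooled = roc_auc([l for _, l, _ in records], [s for _, _, s in records])
    return per_pair, pooled


# Toy data, not taken from the paper.
toy_records = [
    ("gene-gene", 1, 0.9), ("gene-gene", 0, 0.8),
    ("gene-gene", 1, 0.7), ("gene-gene", 0, 0.1),
    ("chem-chem", 1, 0.6), ("chem-chem", 0, 0.2),
    ("chem-chem", 1, 0.5), ("chem-chem", 0, 0.3),
]
per_pair_auc, pooled_auc = stratified_and_pooled_auc(toy_records)
print(per_pair_auc, pooled_auc)
```

On this toy data the pooled value (0.8125) is not the average of the per-pair values (0.75 and 1.0), because pooling also ranks positives of one pair against negatives of another. This is one reason a non-stratified score need not equal a size-weighted mean of the stratified rows.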