Methods | MLEE P (%) | MLEE R (%) | MLEE F1 (%) | GE09 P (%) | GE09 R (%) | GE09 F1 (%) | GE11 P (%) | GE11 R (%) | GE11 F1 (%) |
---|---|---|---|---|---|---|---|---|---|
Large language models (LLMs) | |||||||||
ChatGPT-3.5 (0-shot) | 33.02 | 30.17 | 31.53 | 17.53 | 26.51 | 21.10 | 14.69 | 28.00 | 19.27 |
ChatGPT-4 (0-shot) | 35.40 | 34.48 | 34.93 | 17.92 | 27.01 | 21.55 | 15.28 | 29.33 | 20.09 |
ChatGPT-3.5 (5-shot ICL) | 43.75 | 40.24 | 41.92 | 20.54 | 29.50 | 24.22 | 23.53 | 32.00 | 27.12 |
ChatGPT-4 (5-shot ICL) | 44.63 | 42.10 | 43.33 | 21.46 | 31.07 | 25.39 | 24.51 | 33.33 | 28.25 |
Feature-based supervised learning models | |||||||||
HASH [31] | – | – | – | 79.83 | 56.02 | 65.84 | – | – | – |
SVM-CRF [9] | – | – | – | 69.96 | 64.28 | 67.00 | – | – | – |
Bio-SVM\(\dagger\) [10] | 75.56 | 81.29 | 78.32 | – | – | – | – | – | – |
TSVM\(\dagger\) [7] | 80.35 | 79.16 | 79.75 | 75.94 | 68.31 | 71.01 | 68.09 | 76.41 | 72.01 |
Representation-based supervised learning models | |||||||||
BiLSTM-FastText [35] | 77.89 | 78.28 | 78.08 | 68.21 | 58.55 | 63.01 | 68.44 | 65.26 | 66.81 |
DeepEventMine [11] | 79.37 | 78.86 | 79.12 | – | – | – | 72.05 | 68.89 | 70.43 |
TEES-CNN [25] | 81.49 | 78.43 | 79.93 | – | – | – | 73.32 | 68.72 | 70.95 |
RecurCRFs [16] | 81.12 | 79.15 | 80.28 | 76.42 | 70.45 | 73.24 | – | – | – |
SemPRE [20] | 79.73 | 81.44 | 80.58 | 71.70 | 71.99 | 71.42 | 73.36 | 70.83 | 71.93 |
ResLSTM [23] | 79.89 | 81.61 | 80.74 | – | – | – | – | – | – |
Tree-LSTM [8] | 82.24 | 80.20 | 81.21 | – | – | – | – | – | – |
BioLSL (Ours) | 80.71 | 83.79 | 82.25 | 74.51 | 76.34 | 75.41 | 78.37 | 71.67 | 74.79 |
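
For readers checking the table, the sketch below shows how an F1 value relates to the P and R columns, assuming the standard definition of F1 as the harmonic mean of precision and recall. The function name `f1_score` is hypothetical and not part of any method listed above. Recomputing from the rounded P and R values may differ from a reported F1 in the last digits, since the reported figures could have been derived from unrounded counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent).

    Hypothetical helper for illustration; assumes the standard F1 definition.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# ChatGPT-3.5 (0-shot) on MLEE: P = 33.02, R = 30.17
print(round(f1_score(33.02, 30.17), 2))  # 31.53
```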