Table 1 Characteristics of the Transformer-based models and pre-training details (rows 2-4 are the medical-domain models)

From: Hybrid natural language processing tool for semantic annotation of medical texts in Spanish

Model                                            PT corpus size   #A   #H     #L   #P     #V
RoBERTa EHR (bsc-bio-ehr-es)                     >1B tok          12   768    12   125M   52K
EriBERTa (EriBERTa-base)                         900M tok         12   768    12   125M   50K
CLIN-X-ES (xlm-roberta-large-spanish-clinical)   790MB            16   1024   24   550M   250K
mBERT (bert-base-multilingual-cased)             2.5T             12   768    12   110M   110K
mDeBERTa (mdeberta-v3-base)                      2.5T             12   768    12   190M   250K

A: attention heads; B: billion; H: hidden size; K: thousand; L: number of layers; M: million; MB: megabytes; P: parameters; PT: pre-training; T: terabytes; Tok: tokens; V: vocabulary size
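
The architecture figures above (#A, #H, #L, #V) can be read directly from each checkpoint's configuration file. The following is a minimal sketch, assuming the `transformers` library is installed and that the checkpoints are available on the Hugging Face Hub under the repository IDs guessed below from the model names in parentheses; it prints those values for comparison with the table.

```python
# Sketch: read each model's config and print the architecture figures
# reported in Table 1. The Hub repo IDs are assumptions inferred from
# the model names; adjust them to the actual checkpoints used.
from transformers import AutoConfig

MODELS = {
    "RoBERTa EHR": "PlanTL-GOB-ES/bsc-bio-ehr-es",
    "EriBERTa": "HiTZ/EriBERTa-base",
    "CLIN-X-ES": "llange/xlm-roberta-large-spanish-clinical",
    "mBERT": "bert-base-multilingual-cased",
    "mDeBERTa": "microsoft/mdeberta-v3-base",
}

for name, repo_id in MODELS.items():
    cfg = AutoConfig.from_pretrained(repo_id)
    # #P (parameter count) is not stored in the config; it would require
    # instantiating the model and summing parameter tensor sizes.
    print(
        f"{name}: #A={cfg.num_attention_heads}, #H={cfg.hidden_size}, "
        f"#L={cfg.num_hidden_layers}, #V={cfg.vocab_size}"
    )
```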