Table 2 Comparison of embedding models used in this study

From: Can large language models understand molecules?

| Model | Dim. Size | # Layers | # Parameters | Speed\* (s) |
|---|---|---|---|---|
| Morgan FP (Radius=2) | 1024 | Not applicable | Not applicable | 0.0015 |
| BERT | 768 | 12 | 110 M | 2.9777 |
| ChemBERTa | 384 | 3 | 3 M | 4.8544 |
| MolFormer | 768 | 12 | 44 M | 20.9644 |
| GPT | 1536 | 96 | 175 B | 0.2597 |
| LLaMA | 4095 | 32 | 7 B | 50.8919 |
| LLaMA2 | 4095 | 32 | 7 B | 51.6308 |

\* Speed of generating embeddings. Speed is dependent on the machine.
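
As a point of reference for the speed column, the sketch below shows one way to generate and time the Morgan fingerprint embedding from the first row (radius 2, 1024 bits) using RDKit. This is a minimal illustration, not the study's benchmarking code: the exact timing setup and hardware are not specified here, and the SMILES strings are illustrative examples, so the printed numbers will differ from the table.

```python
# Minimal sketch: time Morgan fingerprint generation (radius 2, 1024 bits),
# matching the configuration in the table's first row. Assumes RDKit is
# installed; the SMILES inputs are illustrative, not from the study.
import time

from rdkit import Chem
from rdkit.Chem import AllChem

smiles_list = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin

start = time.perf_counter()
fingerprints = []
for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)  # parse SMILES into an RDKit Mol
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=1024)
    fingerprints.append(fp)
elapsed = time.perf_counter() - start

# Per-molecule embedding time; hardware-dependent, per the footnote above.
print(f"{elapsed / len(smiles_list):.4f} s per molecule")
```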