Table 4 The results of molecule optimization at specified editing positions on the small molecule drug dataset, evaluated by the hit ratio (%) of the property changes for each objective-oriented prompt. Statistically significant improvements (t-test over 5 different dataset splits, p-value < 0.05) are highlighted in bold text.

From: Sculpting molecules in text-3D space: a flexible substructure aware framework for text-oriented molecular optimization

| Drug Molecule | Objectives | Prompts | Validity (%), GPT-3.5 | Validity (%), Ours | Hit Ratio (%), GPT-3.5 | Hit Ratio (%), Ours |
|---|---|---|---|---|---|---|
| Apixaban | Electron-donating substitution | High HOMO energy | 55 | 73 | 1 | 18 |
| Empagliflozin | More hydrophilic | Large polarity | 0 | 15 | 0 | 8 |
| Ibrutinib | Higher permeability | High permeability | 19 | 33 | 0 | 2 |
| Dapagliflozin | Large aqueous solubility | More soluble | 0 | 35 | 0 | 13 |
| Osimertinib | Electron-withdrawing substitution | Low HOMO energy | 30 | 52 | 0 | 39 |
| Olaparib | Electron-withdrawing substitution | Low HOMO energy | 34 | 30 | 1 | 2 |
| Abemaciclib | More hydrogen bonding interactions | More hydrogen bond acceptors | 20 | 42 | 0 | 21 |
| | | More hydrogen bond donors | 0 | 45 | 0 | 25 |
| Pomalidomide | Electron-withdrawing substitution | Low HOMO energy | 24 | 64 | 11 | 10 |
| Tafamidis | More hydrogen bonding interactions | More hydrogen bond acceptors | 46 | 61 | 0 | 13 |
| | | More hydrogen bond donors | 10 | 48 | 0 | 5 |
| Lisdexamfetamine | To control the hydrolysis rate | High HOMO energy | 0 | 11 | 0 | 8 |
| | | Low HOMO energy | 0 | 19 | 0 | 5 |

  1. Note that the prompts shown in the table have been abbreviated. The prompts actually used in 3DToMolo can be found in Table 2, while we formulated our request to GPT-3.5 as follows: “Optimize the following molecule: <SMILES> to achieve the desired properties: <specified properties>, while ensuring that the specified substructure <substructure> remains unchanged. Provide 10 different and valid results.” In this experimental setting, we employed a multi-run approach to generate a total of 100 optimized results for each drug molecule: each of 10 iterations obtained 10 distinct outputs from GPT-3.5. In most cases, GPT-3.5 fails to meet the requirements, as the generated molecules do not conform to either the specified structural criteria or the desired properties.
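
The multi-run querying described in the note can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function names `build_prompt` and `collect_candidates`, and the `query_model` callable (a stand-in for whatever client sends the prompt to GPT-3.5 and returns the parsed SMILES strings), are assumptions introduced here. Only the prompt wording and the 10 × 10 run structure come from the note.

```python
from typing import Callable, List

# Prompt wording taken from the table note; the placeholders are filled per molecule.
PROMPT_TEMPLATE = (
    "Optimize the following molecule: {smiles} to achieve the desired "
    "properties: {properties}, while ensuring that the specified "
    "substructure {substructure} remains unchanged. "
    "Provide 10 different and valid results."
)


def build_prompt(smiles: str, properties: str, substructure: str) -> str:
    """Fill the request template for one molecule (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        smiles=smiles, properties=properties, substructure=substructure
    )


def collect_candidates(
    query_model: Callable[[str], List[str]],
    smiles: str,
    properties: str,
    substructure: str,
    n_iterations: int = 10,
) -> List[str]:
    """Send the same prompt n_iterations times and pool the outputs.

    query_model is an assumed callable that returns the list of SMILES
    strings parsed from one model response (10 per call in the note).
    """
    prompt = build_prompt(smiles, properties, substructure)
    candidates: List[str] = []
    for _ in range(n_iterations):
        candidates.extend(query_model(prompt))
    return candidates
```

With 10 iterations and 10 outputs per call, the pooled list holds the 100 candidates per drug molecule over which the validity and hit ratio percentages are reported.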