Fig. 1

Overview of 3DToMolo. a Alignment of textual descriptions with the chemical structures of molecules, realized through contrastive learning between two latent representations: the molecular structure encoding and its paired text embedding. b Conditional diffusion model. To keep molecule optimization aligned with the prompt, the conditional diffusion model incorporates the text prompt at each step of the backward optimization process. c Zero-shot prompt-driven molecule optimization, in which the input molecule is modified in response to a text prompt describing physicochemical properties. 3DToMolo optimizes the 2D and 3D features of the molecule jointly, balancing similarity to the input molecule against agreement with the text prompt; this balance is achieved by the conditional diffusion model shown in b. d Molecule optimization under structural constraints. This task further increases similarity to the input molecule by retaining essential structural features. e Molecule optimization at appointed sites. Given a precise position within the input molecule, 3DToMolo optimizes the molecule by proposing modifications to the atoms and bonds connected to that site.
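
The contrastive alignment in panel a can be illustrated with a minimal sketch. The caption does not specify the loss, so the symmetric InfoNCE objective below is an assumption chosen because it is a common way to align paired embeddings; the function names, the temperature value, and the toy vectors are hypothetical and are not taken from 3DToMolo.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is nonzero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row_i)[i]: treats entry i as the correct class for row i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_alignment_loss(mol_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of (molecule, text) embedding pairs.

    Assumption: mol_embs[i] and text_embs[i] describe the same molecule, and
    every other pairing in the batch is treated as a negative.
    """
    mol = [l2_normalize(v) for v in mol_embs]
    txt = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every molecule and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(m, t)) / temperature for t in txt]
        for m in mol
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    mol_to_text = diagonal_cross_entropy(logits)
    text_to_mol = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (mol_to_text + text_to_mol)


# Toy 2D embeddings standing in for encoder outputs (hypothetical values).
molecule_embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
mismatched_text = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = contrastive_alignment_loss(molecule_embeddings, matched_text)
loss_mismatched = contrastive_alignment_loss(molecule_embeddings, mismatched_text)
```

In this toy batch the loss is close to zero when each molecule embedding points in the same direction as its own text embedding, and much larger when the pairs are swapped, which is the behavior a contrastive objective relies on to pull paired representations together.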