- Research
- Open access
M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy
BMC Bioinformatics volume 26, Article number: 117 (2025)
Abstract
Accelerating drug discovery for glucocorticoid receptor (GR)-related disorders through innovative machine learning (ML)-based approaches holds promise for advancing therapeutic development, optimizing treatment efficacy, and mitigating adverse effects. While experimental methods can accurately identify GR antagonists, they are often not cost-effective for large-scale drug discovery. Thus, computational approaches leveraging SMILES information for precise in silico identification of GR antagonists are crucial, enabling efficient and scalable drug discovery. Here, we develop a new ensemble learning approach using a multi-step stacking strategy (M3S), termed M3S-GRPred, aimed at rapidly and accurately discovering novel GR antagonists. To the best of our knowledge, M3S-GRPred is the first SMILES-based predictor designed to identify GR antagonists without the use of 3D structural information. In M3S-GRPred, we first constructed different balanced subsets using an under-sampling approach. Using these balanced subsets, we explored and evaluated heterogeneous base-classifiers trained with a variety of SMILES-based feature descriptors coupled with popular ML algorithms. Finally, M3S-GRPred was constructed by integrating the probabilistic features of the base-classifiers selected by a two-step feature selection technique. Our comparative experiments demonstrate that M3S-GRPred can precisely identify GR antagonists and effectively handle the imbalanced dataset. Compared to traditional ML classifiers, M3S-GRPred attained superior performance on both the training and independent test datasets. Additionally, M3S-GRPred was applied to identify potential GR antagonists among FDA-approved drugs; the candidates were confirmed through molecular docking, followed by detailed MD simulation studies, for drug repurposing in Cushing’s syndrome.
We anticipate that M3S-GRPred will serve as an efficient screening tool for discovering novel GR antagonists from vast libraries of unknown compounds in a cost-effective manner.
Introduction
The glucocorticoid receptor (GR), a widely expressed, ligand-activated transcription factor of the nuclear receptor superfamily, mediates various biological processes. These processes include gluconeogenesis, inflammation, immunity, bone metabolism, cardiovascular function, overall homeostasis and development, and brain function [1]. GR is essential for survival, as mice with a disrupted GR cannot survive after birth due to multiple defects [2]. Cortisol, the natural hormone that binds to GR, is secreted by the adrenal glands and regulated by adrenocorticotropic hormone (ACTH). Cushing’s syndrome results from excessive glucocorticoid exposure, causing significant morbidity and mortality. It can develop from corticosteroid administration (exogenous) or uncontrolled cortisol hypersecretion, whether ACTH-dependent or independent (endogenous) [3, 4]. Presently, two medical treatments for endogenous Cushing’s syndrome have received approval from the US Food and Drug Administration (FDA). The first is mifepristone, a nonselective GR antagonist, indicated for adult patients with Cushing’s syndrome and glucose intolerance or type 2 diabetes mellitus who are either ineligible for surgery or have undergone unsuccessful surgery [5]. The second is pasireotide, an agonist of the somatostatin receptor. It is approved specifically for patients diagnosed with a subset of Cushing’s syndrome called Cushing’s disease, when pituitary surgery is not a viable option or has proven ineffective [6].
As previously mentioned, Cushing’s syndrome is caused by excessive cortisol activity, leading to severe symptoms such as excess trunk fat, thin arms and legs, a rounded face, and a fatty hump between the shoulders. Patients often experience diabetes, hypertension, skin issues, and psychiatric disturbances [7, 8]. Moreover, elevated cortisol levels are linked to a heightened risk of cardiovascular events such as myocardial infarction, cerebrovascular events such as stroke, as well as sepsis and thromboembolism, leading to a greater mortality risk compared to the general population [9,10,11]. Mifepristone effectively alleviates the clinical effects of elevated cortisol by acting as a GR antagonist, improving patients’ overall condition [5]. However, it does not reduce cortisol production and has drawbacks due to its non-selectivity. Its strong affinity for the progesterone receptor (PR) can cause pregnancy termination and issues such as irregular vaginal bleeding or endometrial thickening in some patients [12]. Currently, a Phase 3 clinical trial of the GR antagonist relacorilant is ongoing (NCT02804750). Consequently, there is an ongoing need to discover new GR antagonists with diverse properties suitable for therapeutic purposes in various diseases involving GR signaling. However, the conventional drug discovery process is known for being lengthy. To expedite this process, machine learning (ML)-based approaches have proven to be highly effective. Moreover, researchers are exploring diverse computer-assisted approaches for GR drug design, including quantitative structure–activity relationship (QSAR) modeling for GR based on ML [13,14,15,16], deep learning [17, 18], molecular docking [19,20,21,22], molecular dynamics simulations [23,24,25,26], and pharmacophore analysis [25, 27, 28], among others.
Here, we present a novel stacked ensemble learning approach, named M3S-GRPred, designed to rapidly and accurately discover novel GR antagonists. The major contributions of the proposed strategy can be summarized as follows: (i) We devised a multi-step stacking strategy (M3S) and used it to build M3S-GRPred, which addresses the data imbalance problem (i.e., 1314 active compounds and 275 inactive compounds). Unlike the conventional stacking strategy, the M3S-GRPred method employed an under-sampling approach to construct different balanced subsets. Using these balanced subsets, we evaluated and compared the performance of various base-classifiers trained with five SMILES-based feature descriptors (i.e., AP2DC, CDKExt, FP4C, MACCS, and Pubchem) coupled with popular ML algorithms (i.e., KNN, MLP, PLS, RF, SVM, and XGB). All the base-classifiers were employed to generate probabilistic features (PFs) based on the ten-fold cross-validation procedure. Finally, we utilized a two-step feature selection to optimize these PFs and determine the best feature subset for constructing the final ensemble learning model using the stacking strategy; (ii) M3S-GRPred is the first SMILES-based stacked model for the identification of GR antagonists; (iii) Experimental results showed that the M3S method not only addresses the data imbalance problem, but also achieves more stable and accurate identification of GR antagonists. Specifically, as indicated by the independent test, M3S-GRPred outperformed several traditional ML classifiers, achieving a balanced accuracy (BACC) of 0.891, a Matthews correlation coefficient (MCC) of 0.658, and an area under the receiver operating characteristic curve (AUC) of 0.953; and (iv) The proposed M3S-GRPred was applied to identify important features for GR antagonists and to determine FDA-approved drugs that could potentially act as GR antagonists using molecular docking and MD simulation studies. As a result, M3S-GRPred identified two FDA-approved drugs as potential GR antagonists.
Materials and methods
Training and independent test datasets
In this study, compounds were sourced from the ChEMBL database (Target: GR; ID: CHEMBL2034) [29]. Initially, 13,227 compounds exhibiting activity towards GR were retrieved and subjected to data curation using our in-house code in the R programming environment [30]. During this process, compounds with ‘=’ symbols in their “Standard.Value” column were retained, while those with symbols such as ‘<’, ‘>’, or ‘/’ were excluded. Redundant and missing data points were also eliminated. Subsequently, the dataset was refined by selecting compounds with quantitative IC50 bioactivity values obtained from functional and cell-based assays relevant to GR activity, resulting in a set of 1632 compounds. To enhance data clarity and enable comparisons of drug potency at equimolar concentrations, the IC50 values of these compounds were converted into pIC50 values (the negative base-10 logarithm of IC50 in molar concentration) and further processed according to our previous works [31,32,33,34]. Consequently, the final dataset comprised 1314 active compounds and 275 inactive compounds. Among these, we used 75% and 25% of all the active and inactive compounds to construct the training and independent test datasets, respectively.
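The curation and pIC50 conversion described above can be illustrated with a minimal Python sketch. It is not the authors' in-house R code: the record keys (`smiles`, `relation`, `ic50_nm`), the assumption that IC50 values are stored in nanomolar, the use of SMILES strings to detect duplicates, and the toy records are all hypothetical choices made for illustration.

```python
import math

def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)  # nM -> M, then negative log10

def curate(records):
    """Keep exact ('=') measurements, drop missing values and duplicate SMILES."""
    seen = set()
    kept = []
    for rec in records:
        # Qualified values ('<', '>', '/') and missing IC50 values are excluded
        if rec.get("relation") != "=" or rec.get("ic50_nm") is None:
            continue
        # Assumption: a repeated SMILES string counts as a redundant data point
        if rec["smiles"] in seen:
            continue
        seen.add(rec["smiles"])
        kept.append({**rec, "pic50": pic50_from_ic50_nm(rec["ic50_nm"])})
    return kept

# Toy placeholder records (not GR compounds)
records = [
    {"smiles": "CCO", "relation": "=", "ic50_nm": 1000.0},
    {"smiles": "CCO", "relation": "=", "ic50_nm": 1000.0},   # duplicate
    {"smiles": "CCN", "relation": ">", "ic50_nm": 10000.0},  # qualified value
    {"smiles": "CCC", "relation": "=", "ic50_nm": None},     # missing value
    {"smiles": "c1ccccc1", "relation": "=", "ic50_nm": 10.0},
]
curated = curate(records)
```

In this toy example, an IC50 of 1000 nM (1 µM) maps to a pIC50 of 6 and an IC50 of 10 nM maps to a pIC50 of 8, so higher pIC50 values indicate greater potency.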
Feature extraction methods
Feature extraction involves characterizing molecules of interest through quantitative and qualitative descriptors that encompass their structural composition, connectivity, and physicochemical traits [28]. In our study, we implemented data preprocessing steps using the PaDEL-Descriptor software [35], which included eliminating salts, removing duplicate data, and standardizing tautomers. Following this preprocessing phase, we used the SMILES notation of the compounds under analysis to generate molecular fingerprints. Five distinct fingerprint types were employed in this research: AP2DC, CDKExt, FP4C, MACCS, and Pubchem [36,37,38,39,40]. For a detailed description of each fingerprint, please refer to Table 1. All calculations related to molecular fingerprint generation were conducted within the R [30] programming environment.
Overall framework of M3S-GRPred
M3S is a novel multi-step stacking strategy, which is developed for solving the data imbalance issue by leveraging the stacking strategy coupled with the under-sampling technique [41, 42]. Recently, this approach was successfully applied to the interpretable identification of IL-6 inducing peptides [41] and allergenicity of chemical compounds [42]. Here, we applied the M3S strategy for discovering novel GR antagonists. The overall framework of M3S-GRPred is divided into the following steps (Fig. 1), which include: (i) preparing the balanced training datasets; (ii) constructing base-classifiers; and (iii) optimizing meta-classifiers.
In the first step, balanced training subsets (BTS) were established using the under-sampling approach on the original training dataset, which initially contained 790 positives and 159 negatives. Given the 5:1 ratio between positives and negatives, we under-sampled from the positive samples five times to create five balanced training subsets (i.e., BTS1–BTS5). Importantly, there is no overlap among these five positive subsets. Utilizing all five positive subsets allows us to maximize the utility of the information provided by the active compounds. In the second step, for each balanced training subset, 30 base-classifiers were constructed based on six ML algorithms (i.e., KNN, MLP, PLS, RF, SVM, and XGB) in conjunction with five molecular descriptors (i.e., AP2DC, CDKExt, FP4C, MACCS, and Pubchem). These six ML algorithms are widely applied in research related to drug discovery and development [43,44,45]. By utilizing all balanced training subsets, we obtained a total of 150 well-trained base-classifiers. All base-classifiers were built and optimized using the caret package in the R programming environment [46], with optimal parameters determined through ten-fold cross-validation (Supplementary Table S2).
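As a rough illustration of the first step, the following Python sketch splits the positive class into five non-overlapping parts and pairs each part with the negatives. It is a sketch under stated assumptions rather than the authors' R implementation: the random shuffle, the round-robin split, the reuse of all negatives in every subset, the function name `make_balanced_subsets`, and the placeholder compound identifiers are hypothetical.

```python
import random

def make_balanced_subsets(positives, negatives, n_subsets=5, seed=0):
    """Split positives into disjoint chunks and pair each chunk with the negatives."""
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    # Round-robin slicing gives disjoint chunks of nearly equal size
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Assumption: every balanced subset reuses the full set of negatives
    return [(chunk, list(negatives)) for chunk in chunks]

# Placeholder identifiers matching the class sizes reported for the training set
positives = [f"pos_{i}" for i in range(790)]
negatives = [f"neg_{i}" for i in range(159)]
subsets = make_balanced_subsets(positives, negatives)

# Sizes of (positives, negatives) in BTS1 to BTS5
sizes = [(len(pos), len(neg)) for pos, neg in subsets]
```

With 790 positives, each subset receives 158 positives alongside the 159 negatives, so every subset is close to a 1:1 class ratio while each active compound is used exactly once.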
In the final step, a probability feature vector (PFV) was established using the probability scores of being a GR antagonist produced by the 150 base-classifiers [41, 47,48,49]. For a given compound C, its PFV can be represented by:
\[\text{PFV}(C) = \left[\text{P}(\text{ML}_{i}, \text{MD}_{j}, \text{BTS}_{k})\right]_{i = 1,\ldots,6;\; j = 1,\ldots,5;\; k = 1,\ldots,5}\]
where \(\text{P}(\text{ML}_{i}, \text{MD}_{j}, \text{BTS}_{k})\) denotes the probability score derived from the base-classifier trained with the ith ML algorithm and the jth feature encoding over the kth balanced training subset. Thus, the PFV of the compound C is a 150-D probabilistic feature vector (6 × 5 × 5 = 150). To enhance the learning accuracy, a two-step feature selection strategy was employed to determine the best feature subset containing m useful PFs to develop the proposed model. The two-step feature selection implementation herein is the same as that applied in our previous studies [45, 50, 51]. In this strategy, all PFs were initially ranked based on their RF-based mean decrease of Gini index (MDGI). Fifteen feature subsets containing the m top-ranked PFs were generated, where m = 10, 20, 30,…, 150. Subsequently, each feature subset was used to train its own SVM-based meta-classifier, giving 15 meta-classifiers in total. The feature subset yielding the highest cross-validation Matthews Correlation Coefficient (MCC) was selected as the best feature subset.
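The layout of the 150-D PFV and the generation of the 15 ranked feature subsets can be sketched in Python as follows. This is only an illustration: the feature-name format, the helper names (`pf_names`, `nested_subsets`, `pick_best_subset`), and the placeholder importance values are assumptions, and the actual MDGI ranking and SVM training (done with R in the study) are replaced by a user-supplied scoring function.

```python
from itertools import product

ML_ALGORITHMS = ("KNN", "MLP", "PLS", "RF", "SVM", "XGB")
DESCRIPTORS = ("AP2DC", "CDKExt", "FP4C", "MACCS", "Pubchem")
SUBSETS = ("BTS1", "BTS2", "BTS3", "BTS4", "BTS5")

def pf_names():
    """One probabilistic feature per (algorithm, descriptor, balanced subset)."""
    return [f"{ml}-{md}_{bts}"
            for ml, md, bts in product(ML_ALGORITHMS, DESCRIPTORS, SUBSETS)]

def nested_subsets(importance, step=10):
    """Rank features by importance and return the top-m lists for m = 10, 20, ..."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return {m: ranked[:m] for m in range(step, len(ranked) + 1, step)}

def pick_best_subset(subsets, cv_score):
    """Return (m, features) maximizing cv_score, e.g. a cross-validated MCC."""
    best_m = max(subsets, key=lambda m: cv_score(subsets[m]))
    return best_m, subsets[best_m]

names = pf_names()
# Placeholder importances, NOT real MDGI values: earlier names rank higher
importance = {name: 1.0 / (rank + 1) for rank, name in enumerate(names)}
subsets = nested_subsets(importance)
```

In practice, `cv_score` would train a meta-classifier on the given features and return its cross-validation MCC; any callable that maps a feature list to a number can be passed in.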
Performance evaluation
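Herein, we employed two standard evaluation strategies to assess the robustness and generalization ability of the prediction models, namely ten-fold cross-validation and independent tests. Six performance measures, namely BACC, MCC, ACC, AUC, sensitivity (SN), and specificity (SP), were used to evaluate the performance of the prediction models [52, 53]. The threshold-dependent measures are defined as
\[\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}\]
\[\text{SN} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]
\[\text{SP} = \frac{\text{TN}}{\text{TN} + \text{FP}}\]
\[\text{BACC} = \frac{\text{SN} + \text{SP}}{2}\]
\[\text{MCC} = \frac{\text{TP} \times \text{TN} - \text{FP} \times \text{FN}}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}}\]
where TP and TN are the numbers of correctly predicted positive and negative samples, respectively, and FP and FN are the numbers of samples falsely predicted as positive and negative, respectively [41, 42, 47, 49, 54, 55]. AUC is the area under the receiver operating characteristic curve and is computed from the predicted probability scores rather than from a single confusion matrix.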
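The confusion-matrix measures named above can be computed with a short Python function. The formulas are the standard definitions of ACC, SN, SP, BACC, and MCC; the function name and the example counts are hypothetical and do not come from the study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP, BACC and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    bacc = (sn + sp) / 2                     # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "BACC": bacc, "MCC": mcc}

# Hypothetical counts for an imbalanced test set (100 positives, 50 negatives)
metrics = classification_metrics(tp=90, tn=40, fp=10, fn=10)
```

For these example counts, ACC is about 0.867 while BACC is 0.85 and MCC is 0.7, which illustrates why BACC and MCC are more informative than ACC when the classes are imbalanced.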
Molecular docking study of FDA-approved drugs
A library of 2735 FDA-approved small molecule drugs was obtained from the DrugBank database (version 5.1.10; released on January 4, 2023). After eliminating inorganic compounds, salts, SMILES with explicit valence, disconnected SMILES representations, and duplicates, the number of compounds was reduced to 1737. Molecular descriptors were computed for these 1737 compounds and used as input for predictions with our model. To aid in drug repurposing efforts, the top 30 compounds identified by our model were then subjected to docking analysis using the 3D structure of GR (PDB ID: 1NHZ) obtained from the Protein Data Bank (https://www.rcsb.org/) and prepared for docking in UCSF ChimeraX [56]. The refined structure underwent energy optimization and minimization using the OPLS3 force field with 5000 steps of steepest descent (step size = 0.02 Å) and 500 steps of conjugate gradient (step size = 0.02 Å). A receptor grid box was constructed using MGLTools 1.5.7 [57], with dimensions X = 56, Y = 64, and Z = 62, centered at coordinates −5.31 Å, 14.112 Å, and 5.61 Å based on active site residues within the GR binding pocket. This coordinate space was used as the docking site. To validate the docking approach, the co-crystallized ligand (mifepristone; RU486) in the crystal structure was re-docked into the active site of GR to assess the ability of the docking method to reproduce the native conformation of the inhibitor. The docking run employed 20 binding modes, an energy range of 4, and an exhaustiveness of 32. Docked complexes were visualized using PyMOL [58].
Molecular dynamics (MD) simulations
The docked conformations of four screened compounds (i.e., azelastine (AZE), metergoline (MET), perampanel (PER), and pirenzepine (PIR)) bound to the GR, along with the GR/MF (Mifepristone) crystal structure [59], were used as initial structures for MD simulations using AMBER22 [60] with periodic boundary conditions as detailed in previous studies [24, 61,62,63]. The protonation state of the protein receptor was determined using PROPKA in the PDB2PQR web server [64]. The electrostatic potential (ESP) charges for the optimized drugs were derived at the HF/6-31G(d) theory level and converted into restrained ESP (RESP) charges through the ANTECHAMBER module of AmberTools21 [60]. The AMBER force fields ff19SB and GAFF2 were applied for the protein and drug, respectively. Missing hydrogen atoms were added using the tLEaP module. The system was neutralized with counterions and immersed in an octahedral box of TIP3P water [65] extending at least 10 Å from the protein surface. Structural minimization was performed with 1500 steps of steepest descent (SD) followed by the conjugate gradient (CG) method. Subsequently, MD simulations with a 2-fs time step were executed, with a 10 Å cutoff for nonbonded interactions and long-range electrostatic interactions [66] treated using the Particle Mesh Ewald (PME) summation approach. Pressure and temperature were controlled, and covalent bonds involving hydrogen atoms were constrained using the SHAKE methodology [67]. The models were heated from 10 to 310 K for 100 ps and maintained at 310 K for 100 ns, with three replicates using different initial velocities. The CPPTRAJ module was used to analyze all-atom root mean square deviation (RMSD), intermolecular hydrogen bonds, and contact atoms between the drug and protein over the production phase [68].
The binding affinity of the drug/protein complex for each simulation was estimated using molecular mechanics with generalized Born (MM/GBSA) [69] or Poisson–Boltzmann (MM/PBSA) [70] surface area solvation calculations, using 100 snapshots from the last 20 ns. Drug–protein interactions were visualized with LigandScout 4.4.8 [71].
Results and discussion
Chemical space analysis
Chemical space analysis in drug discovery aims to understand the distribution of chemical compounds, categorized as active and inactive, based on their physicochemical properties. This involves examining various descriptors such as molecular weight (MW), octanol–water partition coefficient (AlogP), hydrogen bond acceptor count (HBA), hydrogen bond donor count (HBD), topological polar surface area (TPSA), and rotatable bond count (nRotB). Lipinski’s Rule of Five (Ro5) sets criteria for determining whether a compound is likely to be orally active, with parameters such as AlogP < 5, MW < 500, HBD < 5, and HBA < 10 [72]. The chemical spaces, based on physicochemical properties related to the Ro5 and Veber’s rule (i.e., nRotB < 10 and TPSA < 140 Å²), were analyzed and are depicted in Fig. 2. The findings reveal that the majority of compounds in the active group exhibit a MW ranging from 400 to 600 Da, while those in the inactive group are clustered within the 300–500 Da range (Fig. 2A). Similarly, the AlogP values of the active and inactive groups (Fig. 2B) are densest within the ranges of 4–6 and 3–5.5, respectively. Although the differences in these properties are slight, they are statistically significant, as determined by the Mann–Whitney U test (p < 0.05). The HBA and HBD parameters describe the hydrogen bonding capacity of the compounds. Our results indicate that the majority of compounds in both classes meet the Ro5 criteria for HBA and HBD (Fig. 2C, D), although the differences between the groups are not statistically significant for the HBA property. Furthermore, the active and inactive compounds are concentrated in the ranges of 50–100 and 60–80 Å² for TPSA, and 3–8 and 2–7 for nRotB, respectively (Fig. 2E, F). The differences between the classes for both properties are statistically significant (p < 0.05).
Thus, the statistically significant differences between the active and inactive groups can provide insights into how these physicochemical properties relate to the biological activity of the active compounds as inhibitors. This information can be valuable in guiding the design and optimization of new drug candidates.
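The Ro5 and Veber criteria quoted above can be checked per compound with a small Python helper. This is a minimal sketch that uses the thresholds exactly as stated in the text; the function name and the example property values are hypothetical, and computing the properties themselves from SMILES is outside its scope.

```python
def passes_ro5_and_veber(mw, alogp, hbd, hba, nrotb, tpsa):
    """Return (ro5_ok, veber_ok) using the thresholds stated in the text."""
    # Lipinski's Rule of Five: AlogP < 5, MW < 500, HBD < 5, HBA < 10
    ro5_ok = alogp < 5 and mw < 500 and hbd < 5 and hba < 10
    # Veber's rule: nRotB < 10 and TPSA < 140 square angstroms
    veber_ok = nrotb < 10 and tpsa < 140
    return ro5_ok, veber_ok

# Hypothetical property values, chosen only to exercise both outcomes
small_compound = passes_ro5_and_veber(mw=430.0, alogp=4.2, hbd=1, hba=4,
                                      nrotb=5, tpsa=75.0)
large_compound = passes_ro5_and_veber(mw=560.0, alogp=5.6, hbd=2, hba=6,
                                      nrotb=7, tpsa=90.0)
```

The second example fails the Ro5 on MW and AlogP but still satisfies Veber's rule, mirroring the observation that many active compounds fall in the 400–600 Da and AlogP 4–6 ranges.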
To ensure that the independent test dataset is sufficiently distinct from the training dataset to avoid overestimation of model performance, we performed a scaffold analysis using the Bemis-Murcko framework [73] to count the scaffolds unique to either the training dataset or the independent test dataset. The scaffold analysis showed that 50.23% of the scaffolds in the independent test dataset were unique (detailed in Supplementary Table S1). In addition, to confirm the generalization ability of the proposed model, we computed the Tanimoto similarity coefficient between compound pairs in the training and independent test datasets based on ECFP4 fingerprints. As can be seen from Supplementary Figures S1-S2, the heatmap is predominantly blue; the average Tanimoto similarity coefficient was 0.135, and 98.33% of compound pairs across the training and independent test datasets exhibited a Tanimoto similarity coefficient of less than 0.5, indicating low similarity between the two datasets.
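The cross-dataset similarity summary described above can be sketched in Python by representing each fingerprint as the set of its on-bit indices. This is an illustration only: the helper names and the toy fingerprints are hypothetical, and generating real ECFP4 fingerprints would require a cheminformatics toolkit that is not shown here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cross_similarity_summary(train, test, threshold=0.5):
    """Mean train-vs-test similarity and fraction of pairs below the threshold."""
    sims = [tanimoto(x, y) for x in train for y in test]
    mean_sim = sum(sims) / len(sims)
    frac_below = sum(s < threshold for s in sims) / len(sims)
    return mean_sim, frac_below

# Toy fingerprints (sets of on-bit positions), not real ECFP4 output
train_fps = [{1, 2, 3}, {4, 5}]
test_fps = [{2, 3, 4}, {9}]
mean_sim, frac_below = cross_similarity_summary(train_fps, test_fps)
```

A low mean similarity together with a high fraction of pairs below 0.5 is the pattern reported for the training and independent test datasets.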
Effect of the under-sampling method on prediction performance
This section investigated two different comparative scenarios. In the first scenario, we compared the performance of various ML classifiers trained with six ML algorithms coupled with five molecular descriptors on the imbalanced dataset. In the second scenario, these ML classifiers were trained and tested on the five balanced training subsets (i.e., BTS1–BTS5). To assess the contributions of the molecular descriptors, ML algorithms, and under-sampling technique, six performance measures (BACC, AUC, SN, SP, MCC, and ACC) achieved by the ML classifiers were evaluated and compared using ten-fold cross-validation and independent tests. As mentioned above, ML classifiers achieving the highest cross-validation MCC were regarded as the best-performing classifiers. The experimental results of these two comparative scenarios are provided in Fig. 3 and Table 2 along with Supplementary Tables S3-S9.
As can be seen from Supplementary Table S3, no ML classifier trained with the imbalanced dataset achieved a cross-validation MCC value greater than 0.5. The top-five ML classifiers in this case were SVM-CDKExt, RF-CDKExt, RF-Pubchem, XGB-FP4C, and XGB-Pubchem with corresponding MCC scores of 0.431, 0.425, 0.424, 0.418, and 0.414, respectively (Fig. 3). On the independent test dataset, their performance remained unsatisfactory, with corresponding MCC scores of 0.418, 0.423, 0.429, 0.413, and 0.464, respectively (Supplementary Table S4). This indicated that the imbalanced dataset could be detrimental to model performance. Therefore, we were motivated to employ the under-sampling technique to improve model performance.
For the models trained with the balanced training subsets, all of the top-50 ML classifiers provided cross-validation MCC scores greater than 0.5 (Supplementary Table S5). These results demonstrate that models trained on the balanced training subsets perform better than those trained on the imbalanced training dataset. The top-five ML classifiers, achieving cross-validation MCC scores of 0.563, 0.563, 0.560, 0.555, and 0.554, were RF-Pubchem_BTS1, SVM-Pubchem_BTS4, XGB-FP4C_BTS5, SVM-CDKExt_BTS2, and PLS-Pubchem_BTS1, respectively (Fig. 4). Among these classifiers, two attained MCC scores above 0.5 in the independent test, namely 0.508 (RF-Pubchem_BTS1) and 0.523 (SVM-Pubchem_BTS4) (Supplementary Table S6). To characterize the performance of the models, we calculated the average cross-validation and independent test MCC scores over the five balanced datasets with respect to each ML classifier. As shown in Table 2 and Supplementary Table S7, the top-five ML classifiers, achieving average cross-validation MCC scores of 0.526, 0.524, 0.521, 0.510, and 0.509, are RF-Pubchem, XGB-FP4C, SVM-AP2DC, SVM-CDKExt, and PLS-Pubchem, respectively. To confirm the effectiveness of the under-sampling technique, we compared the performance of the best-performing models trained with balanced (RF-Pubchem_BTS1) and imbalanced (SVM-CDKExt) training datasets. Additionally, we compared the performance of the top-five ML classifiers trained with their balanced datasets to that of the top-five ML classifiers trained with the imbalanced dataset. As shown in Supplementary Table S8, the top-five ML classifiers trained with their balanced datasets perform better than those trained with the imbalanced dataset in terms of BACC and MCC over the training subset.
Furthermore, in the independent test, RF-Pubchem_BTS1 outperformed SVM-CDKExt, achieving a BACC of 0.823, MCC of 0.508, and AUC of 0.883 (Supplementary Table S9). This confirms again that the under-sampling technique is beneficial for enhancing model performance. Therefore, we utilized all the ML classifiers trained with the balanced dataset to construct our proposed models in the following studies.
Construction of M3S-GRPred
Generally, it is straightforward to select the best ML classifiers among various ML classifiers trained with different ML algorithms and molecular descriptors. However, the predictive ability of single-feature-based models might not be robust enough [41, 47,48,49, 51, 55, 74]. To deal with the limitation arising from the single-feature-based models, we employed our M3S method to develop the stacked ensemble learning model. Our stacked ensemble learning model was developed based on the SVM method (referred to as mSVM) trained with the 150-D probabilistic feature vector. In addition, we applied the two-step feature selection method to identify m useful PFs. In this feature selection method, we initially ranked the PFs based on MDGI scores and generated 15 feature subsets containing the m top-ranked informative PFs. After that, all 15 feature subsets were used to develop different mSVM models, and their performance was assessed over both the cross-validation and independent tests. As shown in Supplementary Table S10, the feature subsets containing the 150 and 140 top-ranking PFs, referred to herein as PFV and PFV_FS, respectively, achieved better performance than the other feature subsets in terms of MCC over the cross-validation test. The performance evaluation results of PFV and PFV_FS are recorded in Table 3. As can be seen, PFV and PFV_FS achieve similar cross-validation MCC scores of 0.713 and 0.708, respectively. On the independent test dataset, the MCC, SN, and BACC of PFV_FS were 0.658, 0.928, and 0.891, which were 1.48, 1.45, and 0.88% higher than those of PFV, respectively, demonstrating the effectiveness and robustness of PFV_FS. Therefore, we utilized PFV_FS to develop the final stacked ensemble learning model (M3S-GRPred).
M3S-GRPred outperforms several traditional machine learning-based classifiers
In this section, we compared the performance of M3S-GRPred with that of its constituent base-classifiers trained with balanced and imbalanced training subsets to demonstrate the advantage of the M3S strategy in overcoming the data imbalance problem and improving performance and robustness. In the first comparative experiment, we compared M3S-GRPred with the top-five base-classifiers trained with different balanced training subsets. As mentioned above, the top-five base-classifiers in this case were RF-Pubchem_BTS1, SVM-Pubchem_BTS4, XGB-FP4C_BTS5, SVM-CDKExt_BTS2, and PLS-Pubchem_BTS1. Figure 5A, B and Table 4 show that M3S-GRPred exhibited better performance than the compared base-classifiers over the ten-fold cross-validation and independent tests. Specifically, on the independent test dataset, M3S-GRPred achieved remarkable improvements of 5.62–14.89, 13.46–27.12, and 6.95–13.33% in terms of BACC, MCC, and AUC, respectively. These results demonstrate that M3S-GRPred attained high accuracy and stability in the identification of GR antagonists. In addition, to determine whether our proposed framework can address the imbalanced data problem, we compared M3S-GRPred with the top-five base-classifiers trained with the imbalanced training dataset. As illustrated in Fig. 5C, D and Table 5, significant improvements in prediction performance were observed across all six measures on both the independent test and training datasets. Specifically, our proposed model significantly improved BACC by 9.94–12.52%, MCC by 19.35–24.51%, and AUC by 8.57–11.26%. Altogether, these results indicate that the framework used to develop M3S-GRPred not only addresses the data imbalance problem, but also effectively leverages heterogeneous models to achieve more stable and accurate identification of GR antagonists.
Fig. 5 Performance comparison of M3S-GRPred and several conventional ML classifiers over the training (A, C) and independent (B, D) datasets. A, B ACC, BACC, SN, SP, MCC, and AUC of M3S-GRPred and top-five ML classifiers trained with the imbalanced datasets. C, D ACC, BACC, SN, SP, MCC, and AUC of M3S-GRPred and top-five ML classifiers trained with balanced training subsets
Model interpretation and feature importance analysis
To gain deeper insight into specific substructural elements potentially responsible for antagonistic effects against GR, we employed the RF classifier to determine and rank the feature importance based on the MDGI [43,44,45, 50, 75]. Since the top-three base-classifiers were developed using BTS1, BTS4, and BTS5, we performed feature importance analysis using RF classifiers coupled with an interpretable feature descriptor (i.e., Pubchem) on these three balanced training subsets. Features with the highest MDGI scores are considered most important for GR antagonist identification. Consequently, we selected the top-20 important features found in all three balanced training subsets for detailed feature importance analysis, as summarized in Table 6. Taking Pubchem568 as an example, its MDGI scores (ranks) based on BTS1, BTS4, and BTS5 were 3.23(1), 1.22(8), and 1.46(9), respectively.
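Selecting the features that rank highly in all three balanced training subsets amounts to intersecting the per-subset top-k lists. The Python sketch below illustrates this idea; the helper names are hypothetical, and the toy input contains only three features (with the MDGI scores quoted in this section), so the ranks it produces are relative to that toy input and differ from the ranks in Table 6.

```python
def rank_features(mdgi):
    """Map each feature to its rank (1 = highest MDGI score)."""
    ranked = sorted(mdgi, key=mdgi.get, reverse=True)
    return {feature: rank for rank, feature in enumerate(ranked, start=1)}

def common_top_features(mdgi_by_subset, k=20):
    """Features ranked within the top k in every subset, with per-subset ranks."""
    ranks = {name: rank_features(m) for name, m in mdgi_by_subset.items()}
    top_sets = [{f for f, r in rk.items() if r <= k} for rk in ranks.values()]
    common = set.intersection(*top_sets)
    return {f: {name: ranks[name][f] for name in ranks} for f in sorted(common)}

# Toy input: three features only, so ranks are relative to this small set
mdgi_by_subset = {
    "BTS1": {"Pubchem799": 2.59, "Pubchem568": 3.23, "Pubchem736": 1.65},
    "BTS4": {"Pubchem799": 4.65, "Pubchem568": 1.22, "Pubchem736": 2.35},
    "BTS5": {"Pubchem799": 3.57, "Pubchem568": 1.46, "Pubchem736": 1.93},
}
shared_top2 = common_top_features(mdgi_by_subset, k=2)
```

With k = 2 on this toy input, only Pubchem799 is within the top two of every subset; with the full fingerprint and k = 20, the same intersection yields the shared feature list analyzed in the text.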
Our analysis showed that the top feature in BTS4 and BTS5 was the same, i.e., Pubchem799 (2.59(3), 4.65(1), 3.57(1)), which pertains to 3-methylcyclohexane-1-thiol. This compound is an alkylthiol, meaning that an alkyl group (i.e., methylcyclohexane) is attached to a sulfhydryl group. In addition, Pubchem736 (1.65(9), 2.35(2), 1.93(3)) and Pubchem778 (1.67(8), 1.15(10), 1.62(6)), corresponding to 3-methylbenzenethiol and 4-methylcyclohexane-1-thiol, respectively, are among the top 10 features containing thiol substructures. The thiol (−SH) functional group is found in numerous drug compounds, imparting a unique combination of useful properties. Thiol-containing drugs act as antioxidants by neutralizing radicals and other harmful electrophiles, replenishing cellular thiol pools, and forming stable complexes with heavy metals such as arsenic, copper, and lead [76]. In addition, thiol-based drugs are classified as mucolytics due to their ability to lower the thickness and flexibility of bronchial secretions by breaking down disulfide bonds in proteins [77]. A recent study by Khanna et al. [78] explored the antiviral and anti-inflammatory effects of thiol-based drugs in COVID-19. The authors observed that in vivo treatment with thiol drugs (i.e., cysteamine) exerted anti-inflammatory effects and reduced SARS-CoV-2-induced lung inflammation and injury. Moreover, in vitro assays showed that multiple thiol drugs were capable of inhibiting the binding of the SARS-CoV-2 spike protein to its receptor, thereby inhibiting viral infection [78]. Taken together, the anti-inflammatory and antioxidant properties of thiol-based drugs could be beneficial for treating Cushing’s syndrome, as GR is associated with the inflammation pathway.
Pubchem568 (3.23(1), 1.22(8), 1.46(9)) corresponds to propionitrile, an aliphatic nitrile, polar aprotic solvent, and natural product found in the Apis species [79]. Additionally, propionitrile serves as a precursor for diarylpropionitrile (DPN), which exhibits strong ERβ agonist properties [80]. Furthermore, DPN has demonstrated antidepressant- and anxiolytic-like effects in animals by activating the endogenous oxytocin system, the body’s natural mechanism for managing stress and promoting well-being [81]. A study by Thangnipon et al. assessed the neuroprotective effects of DPN against oxidative stress in human neuroblastoma cells and concluded that DPN could be beneficial for protecting against neurodegenerative diseases [82]. Several studies have also investigated the role of DPN in breast cancer inhibition [83, 84]. Given that GR belongs to the same superfamily as ERβ (i.e., steroid nuclear receptors), exploring the effects of DPN on GR is worthwhile.
Pubchem340 (1.34(12), 1.08(15), 1.23(14)) and Pubchem336 (1.13(18), 1.16(9), 1.16(17)), corresponding to isopropylamine and 2-methylpropan-2-amine (i.e., tert-butylamine), respectively, are primary aliphatic amines found in the top-ten common features. These compounds serve as prodrug moieties in medicinal chemistry, particularly valued for their efficient drug release capabilities with both small and long-chain aliphatic amines [85]. Moreover, the substitution of aliphatic amines in GR modulators has been patented (US20060154973A1 and WO2005090336A1), exploring a new class of non-steroidal compounds for treating GR-associated diseases. Isopropylamine (Pubchem340), when combined with 2-chloro-1-(3,4-dihydroxyphenyl)ethan-1-one, serves as a precursor for sympathomimetic β-adrenoreceptor drugs such as isoprenaline and metaproterenol, used in COPD and asthma treatment [86]. Similarly, the synthesis of salbutamol, a short-acting β2 adrenergic agonist used to treat asthma and COPD, begins with the acylation of salicylaldehyde to form an α-bromoacetophenone derivative. This intermediate then reacts with tert-butylamine (Pubchem336) in isopropanol [86]. The tert-butyl group from tert-butylamine is highly lipophilic and introduces steric hindrance in the inhibitor molecule, which may advantageously block specific binding sites and prevent interaction of the GR with endogenous glucocorticoids or other ligands. Thus, the aliphatic amine functionality of these features could potentially form hydrogen bonds or participate in other non-covalent interactions with the GR.
Case study: prospective GR inhibitors from FDA-approved drugs
Identification of potential GR inhibitors
In this section, we first employed M3S-GRPred for virtual screening, estimating a probability score for each FDA-approved compound from DrugBank. Second, the top 30 compounds with the highest probability scores were selected and subjected to molecular docking to predict their binding affinity to GR. Third, four potential compounds were selected from these for MD simulations to examine ligand stability under dynamic conditions using free energy calculations (details in the Materials and Methods section). Supplementary Table S11 lists the top compounds with their probabilities, corresponding docking scores, and inhibitor target sites. Notably, the top four compounds with the highest docking scores are all inhibitors of PR or related nuclear receptors, which have demonstrated cross-interactions with GR. Therefore, for the purpose of drug repurposing, these inhibitors were not considered for further MD simulations. The four selected compounds were azelastine (AZE), metergoline (MET), perampanel (PER), and pirenzepine (PIR), with docking scores of −9.3, −8.8, −8.7, and −8.3 kcal/mol, respectively. These compounds, along with the reference co-crystal compound mifepristone (MF; −11.4 kcal/mol), which is the only FDA-approved GR antagonist, underwent further evaluation through MD simulations.
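The first screening step, ranking compounds by predicted probability and keeping the top 30, can be illustrated with a minimal Python sketch. This is not the authors' R implementation; the function name `top_k_by_probability`, the compound identifiers, and the scores are hypothetical and serve only to show the selection logic.

```python
from typing import Dict, List, Tuple


def top_k_by_probability(scores: Dict[str, float], k: int = 30) -> List[Tuple[str, float]]:
    """Return the k compounds with the highest predicted probability.

    Ties are broken alphabetically by compound name so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical probability scores, for illustration only (not results from the study).
example_scores = {"cmpd_A": 0.91, "cmpd_B": 0.42, "cmpd_C": 0.97, "cmpd_D": 0.88}

# Keep the two best-scoring compounds; the study used k = 30.
shortlist = top_k_by_probability(example_scores, k=2)
```

In a workflow like the one described, the shortlisted compounds would then be passed to docking, and the docking scores used for the next selection step.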
Structural dynamics and binding affinity of screened drugs against GR
The stability of ligand-binding was assessed using plots of root mean square deviation (RMSD), the number of intermolecular hydrogen bonds (# H-bonds), and the number of atom contacts (# atom contacts) against simulation time. Figure 6 and Supplementary Figures S3 and S4 illustrate that all atoms in the five drug/GR complex systems exhibited high fluctuation during MD simulations; however, the binding sites of all five systems showed less variability. Based on this observation and the plots of # H-bonds and # atom contacts, all simulated systems reached equilibrium at 40 ns. Therefore, in this study, snapshots from the final 20 ns were selected for further analysis in terms of binding free energy and drug/protein interactions, as depicted in Figs. 6 and 7, respectively.
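As a rough illustration of the two quantities used above, the following Python sketch computes an RMSD between two coordinate sets and averages a per-frame property over the final window of a trajectory. It is a simplified sketch under stated assumptions: the coordinates are assumed to be already superimposed with identical atom ordering (no alignment is performed), and the function names and toy values are hypothetical rather than taken from the study's analysis pipeline.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root mean square deviation between two pre-aligned coordinate sets."""
    if len(reference) != len(frame) or len(reference) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for ref_atom, frame_atom in zip(reference, frame)
        for a, b in zip(ref_atom, frame_atom)
    )
    return math.sqrt(squared / len(reference))


def mean_over_final_window(
    times_ns: Sequence[float], values: Sequence[float], window_ns: float = 20.0
) -> float:
    """Average a per-frame property over the last `window_ns` of the trajectory."""
    start = times_ns[-1] - window_ns
    selected = [v for t, v in zip(times_ns, values) if t >= start]
    return sum(selected) / len(selected)


# Toy data: every atom is displaced by 1 unit along z, so the RMSD is 1.
toy_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])

# Toy H-bond counts sampled every 20 ns; frames at 80 and 100 ns fall in the window.
toy_mean = mean_over_final_window([0, 20, 40, 60, 80, 100], [5, 4, 3, 2, 1, 3])
```

Restricting the average to the final window reflects the idea that only the equilibrated part of the trajectory should contribute to the reported values.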
Fig. 6 All-atom RMSD, # H-bonds, and # atom contacts of mifepristone (MF) and the selected drugs, namely azelastine (AZE), perampanel (PER), metergoline (MET), and pirenzepine (PIR), in complex with the GR target, plotted over the 100-ns duration of the run1 MD simulations. The corresponding data for the run2 and run3 simulations can be found in Supplementary Figures S1 and S2, respectively. The values shown in the figure are the averages calculated from the last 20 ns of three independent MD simulations.
Fig. 7 The crucial interactions of MF, AZE, PER, MET, and PIR in complex with GR, depicted in 2D and 3D pharmacophore models, along with the representative pharmacophore models (RPMs) analyzed from the last 20 ns of MD simulations. The green arrow, red arrow, and yellow circle represent the pharmacophore features of hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), and hydrophobic interaction, respectively.
As observed in Fig. 6, each screened drug formed 1.6–1.8 H-bonds with the GR protein target, except for PIR (0.7). Meanwhile, the # atom contacts reached ~25 in the complexes bound with AZE and PER, which was higher than for the reference drug MF (~18). Therefore, van der Waals (vdW) interactions could serve as the predominant contributor to drug binding. Interestingly, the # atom contacts in the rest of the complexes (~14) were somewhat lower than those in the MF system. Additionally, the ΔGbind values obtained from MM/GBSA for all systems followed a similar trend to those from MM/PBSA, as shown in Supplementary Figure S5. AZE and PER exhibited binding energies with GR of the same magnitude as MF. Furthermore, the # H-bonds in the MF/GR complex (2.5 ± 0.2, Fig. 6) involving Q570 and R611 (25% and 18%, Fig. 7) corresponded with previous findings [59], slightly exceeding those in AZE (1.8 ± 0.4) with C736 (30%), PER (1.7 ± 0.6) with Q738 (27%), MET (1.6 ± 0.3) with C736 (26%), and PIR (0.7 ± 0.2) with M601 (19%). However, the binding strengths of the potential drug candidates AZE and PER were compensated by their higher # atom contacts (26.2 ± 1.3 and 25.4 ± 1.2, respectively) compared to MF (18.4 ± 1.6).
According to Fig. 7, both effective drugs showed potential hydrophobic interactions with residues L566 (98%), M604 (94%), A605 (95%), A607 (80%), L608 (99%), L733 (46%), and F740 (47%), whereas MF interacted only with M464 (90%), L563 (48%), M601 (81%), M639 (27%), and L732 (8%) through hydrophobic interactions. Additionally, the MM energies confirmed that vdW interactions, assessed by MM/GBSA and MM/PBSA, are the primary force driving molecular complexation, with values of −61.1 ± 0.4 and −61.3 ± 0.4 kcal/mol, respectively, for AZE, as well as −61.1 ± 0.4 and −60.7 ± 0.4 kcal/mol, respectively, for PER (Supplementary Table S12). Taken together, the results of ligand–protein interactions, coupled with the obtained binding affinities, suggest that AZE and PER could potentially serve as GR antagonists similar to MF.
Conclusions
We have developed M3S-GRPred, a novel ensemble learning framework utilizing the stacking strategy to rapidly and accurately discover novel GR antagonists using only SMILES information. M3S-GRPred first constructs balanced training subsets via under-sampling, then employs these subsets to train heterogeneous base classifiers with various SMILES-based feature descriptors and ML algorithms. The final model integrates probabilistic outputs from these base classifiers. To assess the effectiveness of the proposed M3S-GRPred model, we compared its performance with several conventional ML classifiers using ten-fold cross-validation and independent tests. Our comparative results show that M3S-GRPred significantly outperforms conventional ML classifiers, achieving a balanced accuracy of 0.891, MCC of 0.658, and AUC of 0.953, with improvements of 9.94–12.52%, 19.35–24.51%, and 8.57–11.26%, respectively, on an independent test dataset. It also successfully identified potential GR antagonists among FDA-approved drugs, confirmed through molecular docking and MD simulation studies for drug repurposing in Cushing’s syndrome. We anticipate that M3S-GRPred will be an efficient screening tool for discovering novel GR antagonists cost-effectively from vast libraries of unknown compounds.
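For readers less familiar with the metrics reported above, the following Python sketch shows how balanced accuracy and the Matthews correlation coefficient (MCC) are conventionally computed from confusion-matrix counts. It uses the standard textbook definitions rather than the authors' R code, and the counts in the example are hypothetical, not taken from the study.

```python
import math


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity and specificity; robust to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts, for illustration only.
example_bacc = balanced_accuracy(tp=40, tn=45, fp=5, fn=10)
example_mcc = mcc(tp=40, tn=45, fp=5, fn=10)
```

Both metrics are commonly preferred over plain accuracy on imbalanced datasets, because a classifier that predicts only the majority class scores poorly on them.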
Availability of data and materials
The molecular data used in this research were acquired from the ChEMBL database version 33 with target ID: CHEMBL2034 (https://www.ebi.ac.uk/chembl/search_results/CHEMBL2034). The case study dataset was acquired from the DrugBank database (version 5.1.10; released on January 4, 2023) of FDA-approved drugs (https://go.drugbank.com/). The implementation of this research and the R source codes are available at https://github.com/Shoombuatong/M3S-GRPred.
References
Hunt HJ, et al. Identification of the clinical candidate (R)-(1-(4-Fluorophenyl)-6-((1-methyl-1H-pyrazol-4-yl)sulfonyl)-4,4a,5,6,7,8-hexahydro-1H-pyrazolo[3,4-g]isoquinolin-4a-yl)(4-(trifluoromethyl)pyridin-2-yl)methanone (CORT125134): a selective glucocorticoid receptor (GR) antagonist. J Med Chem. 2017;60(8):3405–21.
Cole TJ, et al. Targeted disruption of the glucocorticoid receptor gene blocks adrenergic chromaffin cell development and severely retards lung maturation. Genes Dev. 1995;9(13):1608–21.
Steffensen C, Bak AM, Rubeck KZ, Jorgensen JO. Epidemiology of Cushing’s syndrome. Neuroendocrinology. 2010;92(Suppl 1):1–5.
Savas M, Mehta S, Agrawal N, van Rossum EFC, Feelders RA. Approach to the patient: diagnosis of Cushing syndrome. J Clin Endocrinol Metab. 2022;107(11):3162–74.
Fleseriu M, et al. Mifepristone, a glucocorticoid receptor antagonist, produces clinical and metabolic benefits in patients with Cushing’s syndrome. J Clin Endocrinol Metab. 2012;97(6):2039–49.
Hunt HJ, et al. 1H-Pyrazolo[3,4-g]hexahydro-isoquinolines as potent GR antagonists with reduced hERG inhibition and an improved pharmacokinetic profile. Bioorg Med Chem Lett. 2015;25(24):5720–5.
Brown DR, et al. Clinical management of patients with Cushing syndrome treated with mifepristone: consensus recommendations. Clin Diabetes Endocrinol. 2020;6(1):18.
Castinetti F, Morange I, Conte-Devolx B, Brue T. Cushing’s disease. Orphanet J Rare Dis. 2012;7:41.
Li D, El Kawkgi OM, Henriquez AF, Bancos I. Cardiovascular risk and mortality in patients with active and treated hypercortisolism. Gland Surg. 2020;9(1):43–58.
Dekkers OM, et al. Multisystem morbidity and mortality in Cushing’s syndrome: a cohort study. J Clin Endocrinol Metab. 2013;98(6):2277–84.
Fleseriu M, et al. Consensus on diagnosis and management of Cushing’s disease: a guideline update. Lancet Diabetes Endocrinol. 2021;9(12):847–75.
Cadepond F, Ulmann A, Baulieu EE. RU486 (mifepristone): mechanisms of action and clinical uses. Annu Rev Med. 1997;48:129–56.
Stanojevic M, Vracko M, Sollner Dolenc M. Development of in silico classification models for binding affinity to the glucocorticoid receptor. Chemosphere. 2023;336:139147.
Spreafico M, Ernst B, Lill MA, Smiesko M, Vedani A. Mixed-model QSAR at the glucocorticoid receptor: predicting the binding mode and affinity of psychotropic drugs. ChemMedChem. 2009;4(1):100–9.
Lewis DF, Ioannides C, Parke DV, Schulte-Hermann R. Quantitative structure-activity relationships in a series of endogenous and synthetic steroids exhibiting induction of CYP3A activity and hepatomegaly associated with increased DNA synthesis. J Steroid Biochem Mol Biol. 2000;74(4):179–85.
Shin SH, Hur G, Kim NR, Park JHY, Lee KW, Yang H. A machine learning-integrated stepwise method to discover novel anti-obesity phytochemicals that antagonize the glucocorticoid receptor. Food Funct. 2023;14(4):1869–83.
Matsuzaka Y, Uesawa Y. A deep learning-based quantitative structure-activity relationship system construct prediction model of agonist and antagonist with high performance. Int J Mol Sci. 2022;23(4):2141.
Matsuzaka Y, Uesawa Y. Molecular image-based prediction models of nuclear receptor agonists and antagonists using the deepsnap-deep learning approach with the Tox21 10K library. Molecules. 2020;25(12):2764.
Dey R, Roychowdhury P, Mukherjee C. Homology modelling of the ligand-binding domain of glucocorticoid receptor: binding site interactions with cortisol and corticosterone. Protein Eng. 2001;14(8):565–71.
Ray NC, et al. Discovery and optimization of novel, non-steroidal glucocorticoid receptor modulators. Bioorg Med Chem Lett. 2007;17(17):4901–5.
Pang JP, et al. Discovery of a novel nonsteroidal selective glucocorticoid receptor modulator by virtual screening and bioassays. Acta Pharmacol Sin. 2022;43(9):2429–38.
Hu X, et al. Discovery of novel non-steroidal selective glucocorticoid receptor modulators by structure- and IGN-based virtual screening, structural optimization, and biological evaluation. Eur J Med Chem. 2022;237:114382.
Alves NRC, Pecci A, Alvarez LD. Structural insights into the ligand binding domain of the glucocorticoid receptor: a molecular dynamics study. J Chem Inf Model. 2020;60(2):794–804.
Hu X, et al. Discovery of novel GR ligands toward druggable GR antagonist conformations identified by MD simulations and Markov state model analysis. Adv Sci (Weinh). 2022;9(3):e2102435.
Metin R, Akten ED. Drug repositioning to propose alternative modulators for glucocorticoid receptor through structure-based virtual screening. J Biomol Struct Dyn. 2022;40(21):11418–33.
Zare F, Solhjoo A, Sadeghpour H, Sakhteman A, Dehshahri A. Structure-based virtual screening, molecular docking, molecular dynamics simulation and MM/PBSA calculations towards identification of steroidal and non-steroidal selective glucocorticoid receptor modulators. J Biomol Struct Dyn. 2023;41(16):7640–50.
Onnis V, et al. Virtual screening for the identification of novel nonsteroidal glucocorticoid modulators. J Med Chem. 2010;53(8):3065–74.
Potamitis C, et al. Discovery of new non-steroidal selective glucocorticoid receptor agonists. J Steroid Biochem Mol Biol. 2019;186:142–53.
Mendez D, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019;47(D1):D930–40.
R Core Team. R: a language and environment for statistical computing, 4.3.0 ed. Vienna, Austria: R Foundation for Statistical Computing; 2021.
Schaduangrat N, Anuwongcharoen N, Moni MA, Lio P, Charoenkwan P, Shoombuatong W. StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy. Sci Rep. 2022;12(1):1–16.
Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform. 2023;15(1):50.
Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERalpha and ERbeta antagonists. Sci Rep. 2023;13(1):22994.
Schaduangrat N, Malik AA, Nantasenamat C. ERpred: a web server for the prediction of subtype-specific estrogen receptor antagonists. PeerJ. 2021;9:e11716.
Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32(7):1466–74.
Carhart RE, Smith DH, Venkataraghavan R. Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci. 1985;25(2):64–73.
Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42(6):1273–80.
Kim S, et al. PubChem substance and compound databases. Nucleic Acids Res. 2016;44(D1):D1202–13.
Laggner C. SMARTS patterns for functional group classification. 2005.
Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The chemistry development kit (CDK): an open-source java library for chemo- and bioinformatics. J Chem Inf Comput Sci. 2003;43(2):493–500.
Charoenkwan P, Chiangjong W, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief Bioinform. 2021;22(6):bbab172.
Charoenkwan P, Schaduangrat N, Manavalan B, Shoombuatong W. M3S-ALG: improved and robust prediction of allergenicity of chemical compounds by using a novel multi-step stacking strategy. Future Gener Comput Syst. 2024.
Malik AA, Phanus-umporn C, Schaduangrat N, Shoombuatong W, Isarankura-Na-Ayudhya C, Nantasenamat C. HCVpred: a web server for predicting the bioactivity of hepatitis C virus NS5B inhibitors. J Comput Chem. 2020;41(20):1820–34.
Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform. 2023;15(1):50.
Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists. Sci Rep. 2023;13(1):22994.
R Development Core Team. R: a language and environment for statistical computing. 2010.
Charoenkwan P, et al. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep. 2022;12(1):7697.
Charoenkwan P, Nantasenamat C, Hasan MM, Moni MA, Manavalan B, Shoombuatong W. StackDPPIV: a novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides. Methods. 2022;204:189–98.
Charoenkwan P, Schaduangrat N, Moni MA, Manavalan B, Shoombuatong W. SAPPHIRE: a stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput Biol Med. 2022;146:105704.
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep. 2024;14(1):4463.
Ahmad S, et al. SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep. 2022;12(1):4106.
Azadpour M, McKay CM, Smith RL. Estimating confidence intervals for information transfer analysis of confusion matrices. J Acoustical Soc Am. 2014;135(3):EL40–146.
Zhang D, et al. iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features. Bioinformatics. 2021;37(2):171–7.
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Oh C, Manavalan B, Shoombuatong W. PSRQSP: an effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning. Comput Biol Med. 2023;158:106784.
Charoenkwan P, Schaduangrat N, Moni MA, Shoombuatong W, Manavalan B. Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework. iScience. 2022;25(9):104883.
Pettersen EF, et al. UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12.
Sanner MF. A component-based software environment for visualizing large macromolecular assemblies. Structure. 2005;13(3):447–62.
Schrödinger LLC, DeLano W. PyMOL: an open source molecular visualization system. 2020.
Kauppi B, et al. The three-dimensional structures of antagonistic and agonistic forms of the glucocorticoid receptor ligand-binding domain: RU-486 induces a transconformation that leads to active antagonism. J Biol Chem. 2003;278(25):22748–54.
Case DA, Aktulga HM, Belfon K, Ben-Shalom IY, Berryman JT, Brozell SR, Cerutti DS, et al. Amber 2022. San Francisco: University of California; 2022.
Sencanski M, et al. Identification of SARS-CoV-2 papain-like protease (PLpro) inhibitors using combined computational approach. ChemistryOpen. 2022;11(2):e202100248.
Yelshanskaya MV, Singh AK, Narangoda C, Williams RSB, Kurnikova MG, Sobolevsky AI. Structural basis of AMPA receptor inhibition by trans-4-butylcyclohexane carboxylic acid. Br J Pharmacol. 2022;179(14):3628–44.
Jana ID, et al. Targeting an evolutionarily conserved "E-L-L" motif in the spike protein to develop a small molecule fusion inhibitor against SARS-CoV-2. bioRxiv. 2022.
Dolinsky TJ, et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 2007;35:W522-5.
Mark P, Nilsson L. Structure and dynamics of the TIP3P, SPC, and SPC/E water models at 298 K. J Phys Chem A. 2001;105(43):9954.
Chari R, Jerath K, Badkar AV, Kalonia DS. Long- and short-range electrostatic interactions affect the rheology of highly concentrated antibody solutions. Pharm Res. 2009;26(12):2607–18.
Ryckaert JP, Ciccotti G, Berendsen HJ. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J Comput Phys. 1977;23(3):327–41.
Roe DR, Cheatham TE 3rd. PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J Chem Theory Comput. 2013;9(7):3084–95.
Genheden S, Ryde U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Discov. 2015;10(5):449–61.
Cavalheiro JPDVH, Pires NMM, Dong T. MM-PBSA: challenges and opportunities. In: 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China. IEEE; 2017.
Wolber G, Langer T. LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J Chem Inf Model. 2005;45(1):160–9.
Lipinski CA. Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods. 2000;44(1):235–49.
Bemis GW, Murcko MA. The properties of known drugs. 1. Molecular frameworks. J Med Chem. 1996;39(15):2887–93.
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework. J Biomol Struct Dyn. 2024;1–13.
Simeon S, et al. osFP: a web server for predicting the oligomeric states of fluorescent proteins. J Cheminform. 2016;8:1–15.
Pfaff AR, Beltz J, King E, Ercal N. Medicinal thiols: current status and new perspectives. Mini Rev Med Chem. 2020;20(6):513–29.
Cazzola M, Calzetta L, Page C, Rogliani P, Matera MG. Thiol-based drugs in pulmonary medicine: much more than mucolytics. Trends Pharmacol Sci. 2019;40(7):452–63.
Khanna K, et al. Exploring antiviral and anti-inflammatory effects of thiol drugs in COVID-19. Am J Physiol Lung Cell Mol Physiol. 2022;323(3):L372–89.
National Center for Biotechnology Information (2024). PubChem Taxonomy Summary for Taxonomy 7459, Apis. Available: https://pubchem.ncbi.nlm.nih.gov/taxonomy/Apis
Weiser MJ, Wu TJ, Handa RJ. Estrogen receptor-beta agonist diarylpropionitrile: biological activities of R- and S-enantiomers on behavior and hormonal response to stress. Endocrinology. 2009;150(4):1817–25.
Kudwa AE, McGivern RF, Handa RJ. Estrogen receptor beta and oxytocin interact to modulate anxiety-like behavior and neuroendocrine stress reactivity in adult male and female rats. Physiol Behav. 2014;129:287–96.
Suthprasertporn N, Suwanna N, Thangnipon W. Protective effects of diarylpropionitrile against hydrogen peroxide-induced damage in human neuroblastoma SH-SY5Y cells. Drug Chem Toxicol. 2022;45(1):44–51.
Krishnamurthy N, Hu Y, Siedlak S, Doughman YQ, Watanabe M, Montano MM. Induction of quinone reductase by tamoxifen or DPN protects against mammary tumorigenesis. FASEB J. 2012;26(10):3993–4002.
Motylewska E, Stasikowska O, Melen-Mucha G. The inhibitory effect of diarylpropionitrile, a selective agonist of estrogen receptor beta, on the growth of MC38 colon cancer line. Cancer Lett. 2009;276(1):68–73.
Yan VC, Pham CD, Arthur K, Yang KL, Muller FL. Aliphatic amines are viable pro-drug moieties in phosphonoamidate drugs. Bioorg Med Chem Lett. 2020;30(24):127656.
Kazi AA, Subba Reddy BV, Ravithej Singh L. Synthetic approaches to FDA approved drugs for asthma and COPD from 1969 to 2020. Bioorg Med Chem. 2021;41:116212.
Acknowledgements
This project is funded by the National Research Council of Thailand and Mahidol University (N42A660380), and Mahidol University Partnering Initiative under the MU-KMUTT Biomedical Engineering & Biomaterials Research Consortium.
Funding
Open access funding provided by Mahidol University. This project is funded by the National Research Council of Thailand and Mahidol University (N42A660380), and Mahidol University Partnering Initiative under the MU-KMUTT Biomedical Engineering & Biomaterials Research Consortium.
Author information
Contributions
NS: Design of this study, data collection, formal analysis, drafting the article, data analysis and interpretation, and docking study and analysis. HC: MD simulations, data analysis and interpretation, drafting the article. TR: MD simulations, drafting the article. PM: Data analysis and interpretation. WS: Project administration, supervision, design of this study, methodology, data analysis and interpretation, drafting the article, and critical revision of the article. All authors reviewed and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Schaduangrat, N., Chuntakaruk, H., Rungrotmongkol, T. et al. M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy. BMC Bioinformatics 26, 117 (2025). https://doi.org/10.1186/s12859-025-06132-1
DOI: https://doi.org/10.1186/s12859-025-06132-1