M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy

Schaduangrat, Nalini; Chuntakaruk, Hathaichanok; Rungrotmongkol, Thanyada; Mookdarsanit, Pakpoom; Shoombuatong, Watshara

doi:10.1186/s12859-025-06132-1

Research
Open access
Published: 30 April 2025


BMC Bioinformatics volume 26, Article number: 117 (2025)


Abstract

Accelerating drug discovery for glucocorticoid receptor (GR)-related disorders, for example through innovative machine learning (ML)-based approaches, holds promise for advancing therapeutic development, optimizing treatment efficacy, and mitigating adverse effects. While experimental methods can accurately identify GR antagonists, they are often not cost-effective for large-scale drug discovery. Computational approaches that use SMILES information for precise in silico identification of GR antagonists are therefore crucial for efficient and scalable drug discovery. Here, we develop a new ensemble learning approach using a multi-step stacking strategy (M3S), termed M3S-GRPred, to rapidly and accurately discover novel GR antagonists. To the best of our knowledge, M3S-GRPred is the first SMILES-based predictor designed to identify GR antagonists without using 3D structural information. In M3S-GRPred, we first constructed several balanced subsets using an under-sampling approach. Using these balanced subsets, we explored and evaluated heterogeneous base-classifiers trained with a variety of SMILES-based feature descriptors coupled with popular ML algorithms. Finally, M3S-GRPred was constructed by integrating probabilistic features from the base-classifiers selected by a two-step feature selection technique. Our comparative experiments demonstrate that M3S-GRPred can precisely identify GR antagonists and effectively handle the imbalanced dataset. Compared with traditional ML classifiers, M3S-GRPred attained superior performance on both the training and independent test datasets. Additionally, M3S-GRPred was applied to identify potential GR antagonists among FDA-approved drugs; the candidates were examined by molecular docking followed by detailed molecular dynamics (MD) simulation studies for drug repurposing in Cushing’s syndrome.
We anticipate that M3S-GRPred will serve as an efficient screening tool for discovering novel GR antagonists from vast libraries of unknown compounds in a cost-effective manner.


Introduction

The glucocorticoid receptor (GR), a widely expressed, ligand-activated transcription factor of the nuclear receptor superfamily, mediates various biological processes, including gluconeogenesis, inflammation, immunity, bone metabolism, cardiovascular function, overall homeostasis and development, and brain function [1]. GR is essential for survival: mice with a disrupted GR die shortly after birth owing to multiple defects [2]. Cortisol, the natural hormone that binds to GR, is secreted by the adrenal glands under the regulation of adrenocorticotropic hormone (ACTH). Cushing’s syndrome results from excessive glucocorticoid exposure and causes significant morbidity and mortality. It can develop from corticosteroid administration (exogenous) or from uncontrolled cortisol hypersecretion, whether ACTH-dependent or ACTH-independent (endogenous) [3, 4]. Presently, two medical treatments for endogenous Cushing’s syndrome have been approved by the US Food and Drug Administration (FDA). The first approved medical therapy is mifepristone, a nonselective GR antagonist, indicated for adult patients with Cushing’s syndrome and glucose intolerance or type 2 diabetes mellitus who are either ineligible for surgery or have undergone unsuccessful surgery [5]. The second approved medical therapy is pasireotide, a somatostatin receptor agonist. It is approved specifically for patients with Cushing’s disease, a subset of Cushing’s syndrome, when pituitary surgery is not a viable option or has proven ineffective [6].

As noted above, Cushing’s syndrome is caused by excessive cortisol activity, leading to severe symptoms such as excess trunk fat, thin arms and legs, a rounded face, and a fatty hump between the shoulders. Patients often experience diabetes, hypertension, skin problems, and psychiatric disturbances [7, 8]. Moreover, elevated cortisol levels are linked to a heightened risk of myocardial infarction, cerebrovascular events such as stroke, thromboembolism, and sepsis, leading to a greater mortality risk than in the general population [9,10,11]. Mifepristone effectively alleviates the clinical effects of elevated cortisol by acting as a GR antagonist, improving patients’ overall condition [5]. However, it does not reduce cortisol production, and its non-selectivity is a drawback: its strong affinity for the progesterone receptor (PR) can cause pregnancy termination, as well as irregular vaginal bleeding or endometrial thickening in some patients [12]. The GR antagonist relacorilant is currently being evaluated in a Phase 3 clinical trial (NCT02804750). Consequently, there is an ongoing need to discover new GR antagonists with diverse properties suitable for treating the various diseases that involve GR signaling. However, conventional drug discovery is lengthy and resource-intensive, and machine learning (ML)-based approaches have proven highly effective at expediting it. Researchers are exploring diverse computer-assisted approaches to GR drug design, including quantitative structure–activity relationship (QSAR) models for GR built with ML [13,14,15,16] and deep learning [17, 18], molecular docking [19,20,21,22], molecular dynamics simulations [23,24,25,26], and pharmacophore analysis [25, 27, 28], among others.

Here, we present M3S-GRPred, a novel stacked ensemble learning approach designed to rapidly and accurately discover novel GR antagonists. The major contributions of this work can be summarized as follows: (i) We developed a multi-step stacking strategy (M3S) to build M3S-GRPred and address the data imbalance problem (i.e., 1314 active compounds versus 275 inactive compounds). Unlike the conventional stacking strategy, M3S employs an under-sampling approach to construct several balanced subsets. Using these balanced subsets, we evaluated and compared the performance of base-classifiers trained with five SMILES-based feature descriptors (i.e., AP2DC, CDKExt, FP4C, MACCS, and Pubchem) coupled with popular ML algorithms (i.e., KNN, MLP, PLS, RF, SVM, and XGB). All base-classifiers were used to generate probabilistic features (PFs) through ten-fold cross-validation. Finally, we applied a two-step feature selection to optimize these PFs and determine the best feature subset for constructing the final stacked ensemble learning model; (ii) M3S-GRPred is the first SMILES-based stacked model for the identification of GR antagonists; (iii) Experimental results showed that the M3S method not only addresses the data imbalance problem but also achieves more stable and accurate identification of GR antagonists. Specifically, on the independent test, M3S-GRPred outperformed several traditional ML classifiers, achieving a balanced accuracy (BACC) of 0.891, a Matthews correlation coefficient (MCC) of 0.658, and an area under the receiver operating characteristic curve (AUC) of 0.953; and (iv) M3S-GRPred was applied to identify important features of GR antagonists and, together with molecular docking and MD simulation studies, to find FDA-approved drugs that could potentially act as GR antagonists. As a result, M3S-GRPred identified two FDA-approved drugs as potential GR antagonists.

Materials and methods

Training and independent test datasets

In this study, compounds were sourced from the ChEMBL database (target: GR; ID: CHEMBL2034) [29]. Initially, 13,227 compounds with reported activity towards GR were retrieved and curated using our in-house code in the R programming environment [30]. During curation, compounds with an ‘=’ symbol in their “Standard.Value” column were retained, whereas those with symbols such as ‘<’, ‘>’, or ‘/’ were excluded. Redundant and missing data points were also eliminated. The dataset was then refined by selecting compounds with quantitative IC₅₀ values obtained from functional and cell-based assays relevant to GR activity, resulting in 1632 compounds. To allow drug potencies to be compared on a common molar basis, IC₅₀ values were converted to pIC₅₀ values (the negative base-10 logarithm of IC₅₀ in molar units) and further processed as in our previous works [31,32,33,34]. Consequently, the final dataset comprised 1314 active compounds and 275 inactive compounds. Of these, 75% of the active and 75% of the inactive compounds were used to construct the training dataset, and the remaining 25% of each class formed the independent test dataset.
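The unit conversion and the class-wise 75%/25% split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' in-house R code: it assumes IC₅₀ values are given in nM and that the split is random and stratified by class; the helper names `pic50_from_nm` and `stratified_split` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def pic50_from_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); for an IC50 in nM this equals 9 - log10(IC50)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, test), sampling each class separately
    so that actives and inactives keep the same proportions in both parts."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    print(pic50_from_nm(1000.0))  # an IC50 of 1 µM corresponds to a pIC50 of 6
    labels = [1] * 1314 + [0] * 275  # class sizes reported in the text
    train, test = stratified_split(labels)
    print(len(train), len(test))
```

Stratifying by class keeps the active-to-inactive ratio of the independent test set close to that of the training set, so test metrics are not distorted by a different class balance.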

Feature extraction methods

Feature extraction characterizes the molecules of interest through quantitative and qualitative descriptors of their structural composition, connectivity, and physicochemical properties [28]. In our study, data preprocessing was performed with the PaDEL-Descriptor software [35] and included removing salts, removing duplicate data, and standardizing tautomers. After preprocessing, the SMILES notations of the compounds were used to generate molecular fingerprints. Five fingerprint types were employed: AP2DC, CDKExt, FP4C, MACCS, and Pubchem [36,37,38,39,40]. A detailed description of each fingerprint is given in Table 1. All molecular fingerprint calculations were conducted in the R programming environment [30].

Table 1 Summary of five SMILES-based feature descriptors used in this study


Overall framework of M3S-GRPred

M3S is a multi-step stacking strategy designed to address data imbalance by combining the stacking strategy with an under-sampling technique [41, 42]. It was recently applied successfully to the interpretable identification of IL-6-inducing peptides [41] and to the prediction of the allergenicity of chemical compounds [42]. Here, we applied the M3S strategy to discover novel GR antagonists. The overall framework of M3S-GRPred consists of the following steps (Fig. 1): (i) preparing the balanced training datasets; (ii) constructing base-classifiers; and (iii) optimizing meta-classifiers.

In the first step, balanced training subsets (BTS) were established by under-sampling the original training dataset, which contained 790 positives and 159 negatives. Given the roughly 5:1 ratio of positives to negatives, the positive samples were under-sampled five times, and each positive subset was combined with the negatives to create five balanced training subsets (i.e., BTS1–BTS5). Importantly, the five positive subsets do not overlap, so using all of them maximizes the use of the information provided by the active compounds. In the second step, 30 base-classifiers were constructed for each balanced training subset by combining six ML algorithms (i.e., KNN, MLP, PLS, RF, SVM, and XGB) with five molecular descriptors (i.e., AP2DC, CDKExt, FP4C, MACCS, and Pubchem). These six ML algorithms are widely applied in research related to drug discovery and development [43,44,45]. Across all balanced training subsets, we obtained a total of 150 trained base-classifiers (6 algorithms × 5 descriptors × 5 subsets). All base-classifiers were built and optimized using the caret package in the R programming environment [46], with optimal parameters determined through ten-fold cross-validation (Supplementary Table S2).
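A minimal Python sketch of the under-sampling step, assuming the positives are randomly shuffled and dealt into five disjoint chunks, each paired with the full set of negatives. The function name `make_balanced_subsets` and the shuffling scheme are illustrative assumptions; the source does not specify how positives were assigned to subsets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_balanced_subsets(
    positives: Sequence[T], negatives: Sequence[T], n_subsets: int = 5, seed: int = 0
) -> List[Tuple[List[T], List[T]]]:
    """Deal the majority (positive) class into n_subsets disjoint chunks and
    pair every chunk with all minority (negative) samples."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    # Taking every n_subsets-th element gives disjoint chunks whose sizes differ by at most one.
    chunks = [pos[i::n_subsets] for i in range(n_subsets)]
    return [(chunk, list(negatives)) for chunk in chunks]


if __name__ == "__main__":
    positives = [f"act{i}" for i in range(790)]
    negatives = [f"inact{i}" for i in range(159)]
    for k, (pos, neg) in enumerate(make_balanced_subsets(positives, negatives), start=1):
        print(f"BTS{k}: {len(pos)} positives, {len(neg)} negatives")
```

With the 790/159 training split reported above, each subset holds 158 positives and all 159 negatives, close to a 1:1 ratio, and every active compound is used in exactly one subset.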

In the final step, the probability scores of being a GR antagonist produced by the 150 base-classifiers were combined into a probabilistic feature vector (PFV) [41, 47,48,49]. For a given compound C, its PFV can be represented by:

$$\text{PFV}=\left[\text{P}(\text{ML}_{1},\text{MD}_{1},\text{BTS}_{1}),\ \text{P}(\text{ML}_{1},\text{MD}_{1},\text{BTS}_{2}),\ \dots,\ \text{P}(\text{ML}_{6},\text{MD}_{5},\text{BTS}_{5})\right]$$

(1)

where $\text{P}(\text{ML}_{i},\text{MD}_{j},\text{BTS}_{k})$ denotes the probability score from the base-classifier trained with the ith ML algorithm and the jth molecular descriptor on the kth balanced training subset. Thus, the PFV of compound C is a 150-D probabilistic feature vector. To enhance learning accuracy, a two-step feature selection strategy was employed to determine the best feature subset containing m informative PFs for developing the proposed model. This two-step feature selection is implemented as in our previous studies [45, 50, 51]. In the first step, all PFs were ranked by their RF-based mean decrease in Gini index (MDGI), and fifteen feature subsets containing the m top-ranked PFs were generated, where m = 10, 20, 30, …, 150. In the second step, each of the 15 feature subsets was used independently to train an SVM-based meta-classifier, and the feature subset yielding the highest cross-validation MCC was selected as the best feature subset.
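The ordering of Eq. (1) and the two-step selection loop can be sketched in Python as below. This is a hedged illustration, not the authors' R implementation: `importance` stands in for the RF-derived MDGI scores and `evaluate` for the cross-validation MCC of an SVM meta-classifier; both are hypothetical placeholders supplied by the caller, and the algorithm-major ordering of `PF_NAMES` is read off Eq. (1).

```python
from typing import Callable, Dict, List, Tuple

ML = ["KNN", "MLP", "PLS", "RF", "SVM", "XGB"]
MD = ["AP2DC", "CDKExt", "FP4C", "MACCS", "Pubchem"]
BTS = [f"BTS{k}" for k in range(1, 6)]

# Order assumed from Eq. (1): subset index varies fastest, then descriptor, then algorithm.
PF_NAMES = [f"{ml}-{md}_{b}" for ml in ML for md in MD for b in BTS]


def two_step_selection(
    importance: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    step: int = 10,
) -> Tuple[List[str], float]:
    """Step 1: rank PFs by importance (descending).
    Step 2: score nested top-m subsets (m = step, 2*step, ...) and keep the best."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    best_subset: List[str] = []
    best_score = float("-inf")
    for m in range(step, len(ranked) + 1, step):
        subset = ranked[:m]
        score = evaluate(subset)  # e.g. cross-validation MCC of a meta-classifier
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score
```

With 150 PFs and `step=10`, the loop evaluates exactly the 15 nested subsets m = 10, 20, …, 150, matching the procedure described above.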

Performance evaluation

Herein, we employed two standard evaluation strategies, ten-fold cross-validation and the independent test, to assess the robustness and generalization ability of the prediction models. Six performance measures, namely BACC, MCC, accuracy (ACC), AUC, sensitivity (SN), and specificity (SP), were used to evaluate the prediction models [52, 53]. SN, SP, ACC, MCC, and BACC are defined as

$$\text{SN}=\frac{\text{TP}}{\left(\text{TP}+\text{FN}\right)}$$

(2)

$$\text{SP}=\frac{\text{TN}}{\left(\text{TN}+\text{FP}\right)}$$

(3)

$$\text{ACC}=\frac{\text{TP}+\text{TN}}{\left(\text{TP}+\text{TN}+\text{FP}+\text{FN}\right)}$$

(4)

$$\text{MCC}=\frac{\text{TP}\times \text{TN}-\text{FP}\times \text{FN}}{\sqrt{(\text{TP}+\text{FP})(\text{TP}+\text{FN})(\text{TN}+\text{FP})(\text{TN}+\text{FN})}}$$

(5)

$$\text{BACC}=(\text{SN}+\text{SP})/2$$

(6)

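where TP and TN are the numbers of correctly predicted positive and negative samples, respectively, and FP and FN are the numbers of negative samples falsely predicted as positive and positive samples falsely predicted as negative, respectively [41, 42, 47, 49, 54, 55]. AUC is the area under the receiver operating characteristic (ROC) curve, which plots SN against 1 − SP over all classification thresholds.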

Molecular docking study of FDA-approved drugs

A library of 2735 FDA-approved small-molecule drugs was obtained from the DrugBank database (version 5.1.10; released on January 4, 2023). After eliminating inorganic compounds, salts, SMILES with explicit-valence errors, disconnected SMILES representations, and duplicates, 1737 compounds remained. Molecular descriptors were computed for these 1737 compounds and used as input to our model. To aid drug repurposing, the top 30 compounds identified by our model were subjected to docking analysis using the 3D structure of GR (PDB ID: 1NHZ) obtained from the Protein Data Bank (https://www.rcsb.org/) and prepared for docking in UCSF ChimeraX [56]. The refined structure was energy-minimized using the OPLS3 force field with 5000 steps of steepest descent (step size = 0.02 Å) and 500 steps of conjugate gradient (step size = 0.02 Å). A receptor grid box with dimensions X = 56, Y = 64, and Z = 62 was constructed using MGLTools 1.5.7 [57] and centered at (−5.31, 14.112, 5.61) Å on the basis of active-site residues within the GR binding pocket; this region was used as the docking site. To validate the docking approach, the co-crystallized ligand (mifepristone; RU486) was re-docked into the active site of GR to assess whether the docking method could reproduce the native conformation of the inhibitor. The docking runs employed 20 binding modes, an energy range of 4, and an exhaustiveness of 32. Docked complexes were visualized using PyMOL [58].
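Re-docking validation is commonly quantified as the heavy-atom root mean square deviation (RMSD) between the re-docked and crystallographic ligand poses, often with an acceptance threshold of 2 Å; the source does not state which metric or threshold was used, so both are assumptions here. The Python sketch below computes an in-place RMSD (no superposition, since both poses share the receptor frame), assuming identical atom ordering and ignoring molecular symmetry; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(reference: Sequence[Coord], pose: Sequence[Coord]) -> float:
    """In-place RMSD (Å) between two poses whose atoms are listed in the same order."""
    if len(reference) != len(pose) or not reference:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pose_atom in zip(reference, pose)
        for a, b in zip(ref_atom, pose_atom)
    )
    return math.sqrt(squared / len(reference))


def redocking_reproduced(
    reference: Sequence[Coord], pose: Sequence[Coord], threshold: float = 2.0
) -> bool:
    """Assumed criterion: the native pose is reproduced if RMSD <= threshold (Å)."""
    return pose_rmsd(reference, pose) <= threshold
```

Symmetric groups (e.g., a phenyl ring flipped by 180°) can inflate this naive RMSD, which is why docking tools often apply a symmetry-corrected variant.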

Molecular dynamics (MD) simulations

The docked conformations of four screened compounds (i.e., azelastine (AZE), metergoline (MET), perampanel (PER), and pirenzepine (PIR)) bound to GR, along with the GR/MF (mifepristone) crystal structure [59], were used as initial structures for MD simulations with AMBER22 [60] under periodic boundary conditions, as detailed in previous studies [24, 61,62,63]. The protonation states of the protein receptor were determined using PROPKA in the PDB2PQR web server [64]. The electrostatic potential (ESP) charges of the optimized drugs were computed at the HF/6-31G(d) level of theory and converted to restrained ESP (RESP) charges with the ANTECHAMBER module of AmberTools21 [60]. The AMBER force fields ff19SB and GAFF2 were applied to the protein and drugs, respectively. Missing hydrogen atoms were added using the tLEaP module. Each system was neutralized with counterions and immersed in TIP3P water [65] in an octahedral box extending at least 10 Å from the protein surface. Structural minimization was performed with 1500 steps of steepest descent (SD) followed by conjugate gradient (CG) minimization. MD simulations were then executed with a 2-fs time step and a 10 Å cutoff for nonbonded interactions, with long-range electrostatic interactions [66] treated using the particle mesh Ewald (PME) summation approach. Pressure and temperature were controlled, and covalent bonds involving hydrogen atoms were constrained using the SHAKE algorithm [67]. The systems were heated from 10 to 310 K over 100 ps and maintained at 310 K for 100 ns, in three replicates with different initial velocities. The CPPTRAJ module was used to analyze the all-atom root mean square deviation (RMSD), intermolecular hydrogen bonds, and contact atoms between the drug and protein over the production phase [68].
The binding affinity of the drug/protein complex in each simulation was estimated with the molecular mechanics generalized Born surface area (MM/GBSA) [69] or Poisson–Boltzmann surface area (MM/PBSA) [70] solvation method, using 100 snapshots from the last 20 ns. Drug–protein interactions were visualized with LigandScout 4.4.8 [71].
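MM/GBSA and MM/PBSA estimates are obtained by averaging, over snapshots, the free energy of the complex minus those of the separated receptor and ligand. The Python sketch below assumes the single-trajectory protocol (receptor and ligand energies taken from the same complex snapshots) and that per-snapshot energies in kcal/mol have already been computed, for example by an AMBER post-processing tool; it omits entropy terms, and the source does not state whether these were included. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def end_state_binding_energy(
    g_complex: Sequence[float],
    g_receptor: Sequence[float],
    g_ligand: Sequence[float],
) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of
    dG_bind = G_complex - G_receptor - G_ligand over matched snapshots (kcal/mol)."""
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)) or not g_complex:
        raise ValueError("need the same, non-zero number of snapshots for each species")
    per_snapshot = [c - r - lig for c, r, lig in zip(g_complex, g_receptor, g_ligand)]
    spread = stdev(per_snapshot) if len(per_snapshot) > 1 else 0.0
    return mean(per_snapshot), spread
```

In the setting described above, each list would hold 100 values, one per snapshot from the last 20 ns of a replicate.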

Results and discussion

Chemical space analysis

Chemical space analysis in drug discovery aims to understand how compounds, here categorized as active or inactive, are distributed with respect to their physicochemical properties. This involves examining descriptors such as molecular weight (MW), the octanol–water partition coefficient (AlogP), hydrogen bond acceptor count (HBA), hydrogen bond donor count (HBD), topological polar surface area (TPSA), and rotatable bond count (nRotB). Lipinski’s Rule of Five (Ro5) sets criteria for judging whether a compound is likely to be orally bioavailable: AlogP ≤ 5, MW ≤ 500, HBD ≤ 5, and HBA ≤ 10 [72]. The chemical spaces defined by the physicochemical properties related to the Ro5 and Veber’s rule (i.e., nRotB ≤ 10 and TPSA ≤ 140 Å²) were analyzed and are depicted in Fig. 2. Most compounds in the active group have a MW of 400–600 Da, whereas those in the inactive group cluster within 300–500 Da (Fig. 2A). Similarly, the AlogP values of the active and inactive groups (Fig. 2B) are concentrated within 4–6 and 3–5.5, respectively. Although these differences are modest, they are statistically significant (Mann–Whitney U test, p < 0.05). HBA and HBD describe the hydrogen bonding capacity of the compounds. Most compounds in both classes meet the Ro5 criteria for HBA and HBD (Fig. 2C, D), although the difference between the groups is not statistically significant for HBA. Furthermore, the TPSA values of the active and inactive compounds are concentrated within 50–100 and 60–80 Å², and their nRotB values within 3–8 and 2–7, respectively (Fig. 2E, F). The differences between the classes for both properties are statistically significant (p < 0.05).
These statistically significant differences suggest that physicochemical properties partly distinguish active from inactive compounds, providing insight into the relationship between the properties of the compounds and their GR antagonist activity. This information can be valuable in guiding the design and optimization of new drug candidates.
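The Ro5 and Veber criteria used in the chemical space analysis reduce to simple threshold checks. A minimal Python sketch, assuming the descriptor values are precomputed and using the common convention that at most one Ro5 violation is tolerated; the function names are hypothetical.

```python
def ro5_violations(mw: float, alogp: float, hbd: int, hba: int) -> int:
    """Number of Lipinski Rule-of-Five criteria a compound violates."""
    return sum([mw > 500, alogp > 5, hbd > 5, hba > 10])


def passes_ro5(mw: float, alogp: float, hbd: int, hba: int, max_violations: int = 1) -> bool:
    """Assumed convention: at most one violation is tolerated."""
    return ro5_violations(mw, alogp, hbd, hba) <= max_violations


def passes_veber(n_rotb: int, tpsa: float) -> bool:
    """Veber's rule: nRotB <= 10 and TPSA <= 140 Å^2."""
    return n_rotb <= 10 and tpsa <= 140
```

Because the thresholds are inclusive, a compound with a MW of exactly 500 Da does not count as a violation, whereas a hypothetical compound with MW = 560 Da and AlogP = 5.4 accumulates two.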

To ensure that the independent test dataset is sufficiently distinct from the training dataset, and thereby avoid overestimating model performance, we performed a scaffold analysis using the Bemis–Murcko framework [73] to count the unique scaffolds found in either the training dataset or the independent test dataset. The scaffold analysis showed that 50.23% of the scaffolds were unique to the independent test dataset (detailed in Supplementary Table S1). In addition, to confirm the generalization ability of the proposed model, we computed the Tanimoto similarity coefficient between pairs of compounds drawn from the training and independent test datasets, based on ECFP4 fingerprints. As shown in Supplementary Figures S1–S2, the heatmap is predominantly blue; the average Tanimoto similarity coefficient was 0.135, and 98.33% of training–test compound pairs had a Tanimoto similarity coefficient below 0.5, indicating low similarity between the training and independent test datasets.
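The cross-dataset similarity statistics can be sketched with fingerprints represented as sets of on-bit indices. Generating ECFP4 bits requires a cheminformatics toolkit (for example, RDKit’s Morgan fingerprint with radius 2), which is assumed to have been run beforehand; the helper names below are illustrative.

```python
from typing import AbstractSet, Sequence, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| for binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints -> 0 by convention


def cross_similarity(
    train_fps: Sequence[AbstractSet[int]],
    test_fps: Sequence[AbstractSet[int]],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Mean Tanimoto over all train-test pairs and the fraction of pairs below threshold."""
    sims = [tanimoto(a, b) for a in train_fps for b in test_fps]
    mean_sim = sum(sims) / len(sims)
    frac_below = sum(s < threshold for s in sims) / len(sims)
    return mean_sim, frac_below
```

Applied to the full datasets, the two returned values correspond to the average similarity (0.135) and the fraction of pairs below 0.5 (98.33%) reported above.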

Effect of the under-sampling method on prediction performance

This section examines two comparative scenarios. In the first scenario, we compared the performance of ML classifiers built from six ML algorithms and five molecular descriptors and trained on the imbalanced dataset. In the second scenario, the same ML classifiers were trained and tested on the five balanced training subsets (i.e., BTS1–BTS5). To assess the contributions of the molecular descriptors, ML algorithms, and under-sampling technique, six performance measures (BACC, AUC, SN, SP, MCC, and ACC) achieved by the ML classifiers were evaluated and compared using ten-fold cross-validation and independent tests. As mentioned above, the ML classifiers achieving the highest cross-validation MCC were regarded as the best-performing classifiers. The experimental results of these two comparative scenarios are provided in Fig. 3, Table 2, and Supplementary Tables S3–S9.

Table 2 Cross-validation MCC scores of the 30 base-classifiers developed on each of the five balanced training subsets


As shown in Supplementary Table S3, no ML classifier trained with the imbalanced dataset achieved a cross-validation MCC greater than 0.5. The top-five ML classifiers in this case were SVM-CDKExt, RF-CDKExt, RF-Pubchem, XGB-FP4C, and XGB-Pubchem, with MCC scores of 0.431, 0.425, 0.424, 0.418, and 0.414, respectively (Fig. 3). Their performance on the independent test dataset was likewise unsatisfactory, with MCC scores of 0.418, 0.423, 0.429, 0.413, and 0.464, respectively (Supplementary Table S4). This indicates that the class imbalance could be detrimental to model performance, which motivated us to employ the under-sampling technique to improve it.

For the models trained with the balanced training subsets, all of the top-50 ML classifiers achieved cross-validation MCC scores greater than 0.5 (Supplementary Table S5). These results demonstrate that models trained on the balanced training subsets perform better than those trained on the imbalanced training dataset. The top-five ML classifiers, with cross-validation MCC scores of 0.563, 0.563, 0.560, 0.555, and 0.554, were RF-Pubchem_BTS1, SVM-Pubchem_BTS4, XGB-FP4C_BTS5, SVM-CDKExt_BTS2, and PLS-Pubchem_BTS1, respectively (Fig. 4). Among these, RF-Pubchem_BTS1 and SVM-Pubchem_BTS4 attained MCC scores of 0.508 and 0.523, respectively, on the independent test (Supplementary Table S6). To characterize the performance of each ML classifier across subsets, we calculated its average cross-validation and independent test MCC scores over the five balanced training subsets. As shown in Table 2 and Supplementary Table S7, the top-five ML classifiers by average cross-validation MCC were RF-Pubchem (0.526), XGB-FP4C (0.524), SVM-AP2DC (0.521), SVM-CDKExt (0.510), and PLS-Pubchem (0.509). To confirm the effectiveness of the under-sampling technique, we compared the best-performing models trained with the balanced (RF-Pubchem_BTS1) and imbalanced (SVM-CDKExt) training datasets. We also compared the top-five ML classifiers trained with their balanced subsets against the top-five ML classifiers trained with the imbalanced dataset. As shown in Supplementary Table S8, the classifiers trained with balanced subsets perform better than those trained with the imbalanced dataset in terms of BACC and MCC on the training data.
Furthermore, on the independent test, RF-Pubchem_BTS1 outperformed SVM-CDKExt, achieving a BACC of 0.823, an MCC of 0.508, and an AUC of 0.883 (Supplementary Table S9). This again confirms that the under-sampling technique is beneficial for enhancing model performance. Therefore, all ML classifiers trained with the balanced training subsets were used to construct our proposed models in the following analyses.
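Averaging each classifier’s cross-validation MCC over the five balanced subsets, as done for Table 2, amounts to grouping scores by the algorithm–descriptor pair. A minimal Python sketch, assuming results are keyed by names in the document’s `ALG-DESC_BTSk` style; the function name is hypothetical and the scores in the example are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_average_mcc(cv_mcc: Dict[str, float]) -> List[Tuple[str, float]]:
    """Group 'ALG-DESC_BTSk' scores by 'ALG-DESC' and rank by mean MCC (descending)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for name, mcc in cv_mcc.items():
        classifier, _, _subset = name.rpartition("_")
        grouped[classifier].append(mcc)
    averages = {clf: mean(scores) for clf, scores in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {  # hypothetical scores, not values from the study
        "RF-Pubchem_BTS1": 0.60, "RF-Pubchem_BTS2": 0.40,
        "SVM-CDKExt_BTS1": 0.55, "SVM-CDKExt_BTS2": 0.53,
    }
    print(rank_by_average_mcc(toy))
```

The toy example shows why the ranking by average MCC can differ from the ranking by the single best subset: RF-Pubchem has the highest individual score (0.60) but the lower mean.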

Construction of M3S-GRPred

In general, it is straightforward to select the best ML classifier among those trained with different ML algorithms and molecular descriptors. However, the predictive ability of single-feature-based models might not be robust enough [41, 47,48,49, 51, 55, 74]. To overcome this limitation, we employed the M3S method to develop a stacked ensemble learning model based on an SVM meta-classifier (referred to as mSVM) trained with the 150-D probabilistic feature vector. In addition, we applied the two-step feature selection method to identify m informative PFs: the PFs were first ranked by their MDGI scores and 15 feature subsets containing the m top-ranked PFs were generated; each of the 15 feature subsets was then used to develop an mSVM, whose performance was assessed in both cross-validation and independent tests. As shown in Supplementary Table S10, the feature subsets containing the 150 and 140 top-ranked PFs, referred to herein as PFV and PFV_FS, respectively, achieved better cross-validation MCC than the other feature subsets. The performance of PFV and PFV_FS is recorded in Table 3. PFV and PFV_FS achieve similar cross-validation MCC scores of 0.713 and 0.708, respectively. On the independent test dataset, the MCC, SN, and BACC of PFV_FS were 0.658, 0.928, and 0.891, which were 1.48%, 1.45%, and 0.880% higher than those of PFV, respectively, demonstrating the effectiveness and robustness of PFV_FS. Therefore, we used PFV_FS to develop the final stacked ensemble learning model (M3S-GRPred).
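Stacking relies on probabilistic features that each base-classifier produces for compounds it did not see during training, so that the meta-classifier is not fitted on overconfident in-sample scores. The text states that PFs were generated through ten-fold cross-validation; the Python sketch below shows the generic out-of-fold scheme under that reading. `fit` is a hypothetical callable that trains a model and returns a function mapping a sample to P(antagonist), and how the authors scored compounds outside a given balanced subset is not detailed in the source.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def out_of_fold_probabilities(
    samples: Sequence[T],
    labels: Sequence[int],
    fit: Callable[[List[T], List[int]], Callable[[T], float]],
    n_folds: int = 10,
    seed: int = 0,
) -> List[float]:
    """Score each sample with a model trained on the other n_folds - 1 folds."""
    if len(samples) < n_folds:
        raise ValueError("need at least one sample per fold")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    oof = [0.0] * len(samples)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            oof[i] = predict(samples[i])  # this sample was never seen by `predict`
    return oof
```

Under this scheme, each base-classifier contributes one column of out-of-fold scores, and the columns together form the PFV of Eq. (1) on which the mSVM is trained.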

Table 3 Performance of new feature representations over the ten-fold cross-validation and independent tests


M3S-GRPred outperforms several traditional machine learning-based classifiers

In this section, we compared the performance of M3S-GRPred with that of base-classifiers trained with the balanced training subsets and with the imbalanced training dataset to demonstrate the advantages of the M3S strategy in overcoming the data imbalance problem and improving performance and robustness. In the first comparative experiment, we compared M3S-GRPred with the top-five base-classifiers trained with different balanced training subsets, namely RF-Pubchem_BTS1, SVM-Pubchem_BTS4, XGB-FP4C_BTS5, SVM-CDKExt_BTS2, and PLS-Pubchem_BTS1. Figure 5A, B and Table 4 show that M3S-GRPred outperformed these base-classifiers in both the ten-fold cross-validation and independent tests. Specifically, on the independent test dataset, M3S-GRPred achieved improvements of 5.62–14.89%, 13.46–27.12%, and 6.95–13.33% in terms of BACC, MCC, and AUC, respectively. These results demonstrate that M3S-GRPred identifies GR antagonists with high accuracy and stability. In addition, to determine whether our proposed framework can address the imbalanced data problem, we compared M3S-GRPred with the top-five base-classifiers trained with the imbalanced training dataset. As illustrated in Fig. 5C, D and Table 5, M3S-GRPred achieved substantial improvements across all six measures on both the training and independent test datasets; specifically, it improved BACC by 9.94–12.52%, MCC by 19.35–24.51%, and AUC by 8.57–11.26%. Altogether, these results indicate that the framework used to develop M3S-GRPred not only addresses the data imbalance problem but also effectively leverages heterogeneous models to achieve more stable and accurate identification of GR antagonists.

Table 4 Performance comparison of M3S-GRPred and top-five ML base-classifiers developed using different balanced training subsets


Table 5 Performance comparison of M3S-GRPred and top-five base-classifiers developed using the imbalanced training dataset


Model interpretation and feature importance analysis

To gain deeper insight into the substructural elements potentially responsible for antagonistic effects against GR, we employed the RF classifier to determine and rank feature importance based on MDGI [43,44,45, 50, 75]. Since the top-three base-classifiers were developed using BTS1, BTS4, and BTS5, we performed feature importance analysis using RF classifiers coupled with an interpretable feature descriptor (i.e., Pubchem) on these three balanced training subsets. Features with the highest MDGI scores are considered the most important for identifying GR antagonists. We selected the top-20 important features found in all three balanced training subsets for detailed analysis; they are summarized in Table 6. For example, the MDGI scores (ranks) of Pubchem568 on BTS1, BTS4, and BTS5 were 3.23(1), 1.22(8), and 1.46(9), respectively.
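MDGI accumulates, over every node that splits on a given feature, the decrease in Gini impurity produced by that split, and averages the totals across the trees of the forest; implementations differ in how they weight nodes. The Python sketch below computes the impurity decrease of a single split on a binary fingerprint bit, the building block of MDGI; the function names and class counts are hypothetical.

```python
from typing import Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_c^2) of a node with the given class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def split_gini_decrease(left: Sequence[int], right: Sequence[int]) -> float:
    """Parent impurity minus the size-weighted impurities of the two children."""
    parent = [a + b for a, b in zip(left, right)]
    n = sum(parent)
    return gini(parent) - sum(left) / n * gini(left) - sum(right) / n * gini(right)


if __name__ == "__main__":
    # Hypothetical node with [actives, inactives]: bit set (left) vs. bit unset (right).
    print(split_gini_decrease(left=[9, 1], right=[1, 9]))
```

A bit that separates actives from inactives well (here 0.5 − 0.09 − 0.09 = 0.32) yields a large decrease, whereas a bit whose children mirror the parent’s class mix yields none, which is why high-MDGI Pubchem bits are read as substructures associated with GR antagonism.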

Table 6 List of important Pubchem-based features along with their MDGI scores (ranks) found in all three balanced training datasets (i.e., BTS1, BTS4, and BTS5)


Our analysis highlighted that the top feature in BTS4 and BTS5 was the same, namely Pubchem799 (2.59(3), 4.65(1), 3.57(1)), which corresponds to 3-methylcyclohexane-1-thiol. This compound is an alkylthiol, i.e., an alkyl group (methylcyclohexane) attached to a sulfhydryl group. In addition, Pubchem736 (1.65(9), 2.35(2), 1.93(3)) and Pubchem778 (1.67(8), 1.15(10), 1.62(6)), corresponding to 3-methylbenzenethiol and 4-methylcyclohexane-1-thiol, respectively, are further top-10 features containing thiol substructures. The thiol (−SH) functional group is found in numerous drugs and imparts a unique combination of useful properties. Thiol-containing drugs act as antioxidants by neutralizing radicals and other harmful electrophiles, replenishing cellular thiol pools, and forming stable complexes with heavy metals such as arsenic, copper, and lead [76]. In addition, thiol-based drugs are classified as mucolytics because they lower the viscosity and elasticity of bronchial secretions by breaking disulfide bonds in proteins [77]. A recent study by Khanna et al. [78] explored the antiviral and anti-inflammatory effects of thiol-based drugs in COVID-19. The authors observed that in vivo treatment with thiol drugs such as cysteamine exerted anti-inflammatory effects and reduced SARS-CoV-2-induced lung inflammation and injury. Moreover, in vitro assays showed that multiple thiol drugs could inhibit the binding of the SARS-CoV-2 spike protein to its receptor, thereby inhibiting viral infection [78]. Taken together, the anti-inflammatory and antioxidant properties of thiol-based drugs could be beneficial for treating Cushing’s syndrome, as GR is associated with the inflammation pathway.

Pubchem568 (3.23(1), 1.22(8), 1.46(9)) corresponds to propionitrile, an aliphatic nitrile that is used as a polar aprotic solvent and occurs as a natural product in Apis species [79]. The propionitrile moiety also forms the core of diarylpropionitrile (DPN), which exhibits strong ERβ agonist properties [80]. Furthermore, DPN has demonstrated antidepressant- and anxiolytic-like effects in animals by activating the endogenous oxytocin system, the body's natural mechanism for managing stress and promoting well-being [81]. Suthprasertporn et al. assessed the neuroprotective effects of DPN against hydrogen peroxide-induced oxidative stress in human neuroblastoma SH-SY5Y cells and concluded that DPN could be beneficial for protecting against neurodegenerative diseases [82]. Other studies have investigated the tumor-inhibitory effects of DPN in mammary and colon cancer models [83, 84]. Given that GR and ERβ both belong to the steroid hormone receptor family of the nuclear receptor superfamily, exploring the effects of DPN on GR is worthwhile.

Pubchem340 (1.34(12), 1.08(15), 1.23(14)) and Pubchem336 (1.13(18), 1.16(9), 1.16(17)), corresponding to isopropylamine and 2-methylpropan-2-amine (i.e., tert-butylamine), respectively, are primary aliphatic amines found among the top-20 common features. Aliphatic amines have been used as prodrug moieties in medicinal chemistry, where both small and long-chain aliphatic amines enabled efficient drug release [85]. Moreover, GR modulators substituted with aliphatic amines have been patented (US20060154973A1 and WO2005090336A1) as a new class of non-steroidal compounds for treating GR-associated diseases. Isopropylamine (Pubchem340) is a building block of sympathomimetic β-adrenoceptor agonists used in COPD and asthma treatment, such as isoprenaline (prepared from isopropylamine and 2-chloro-1-(3,4-dihydroxyphenyl)ethan-1-one) and metaproterenol [86]. Similarly, the synthesis of salbutamol, a short-acting β2-adrenergic agonist used to treat asthma and COPD, begins with the acylation of salicylaldehyde to form an α-bromoacetophenone derivative, which then reacts with tert-butylamine (Pubchem336) in isopropanol [86]. The bulky, highly lipophilic tert-butyl group introduces steric hindrance, which in a GR ligand could help block specific binding sites and prevent GR from interacting with endogenous glucocorticoids or other ligands. In addition, the amine group of these features could form hydrogen bonds or participate in other non-covalent interactions with GR.

Case study: prospective GR inhibitors from FDA-approved drugs

Identification of potential GR inhibitors

In this section, we first employed M3S-GRPred for virtual screening, using the model to estimate a probability score for each FDA-approved compound from DrugBank. Second, the 30 compounds with the highest probability scores were selected and subjected to molecular docking to predict their binding affinity to GR. Third, four promising compounds among these were selected for MD simulations to examine ligand stability under dynamic conditions and to estimate binding free energies (see the Materials and Methods section for details). Supplementary Table S11 lists the top compounds with their probabilities, corresponding docking scores, and known target sites. Notably, the top-four compounds with the best docking scores are all inhibitors of the progesterone receptor (PR) or related nuclear receptors, which are known to cross-interact with GR. Because these compounds are already established nuclear receptor ligands and therefore offer little novelty for drug repurposing, they were not considered for further MD simulations. The four selected compounds were azelastine (AZE), metergoline (MET), perampanel (PER), and pirenzepine (PIR), with docking scores of −9.3, −8.8, −8.7, and −8.3 kcal/mol, respectively. These compounds, together with the co-crystallized reference ligand mifepristone (MF; −11.4 kcal/mol), the only FDA-approved GR antagonist, were further evaluated through MD simulations.
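The three-stage workflow above (probability ranking, docking, exclusion of known nuclear-receptor inhibitors) can be sketched as a simple ranking-and-filtering routine. This is a minimal illustration under stated assumptions: each record is assumed to already carry an M3S-GRPred probability, a docking score in kcal/mol (more negative is better), and a flag for known PR/nuclear-receptor inhibitors. The `Compound` type, `select_for_md` function, and toy records are hypothetical, and no docking is performed here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Compound:
    name: str
    probability: float              # predicted probability of being a GR antagonist
    docking_score: Optional[float]  # kcal/mol; None if the compound was not docked
    known_nr_inhibitor: bool        # True if already known to target PR or related receptors

def select_for_md(compounds, n_dock=30, n_md=4):
    # Stage 1: rank the whole library by predicted probability; keep the top n_dock.
    shortlisted = sorted(compounds, key=lambda c: c.probability, reverse=True)[:n_dock]
    # Stage 2: order the docked shortlist by docking score, most negative first.
    docked = [c for c in shortlisted if c.docking_score is not None]
    docked.sort(key=lambda c: c.docking_score)
    # Stage 3: drop known nuclear-receptor inhibitors (poor repurposing candidates)
    # and keep the n_md best remaining compounds for MD simulations.
    candidates = [c for c in docked if not c.known_nr_inhibitor]
    return candidates[:n_md]

# Hypothetical library; drug_E has the best docking score but a low probability,
# so it never passes the probability cut-off.
library = [
    Compound("drug_A", 0.97, -10.5, True),
    Compound("drug_B", 0.95, -9.3, False),
    Compound("drug_C", 0.93, -8.1, False),
    Compound("drug_D", 0.91, -8.8, False),
    Compound("drug_E", 0.40, -12.0, False),
]
print([c.name for c in select_for_md(library, n_dock=4, n_md=2)])  # ['drug_B', 'drug_D']
```

Applying the probability cut-off before docking keeps the expensive structure-based step limited to a small shortlist, which is the main efficiency argument for ML-first screening.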

Structural dynamics and binding affinity of screened drugs against GR

The stability of ligand binding was assessed from plots of root mean square deviation (RMSD), the number of intermolecular hydrogen bonds (# H-bonds), and the number of atom contacts (# atom contacts) versus simulation time. Figure 6 and Supplementary Figures S3 and S4 show that the all-atom RMSD of the five drug/GR complexes fluctuated considerably during the MD simulations, whereas the binding sites of all five systems showed less variability. Based on this observation and on the # H-bonds and # atom contacts plots, all simulated systems reached equilibrium by 40 ns. Therefore, snapshots from the final 20 ns were selected for further analysis of binding free energy and drug/protein interactions, as depicted in Figs. 6 and 7, respectively.
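The RMSD traces and the choice of a final analysis window can be illustrated with a short sketch. It assumes that each frame has already been superimposed onto the reference structure (trajectory tools usually fit before reporting RMSD) and that frames are saved at a fixed interval. This section does not state whether the reported ± values are standard deviations or standard errors; the sample standard deviation is used here as an assumption. Function names, the 5 ns save interval, and all toy values are hypothetical.

```python
import math

def rmsd(frame, reference):
    """RMSD between two structures whose atoms are already superimposed.

    Both arguments are equal-length lists of (x, y, z) coordinates in the same atom order.
    """
    if not frame or len(frame) != len(reference):
        raise ValueError("structures must have the same, non-zero number of atoms")
    squared = sum((a - b) ** 2
                  for atom, ref in zip(frame, reference)
                  for a, b in zip(atom, ref))
    return math.sqrt(squared / len(frame))

def final_window(values, dt_ns, window_ns):
    """Per-frame values from the last `window_ns` ns, given a save interval of `dt_ns` ns."""
    n_frames = round(window_ns / dt_ns)
    if n_frames <= 0 or n_frames > len(values):
        raise ValueError("window must cover between one frame and the whole trajectory")
    return values[-n_frames:]

def mean_sd(values):
    """Mean and sample standard deviation of per-frame values."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)

# Hypothetical two-atom fragment shifted by 1 Å along z: RMSD = 1 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(frame, reference))  # 1.0

# Hypothetical H-bond counts saved every 5 ns over 40 ns; summarize the last 20 ns.
hbond_counts = [0, 1, 2, 2, 1, 2, 3, 2]
print(mean_sd(final_window(hbond_counts, dt_ns=5.0, window_ns=20.0)))
```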

As shown in Fig. 6, each candidate drug formed on average 1.6–1.8 H-bonds with the GR protein target, except for PIR (0.7). Meanwhile, the # atom contacts reached ~25 in the complexes bound with AZE and PER, higher than in the complex with the reference drug MF (~18), suggesting that van der Waals (vdW) interactions are the predominant contributor to binding of these drugs. By contrast, the # atom contacts in the remaining complexes (~14) were somewhat lower than in the MF system. Additionally, the ΔGbind values obtained from MM/GBSA for all systems followed a trend similar to those from MM/PBSA (Supplementary Figure S5), and AZE and PER exhibited binding energies with GR of the same magnitude as MF. The # H-bonds in the MF/GR complex (2.5 ± 0.2, Fig. 6), involving Q570 and R611 (25% and 18%, respectively; Fig. 7), agreed with previous findings [59] and slightly exceeded those of AZE (1.8 ± 0.4) with C736 (30%), PER (1.7 ± 0.6) with Q738 (27%), MET (1.6 ± 0.3) with C736 (26%), and PIR (0.7 ± 0.2) with M601 (19%). However, the weaker H-bonding of the potential drug candidates AZE and PER was compensated by their larger numbers of atom contacts (26.2 ± 1.3 and 25.4 ± 1.2, respectively) compared with MF (18.4 ± 1.6).
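For reference, the standard end-point decomposition underlying both MM/GBSA and MM/PBSA (see, e.g., Genheden and Ryde [69]) is written out below. The entropy term is included for completeness only; whether it was evaluated in this study is not stated in this section. The two methods share the molecular-mechanics terms and differ mainly in how the polar solvation term is computed (generalized Born versus Poisson–Boltzmann).

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}}
  \approx \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\Delta S,\\
\Delta E_{\mathrm{MM}} &= \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}},\\
\Delta G_{\mathrm{solv}} &= \Delta G_{\mathrm{polar}}^{\mathrm{GB\,or\,PB}} + \Delta G_{\mathrm{nonpolar}}.
\end{aligned}
```

Bonded (internal) energy terms are omitted from \Delta E_{\mathrm{MM}} because they cancel when the complex, receptor, and ligand energies are all taken from the same complex trajectory.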

According to Fig. 7, both promising drugs (AZE and PER) formed hydrophobic interactions with residues L566 (98%), M604 (94%), A605 (95%), A607 (80%), L608 (99%), L733 (46%), and F740 (47%), whereas MF interacted hydrophobically only with M464 (90%), L563 (48%), M601 (81%), M639 (27%), and L732 (8%). Additionally, the MM energy components confirmed that vdW interactions are the primary force driving complexation, with vdW energies of −61.1 ± 0.4 and −61.3 ± 0.4 kcal/mol from MM/GBSA and MM/PBSA, respectively, for AZE, and −61.1 ± 0.4 and −60.7 ± 0.4 kcal/mol, respectively, for PER (Supplementary Table S12). Taken together, the ligand–protein interaction results and the obtained binding affinities suggest that AZE and PER could serve as GR antagonists similar to MF.
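Percentages such as L566 (98%) or L732 (8%) are usually computed as the fraction of analyzed snapshots in which a given interaction is detected; this section does not describe the exact procedure. A minimal sketch, assuming a hypothetical per-frame set of interacting residue labels (for example, exported from an interaction-analysis tool); the input format and toy snapshots are illustrative only.

```python
from collections import Counter

def occupancy(frames):
    """Percentage of frames in which each residue takes part in the interaction.

    frames: list of per-frame collections of residue labels, e.g. {"L566", "M604"}.
    A residue is counted at most once per frame.
    """
    if not frames:
        return {}
    counts = Counter(res for frame in frames for res in set(frame))
    return {res: 100.0 * n / len(frames) for res, n in counts.items()}

# Hypothetical hydrophobic contacts in four snapshots of a ligand/GR complex.
snapshots = [
    {"L566", "M604", "L608"},
    {"L566", "L608"},
    {"L566", "M604", "L608", "F740"},
    {"L566", "L608"},
]
for res, pct in sorted(occupancy(snapshots).items(), key=lambda kv: -kv[1]):
    print(f"{res}: {pct:.0f}%")
```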

Conclusions

We have developed M3S-GRPred, a novel ensemble learning framework that uses a multi-step stacking strategy to rapidly and accurately discover novel GR antagonists using only SMILES information. M3S-GRPred first constructs balanced training subsets via under-sampling and then uses these subsets to train heterogeneous base-classifiers with various SMILES-based feature descriptors and ML algorithms. The final model integrates the probabilistic outputs of the base-classifiers selected by a two-step feature selection technique. To assess the effectiveness of M3S-GRPred, we compared its performance with that of several conventional ML classifiers using ten-fold cross-validation and an independent test. Our comparative results show that M3S-GRPred significantly outperforms conventional ML classifiers on the independent test dataset, achieving a balanced accuracy of 0.891, MCC of 0.658, and AUC of 0.953, corresponding to improvements of 9.94–12.52%, 19.35–24.51%, and 8.57–11.26%, respectively. It also identified potential GR antagonists among FDA-approved drugs, which were supported by molecular docking and MD simulation studies for drug repurposing in Cushing's syndrome. We anticipate that M3S-GRPred will serve as an efficient and cost-effective screening tool for discovering novel GR antagonists in vast libraries of uncharacterized compounds.
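Balanced accuracy (BACC) and the Matthews correlation coefficient (MCC) quoted above are both derived from the confusion matrix. A minimal sketch using their standard definitions; the counts in the example are hypothetical and are not the paper's results.

```python
import math

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on antagonists) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical imbalanced test set: 50 antagonists and 450 non-antagonists.
tp, fn, tn, fp = 45, 5, 405, 45
print(round(balanced_accuracy(tp, tn, fp, fn), 3))
print(round(mcc(tp, tn, fp, fn), 3))
```

With 90% sensitivity and 90% specificity, BACC is 0.9, yet MCC falls to roughly 0.62 because the large negative class produces as many false positives as true positives; this is why MCC is a stricter summary than BACC on imbalanced data.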

Availability of data and materials

The molecular data used in this research were acquired from the ChEMBL database version 33 with target ID: CHEMBL2034 (https://www.ebi.ac.uk/chembl/search_results/CHEMBL2034). The case study dataset was acquired from the DrugBank database (version 5.1.10; released on January 4, 2023) of FDA-approved drugs (https://go.drugbank.com/). The implementation of this research and the R source code are available at https://github.com/Shoombuatong/M3S-GRPred.

References

1. Hunt HJ, et al. Identification of the clinical candidate (R)-(1-(4-Fluorophenyl)-6-((1-methyl-1H-pyrazol-4-yl)sulfonyl)-4,4a,5,6,7,8-hexahydro-1H-pyrazolo[3,4-g]isoquinolin-4a-yl)(4-(trifluoromethyl)pyridin-2-yl)methanone (CORT125134): a selective glucocorticoid receptor (GR) antagonist. J Med Chem. 2017;60(8):3405–21.
2. Cole TJ, et al. Targeted disruption of the glucocorticoid receptor gene blocks adrenergic chromaffin cell development and severely retards lung maturation. Genes Dev. 1995;9(13):1608–21.
3. Steffensen C, Bak AM, Rubeck KZ, Jorgensen JO. Epidemiology of Cushing's syndrome. Neuroendocrinology. 2010;92(Suppl 1):1–5.
4. Savas M, Mehta S, Agrawal N, van Rossum EFC, Feelders RA. Approach to the patient: diagnosis of Cushing syndrome. J Clin Endocrinol Metab. 2022;107(11):3162–74.
5. Fleseriu M, et al. Mifepristone, a glucocorticoid receptor antagonist, produces clinical and metabolic benefits in patients with Cushing's syndrome. J Clin Endocrinol Metab. 2012;97(6):2039–49.
6. Hunt HJ, et al. 1H-Pyrazolo[3,4-g]hexahydro-isoquinolines as potent GR antagonists with reduced hERG inhibition and an improved pharmacokinetic profile. Bioorg Med Chem Lett. 2015;25(24):5720–5.
7. Brown DR, et al. Clinical management of patients with Cushing syndrome treated with mifepristone: consensus recommendations. Clin Diabetes Endocrinol. 2020;6(1):18.
8. Castinetti F, Morange I, Conte-Devolx B, Brue T. Cushing's disease. Orphanet J Rare Dis. 2012;7:41.
9. Li D, El Kawkgi OM, Henriquez AF, Bancos I. Cardiovascular risk and mortality in patients with active and treated hypercortisolism. Gland Surg. 2020;9(1):43–58.
10. Dekkers OM, et al. Multisystem morbidity and mortality in Cushing's syndrome: a cohort study. J Clin Endocrinol Metab. 2013;98(6):2277–84.
11. Fleseriu M, et al. Consensus on diagnosis and management of Cushing's disease: a guideline update. Lancet Diabetes Endocrinol. 2021;9(12):847–75.
12. Cadepond F, Ulmann A, Baulieu EE. RU486 (mifepristone): mechanisms of action and clinical uses. Annu Rev Med. 1997;48:129–56.
13. Stanojevic M, Vracko M, Sollner Dolenc M. Development of in silico classification models for binding affinity to the glucocorticoid receptor. Chemosphere. 2023;336:139147.
14. Spreafico M, Ernst B, Lill MA, Smiesko M, Vedani A. Mixed-model QSAR at the glucocorticoid receptor: predicting the binding mode and affinity of psychotropic drugs. ChemMedChem. 2009;4(1):100–9.
15. Lewis DF, Ioannides C, Parke DV, Schulte-Hermann R. Quantitative structure-activity relationships in a series of endogenous and synthetic steroids exhibiting induction of CYP3A activity and hepatomegaly associated with increased DNA synthesis. J Steroid Biochem Mol Biol. 2000;74(4):179–85.
16. Shin SH, Hur G, Kim NR, Park JHY, Lee KW, Yang H. A machine learning-integrated stepwise method to discover novel anti-obesity phytochemicals that antagonize the glucocorticoid receptor. Food Funct. 2023;14(4):1869–83.
17. Matsuzaka Y, Uesawa Y. A deep learning-based quantitative structure-activity relationship system construct prediction model of agonist and antagonist with high performance. Int J Mol Sci. 2022;23(4):2141.
18. Matsuzaka Y, Uesawa Y. Molecular image-based prediction models of nuclear receptor agonists and antagonists using the DeepSnap-deep learning approach with the Tox21 10K library. Molecules. 2020;25(12):2764.
19. Dey R, Roychowdhury P, Mukherjee C. Homology modelling of the ligand-binding domain of glucocorticoid receptor: binding site interactions with cortisol and corticosterone. Protein Eng. 2001;14(8):565–71.
20. Ray NC, et al. Discovery and optimization of novel, non-steroidal glucocorticoid receptor modulators. Bioorg Med Chem Lett. 2007;17(17):4901–5.
21. Pang JP, et al. Discovery of a novel nonsteroidal selective glucocorticoid receptor modulator by virtual screening and bioassays. Acta Pharmacol Sin. 2022;43(9):2429–38.
22. Hu X, et al. Discovery of novel non-steroidal selective glucocorticoid receptor modulators by structure- and IGN-based virtual screening, structural optimization, and biological evaluation. Eur J Med Chem. 2022;237:114382.
23. Alves NRC, Pecci A, Alvarez LD. Structural insights into the ligand binding domain of the glucocorticoid receptor: a molecular dynamics study. J Chem Inf Model. 2020;60(2):794–804.
24. Hu X, et al. Discovery of novel GR ligands toward druggable GR antagonist conformations identified by MD simulations and Markov state model analysis. Adv Sci (Weinh). 2022;9(3):e2102435.
25. Metin R, Akten ED. Drug repositioning to propose alternative modulators for glucocorticoid receptor through structure-based virtual screening. J Biomol Struct Dyn. 2022;40(21):11418–33.
26. Zare F, Solhjoo A, Sadeghpour H, Sakhteman A, Dehshahri A. Structure-based virtual screening, molecular docking, molecular dynamics simulation and MM/PBSA calculations towards identification of steroidal and non-steroidal selective glucocorticoid receptor modulators. J Biomol Struct Dyn. 2023;41(16):7640–50.
27. Onnis V, et al. Virtual screening for the identification of novel nonsteroidal glucocorticoid modulators. J Med Chem. 2010;53(8):3065–74.
28. Potamitis C, et al. Discovery of new non-steroidal selective glucocorticoid receptor agonists. J Steroid Biochem Mol Biol. 2019;186:142–53.
29. Mendez D, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019;47(D1):D930–40.
30. R Core Team. R: a language and environment for statistical computing, 4.3.0 ed. Vienna, Austria: R Foundation for Statistical Computing; 2021.
31. Schaduangrat N, Anuwongcharoen N, Moni MA, Lio P, Charoenkwan P, Shoombuatong W. StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy. Sci Rep. 2022;12(1):1–16.
32. Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform. 2023;15(1):50.
33. Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists. Sci Rep. 2023;13(1):22994.
34. Schaduangrat N, Malik AA, Nantasenamat C. ERpred: a web server for the prediction of subtype-specific estrogen receptor antagonists. PeerJ. 2021;9:e11716.
35. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32(7):1466–74.
36. Carhart RE, Smith DH, Venkataraghavan R. Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci. 1985;25(2):64–73.
37. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42(6):1273–80.
38. Kim S, et al. PubChem substance and compound databases. Nucleic Acids Res. 2016;44(D1):D1202–13.
39. Laggner C. SMARTS patterns for functional group classification. 2005.
40. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci. 2003;43(2):493–500.
41. Charoenkwan P, Chiangjong W, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief Bioinform. 2021;22(6):bbab172.
42. Charoenkwan P, Schaduangrat N, Manavalan B, Shoombuatong W. M3S-ALG: improved and robust prediction of allergenicity of chemical compounds by using a novel multi-step stacking strategy. Future Gener Comput Syst. 2024.
43. Malik AA, Phanus-umporn C, Schaduangrat N, Shoombuatong W, Isarankura-Na-Ayudhya C, Nantasenamat C. HCVpred: a web server for predicting the bioactivity of hepatitis C virus NS5B inhibitors. J Comput Chem. 2020;41(20):1820–34.
44. Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform. 2023;15(1):50.
45. Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists. Sci Rep. 2023;13(1):22994.
46. R Development Core Team. R: a language and environment for statistical computing. 2010.
47. Charoenkwan P, et al. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep. 2022;12(1):7697.
48. Charoenkwan P, Nantasenamat C, Hasan MM, Moni MA, Manavalan B, Shoombuatong W. StackDPPIV: a novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides. Methods. 2022;204:189–98.
49. Charoenkwan P, Schaduangrat N, Moni MA, Manavalan B, Shoombuatong W. SAPPHIRE: a stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput Biol Med. 2022;146:105704.
50. Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep. 2024;14(1):4463.
51. Ahmad S, et al. SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep. 2022;12(1):4106.
52. Azadpour M, McKay CM, Smith RL. Estimating confidence intervals for information transfer analysis of confusion matrices. J Acoust Soc Am. 2014;135(3):EL40–146.
53. Zhang D, et al. iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features. Bioinformatics. 2021;37(2):171–7.
54. Charoenkwan P, Chumnanpuen P, Schaduangrat N, Oh C, Manavalan B, Shoombuatong W. PSRQSP: an effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning. Comput Biol Med. 2023;158:106784.
55. Charoenkwan P, Schaduangrat N, Moni MA, Shoombuatong W, Manavalan B. Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework. iScience. 2022;25(9):104883.
56. Pettersen EF, et al. UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12.
57. Sanner MF. A component-based software environment for visualizing large macromolecular assemblies. Structure. 2005;13(3):447–62.
58. Schrödinger LLC, DeLano W. PyMOL: an open-source molecular visualization system. 2020.
59. Kauppi B, et al. The three-dimensional structures of antagonistic and agonistic forms of the glucocorticoid receptor ligand-binding domain: RU-486 induces a transconformation that leads to active antagonism. J Biol Chem. 2003;278(25):22748–54.
60. Case DA, Aktulga HM, Belfon K, Ben-Shalom IY, Berryman JT, Brozell SR, Cerutti DS, et al. Amber 2022. San Francisco: University of California; 2022.
61. Sencanski M, et al. Identification of SARS-CoV-2 papain-like protease (PLpro) inhibitors using combined computational approach. ChemistryOpen. 2022;11(2):e202100248.
62. Yelshanskaya MV, Singh AK, Narangoda C, Williams RSB, Kurnikova MG, Sobolevsky AI. Structural basis of AMPA receptor inhibition by trans-4-butylcyclohexane carboxylic acid. Br J Pharmacol. 2022;179(14):3628–44.
63. Jana ID, et al. Targeting an evolutionarily conserved "E-L-L" motif in the spike protein to develop a small molecule fusion inhibitor against SARS-CoV-2. bioRxiv. 2022.
64. Dolinsky TJ, et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 2007;35:W522–5.
65. Mark P, Nilsson L. Structure and dynamics of the TIP3P, SPC, and SPC/E water models at 298 K. J Phys Chem A. 2001;105(43):9954.
66. Chari R, Jerath K, Badkar AV, Kalonia DS. Long- and short-range electrostatic interactions affect the rheology of highly concentrated antibody solutions. Pharm Res. 2009;26(12):2607–18.
67. Ryckaert JP, Ciccotti G, Berendsen HJ. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J Comput Phys. 1977;23(3):327–41.
68. Roe DR, Cheatham TE 3rd. PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J Chem Theory Comput. 2013;9(7):3084–95.
69. Genheden S, Ryde U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Discov. 2015;10(5):449–61.
70. Cavalheiro JPDVH, Pires NMM, Dong T. MM-PBSA: challenges and opportunities. In: 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Shanghai, China: IEEE; 2017.
71. Wolber G, Langer T. LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J Chem Inf Model. 2005;45(1):160–9.
72. Lipinski CA. Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods. 2000;44(1):235–49.
73. Bemis GW, Murcko MA. The properties of known drugs. 1. Molecular frameworks. J Med Chem. 1996;39(15):2887–93.
74. Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework. J Biomol Struct Dyn. 2024:1–13.
75. Simeon S, et al. osFP: a web server for predicting the oligomeric states of fluorescent proteins. J Cheminform. 2016;8:1–15.
76. Pfaff AR, Beltz J, King E, Ercal N. Medicinal thiols: current status and new perspectives. Mini Rev Med Chem. 2020;20(6):513–29.
77. Cazzola M, Calzetta L, Page C, Rogliani P, Matera MG. Thiol-based drugs in pulmonary medicine: much more than mucolytics. Trends Pharmacol Sci. 2019;40(7):452–63.
78. Khanna K, et al. Exploring antiviral and anti-inflammatory effects of thiol drugs in COVID-19. Am J Physiol Lung Cell Mol Physiol. 2022;323(3):L372–89.
79. National Center for Biotechnology Information. PubChem Taxonomy Summary for Taxonomy 7459, Apis. 2024. Available: https://pubchem.ncbi.nlm.nih.gov/taxonomy/Apis
80. Weiser MJ, Wu TJ, Handa RJ. Estrogen receptor-beta agonist diarylpropionitrile: biological activities of R- and S-enantiomers on behavior and hormonal response to stress. Endocrinology. 2009;150(4):1817–25.
81. Kudwa AE, McGivern RF, Handa RJ. Estrogen receptor beta and oxytocin interact to modulate anxiety-like behavior and neuroendocrine stress reactivity in adult male and female rats. Physiol Behav. 2014;129:287–96.
82. Suthprasertporn N, Suwanna N, Thangnipon W. Protective effects of diarylpropionitrile against hydrogen peroxide-induced damage in human neuroblastoma SH-SY5Y cells. Drug Chem Toxicol. 2022;45(1):44–51.
83. Krishnamurthy N, Hu Y, Siedlak S, Doughman YQ, Watanabe M, Montano MM. Induction of quinone reductase by tamoxifen or DPN protects against mammary tumorigenesis. FASEB J. 2012;26(10):3993–4002.
84. Motylewska E, Stasikowska O, Melen-Mucha G. The inhibitory effect of diarylpropionitrile, a selective agonist of estrogen receptor beta, on the growth of MC38 colon cancer line. Cancer Lett. 2009;276(1):68–73.
85. Yan VC, Pham CD, Arthur K, Yang KL, Muller FL. Aliphatic amines are viable pro-drug moieties in phosphonoamidate drugs. Bioorg Med Chem Lett. 2020;30(24):127656.
86. Kazi AA, Subba Reddy BV, Ravithej Singh L. Synthetic approaches to FDA approved drugs for asthma and COPD from 1969 to 2020. Bioorg Med Chem. 2021;41:116212.


Acknowledgements

This project is funded by the National Research Council of Thailand and Mahidol University (N42A660380), and Mahidol University Partnering Initiative under the MU-KMUTT Biomedical Engineering & Biomaterials Research Consortium.

Funding

Open access funding provided by Mahidol University. This project is funded by the National Research Council of Thailand and Mahidol University (N42A660380), and Mahidol University Partnering Initiative under the MU-KMUTT Biomedical Engineering & Biomaterials Research Consortium.

Author information

Authors and Affiliations

Faculty of Medical Technology, Center for Research Innovation and Biomedical Informatics, Mahidol University, Bangkok, 10700, Thailand
Nalini Schaduangrat & Watshara Shoombuatong
Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, 10330, Thailand
Hathaichanok Chuntakaruk & Thanyada Rungrotmongkol
Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, 10330, Thailand
Hathaichanok Chuntakaruk & Thanyada Rungrotmongkol
Faculty of Medicine, Center for Artificial Intelligence in Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
Hathaichanok Chuntakaruk
Faculty of Science, Computer Science and Artificial Intelligence, Chandrakasem Rajabhat University, Bangkok, 10900, Thailand
Pakpoom Mookdarsanit

Contributions

NS: Design of this study, data collection, formal analysis, drafting the article, data analysis and interpretation, and docking study and analysis. HC: MD simulations, data analysis and interpretation, drafting the article. TR: MD simulations, drafting the article. PM: Data analysis and interpretation. WS: Project administration, supervision, design of this study, methodology, data analysis and interpretation, drafting the article, and critical revision of the article. All authors reviewed and approved the manuscript.

Corresponding author

Correspondence to Watshara Shoombuatong.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Schaduangrat, N., Chuntakaruk, H., Rungrotmongkol, T. et al. M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy. BMC Bioinformatics 26, 117 (2025). https://doi.org/10.1186/s12859-025-06132-1

Received: 10 January 2025
Accepted: 03 April 2025
Published: 30 April 2025
DOI: https://doi.org/10.1186/s12859-025-06132-1
