Skip to main content

Improving data interpretability with new differential sample variance gene set tests

Abstract

Background

Gene set analysis methods have played a major role in generating biological interpretations of omics data such as gene expression datasets. However, most methods focus on detecting homogenous pattern changes in mean expression while methods detecting pattern changes in variance remain poorly explored. While a few studies attempted to use gene-level variance analysis, such approach remains under-utilized. When comparing two phenotypes, gene sets with distinct changes in subgroups under one phenotype are overlooked by available methods although they reflect meaningful biological differences between two phenotypes. Multivariate sample-level variance analysis methods are needed to detect such pattern changes.

Results

We used ranking schemes based on minimum spanning tree to generalize the Cramer-Von Mises and Anderson–Darling univariate statistics into multivariate gene set analysis methods to detect differential sample variance or mean. We characterized the detection power and Type I error rate of these methods in addition to two methods developed earlier using simulation results with different parameters. We applied the developed methods to microarray gene expression dataset of prednisolone-resistant and prednisolone-sensitive children diagnosed with B-lineage acute lymphoblastic leukemia and bulk RNA-sequencing gene expression dataset of benign hyperplastic polyps and potentially malignant sessile serrated adenoma/polyps. One or both of the two compared phenotypes in each of these datasets have distinct molecular subtypes that contribute to within phenotype variability and to heterogeneous differences between two compared phenotypes. Our results show that methods designed to detect differential sample variance provide meaningful biological interpretations by detecting specific hallmark gene sets associated with the two compared phenotypes as documented in available literature.

Conclusions

The results of this study demonstrate the usefulness of methods designed to detect differential sample variance in providing biological interpretations when biologically relevant but heterogeneous changes between two phenotypes are prevalent in specific signaling pathways. Software implementation of the methods is available with detailed documentation from Bioconductor package GSAR. The available methods are applicable to gene expression datasets in a normalized matrix form and could be used with other omics datasets in a normalized matrix form with available collection of feature sets.

Peer Review reports

Background

Gene expression is a quantitative trait translating genetic changes (e.g. SNPs, frameshifts, deletions, splicing disturbing variants, chromosomal aberrations) into dynamic molecular phenotypes. For Mendelian diseases, application of gene expression analysis (RNA-seq) has increased diagnostic rate by 8–36% and improved our understanding of molecular mechanisms of the variants [1,2,3]. Gene expression is the most broadly studied class of molecular data for cancer so far (The Cancer Genome Atlas repository [4]) and has led to development and validation of predictive biomarkers in lung [5], breast [6, 7], colon [8, 9], and other tumor types [10,11,12,13,14]. Transcriptome analysis helps identify gene expression signatures that can predict response to specific cancer therapies [15, 16] and molecular correlates of metastasis [17].

The routine omics data analysis of individual genes (proteins) usually includes the analysis of Differentially Expressed (DE) genes, measuring the mean expression differences between sets of samples representing conditions of biological interests, for example disease and healthy phenotypes. However gradually, it became evident that gene expression variability is also important for understanding molecular mechanisms underlying phenotypic differences. Expression variability has been associated with biological function [18,19,20,21,22], ageing [23,24,25], is implicated in human diseases [20, 26, 27], cancer [28,29,30,31] and has diagnostic and prognostic potential in cancer [28,29,30, 32,33,34]. Moreover, there are genes with consistently higher across-sample variability in tumors of different origin as compared to normal samples [30]. These differentially variable genes (anti-profiles) can serve as a robust molecular signature for multiple cancer types [30, 33]. Notably, heterogeneity is an intrinsic property of cancer cells since the formation of tumor is dependent on the acquisition of oncogenic mutation and further clonal selection through Darwinian-like evolution. Variability itself is a cornerstone of Darwin’s evolutionary theory and a characteristic of cancer cells [35].

Genes and proteins are organized in functional units acting in concert. Phenotypic changes are associated with changes of multiple genes in biological pathways and molecular networks, more often than single gene alterations. In line with these ideas almost two decades ago a new development in DE analysis was Gene Set Analysis (GSA) technique, first implementation known as Gene Set Enrichment Analysis (GSEA) [36], followed by more than 100 GSA methods/tools [37]. The advantages of pathway analysis (or GSA) are well known but it’s worth reiterating: 1) It allows for the incorporation pre-existing biological knowledge into the analysis; 2) it employs a hypothesis-testing framework accounting for differences in various distributional statistics; and 3) it enhances the power of analysis by decreasing the number of hypotheses to test [38]. Gene set analysis methods differ in many aspects, such as univariate/multivariate, underlying assumptions, definition of enrichment, null hypotheses, significance assessment, but share a common feature: They are generally testing ‘average’ difference between pathway’s genes expression under different phenotypes.

Because of the accumulated evidence that initiation and progression of cancers are dependent on the acquisition of multiple driver mutations that activate oncogenic pathways and inactivate tumor suppressor pathways [35], it would be expected that there are many computational approaches allowing one to identify differential variability in gene sets. However, differential variability analysis for individual genes was not immediately followed by the development of gene set approaches. The first attempts to present GSA methods that test for differential sample variance (DV) between two phenotypes was done by our group [38]. We demonstrated that for three different cancer types, DV approach was able to identify cancer-specific pathways, while pathways identified using differential mean approach were shared between the three cancer types. Other GSA approaches claiming to perform differential variability tests, compare variability in gene ranks within a pathway between two phenotypes rather than sample variance [39], transform gene ranks under each individual sample into enrichment scores for gene sets but don’t test for sample variance [40], or detect outlier gene sets with respect to randomly selected control gene sets [41]. These methods test different statistical hypotheses than differential sample variance, hence cannot be directly compared to the methods proposed in this study.

Here we propose to use ranking schemes based on the minimum spanning tree to generalize the Cramer-Von Mises and Anderson–Darling statistics into gene set analysis methods to detect differential sample variance or mean. We characterize the detection power and Type I error rate of these two methods with two additional methods developed based on the same principle using simulation results with different parameters. Finally, we apply our methods to microarray and bulk RNA-sequencing gene expression datasets and demonstrate the biological relevance of the detected Hallmark gene sets by our differential sample variance methods.

Methods

Rank-based statistical methods: We introduce nonparametric GSA methods by generalizing univariate statistics into multivariate methods based on the minimum spanning tree (MST). For a gene set of size p, let n1 and n2 p-dimensional measurements respectively represent samples from phenotypes X and Y. These N = n1 + n2 samples can be represented as vertices of an edge-weighted undirected graph where edge weights represent some distance measure between samples in the Rp space, such as the Euclidean distance. The MST of this graph is the acyclic subset of edges selected from the full set of N(N-1)/2 possible edges in the full graph such that the sum of edge weights is minimal. The distance between any two vertices in the MST correlates with their distance in the Rp space. This property allows the multivariate generalization of rank-based univariate statistical methods by assigning vertices in the MST ranks based on a specific scheme. The test statistic is calculated based on vertex ranks and significance is estimated non-parametrically using a sample permutations approach. The ranking scheme can assign a specific alternative hypothesis such as differential sample mean (DM) or differential sample variance (DV) more detection power. Two examples of MST that demonstrate DM (Fig. 1A) and DV (Fig. 1B) between two phenotypes are shown in Fig. 1. It has been shown that the high directed preorder (HDP) ranking scheme is suitable for detecting DM (µX ≠ µY), while the radial ranking scheme is suitable for detecting DV (σX ≠ σY) [38, 42]. We have applied the MST-based approach to generalize the Kolmogorov–Smirnov statistic into a gene set test to detect DM (KS) or DV (Radial Kolmogorov–Smirnov or RKS) [38]. We have also repurposed the mean deviation statistic used for GSEA to test for DM (MD) and DV (RMD) [43]. We propose here to generalize two new univariate statistics, namely the Cramer-Von Mises and Anderson–Darling, using the MST-based approach to have two new tests of DM (CVM and AD) and two new tests of DV (RCVM and RAD). The four statistics put emphasis on different regions of the deviation between two empirical cumulative distribution functions (CDFs) of two groups of ranked samples in the MST. These four statistics have the potential to capture different patterns of differential variability in gene sets. For instance, CVM assigns the middle range more detection power, while AD assigns more detection power to the tails. We also examine the effect of the MST order (K) on the methods that test for DV. When MST order K = 2, we apply the radial ranking scheme to the union of the first and second MSTs. The second MST is defined as the MST of the full graph after excluding the links of the first MST. Additional MSTs can be added iteratively in a similar way for K > 2. In total, we examine the performance of 16 options, including 4 methods that test for DM (KS, MD, CVM, and AD), and 4 methods that test for DV with K = 1 (RKS1, RMD1, RCVM1, RAD1), K = 2 (RKS2, RMD2, RCVM2, RAD2), and K = 3 (RKS3, RMD3, RCVM3, RAD3).

Fig. 1
figure 1

Examples of minimum spanning tree graphs: A Two phenotypes indicted by different colors are well-separated in the tree indicating differential sample mean, B two phenotypes show differential sample variance where samples from the green phenotype occupy the backbone of the tree while samples from the orange phenotype occupy the peripheries

Kolmogorov–Smirnov (KS): The KS statistic measures the maximum deviation between the CDFs of two groups of ranked samples in the MST. The KS statistic is defined as

$$KS = \sqrt {\frac{{n_{1} n_{2} }}{N}} \mathop {\max }\limits_{i} \left| {\frac{{r_{i} }}{{n_{1} }} - \frac{{s_{i} }}{{n_{2} }}} \right|$$

where ri and si are respectively the number of samples in X and Y ranked lower than i in the MST. The MST-based multivariate generalization of the KS statistic was suggested early in a general context [42] and later implemented and characterized against DM and DV alternative hypotheses in gene expression data [38, 43].

Mean Deviation (MD): A single-sample version of GSEA was used to estimate enrichment scores, calculated as the mean deviation between the empirical CDFs of gene ranks between a gene set of interest and the rest of the genes in the dataset [44]. We successfully repurposed this score to test if the mean deviation between the empirical CDFs of sample ranks of two groups in the MST is significant [43]. The mean deviation statistic is defined as

$$M = \mathop \sum \limits_{i = 1}^{p} \left[ {\frac{{\mathop \sum \nolimits_{j \in X, j \le i} r_{j}^{\rho } }}{{\mathop \sum \nolimits_{j \in X} r_{j}^{\rho } }} - \mathop \sum \limits_{j \in Y, j \le i} \frac{1}{{n_{2} }}} \right]$$

where rj is the rank of sample j in the MST, p is the gene set size, and the exponent ρ is set to 0.25 to give the ranks a modest weight.

Cramer-Von Mises (CVM): This statistic measures the difference between two groups of samples based on the deviation between their empirical CDFs as [45]

$$C = \frac{{n_{1} n_{2} }}{{N^{2} }}\left\{ {\mathop \sum \limits_{i = 1}^{{n_{1} }} \left[ {\frac{{r_{i} }}{{n_{1} }} - i\left( {\frac{1}{{n_{1} }} + \frac{1}{{n_{2} }}} \right)} \right]^{2} + \mathop \sum \limits_{j = 1}^{{n_{2} }} \left[ {\frac{{s_{j} }}{{n_{2} }} - j\left( {\frac{1}{{n_{1} }} + \frac{1}{{n_{2} }}} \right)} \right]^{2} } \right\}$$

where ri and si are respectively the number of samples in X and Y ranked lower than i in the MST.

Anderson–Darling (AD): This statistic measures the deviation between sample ranks of two groups in the MST, allowing higher weight to deviations at the tails of the distribution. The AD statistic can be defined as [46, 47]

$$A = \frac{1}{{n_{1} n_{2} }}\mathop \sum \limits_{i = 1}^{N - 1} \frac{{\left( {r_{i} N - i n_{1} } \right)^{2} }}{{i \left( {N - i} \right)}}$$

where ri is the number of samples in X ranked lower than i in the MST. The AD statistic is a weighted version of the CVM statistic.

Significance estimation: For KS, CVM, and AD, p-value is defined as the proportion of permutations yielding a more extreme permutation statistic stati than the observed statistic statobs or

$$p - value = \frac{{\mathop \sum \nolimits_{i = 1}^{B} I\left[ {stat_{i} \ge stat_{obs} } \right] + 1}}{B + 1}$$

where B is the number of permutations and \(I\left[ . \right]\) is the identity function. For MD, the absolute value of the statistic is used when p-value is estimated or

$$p - value = \frac{{\mathop \sum \nolimits_{i = 1}^{B} I\left[ {\left| {stat_{i} } \right| \ge \left| {stat_{obs} } \right|} \right] + 1}}{B + 1}$$

Simulation setup: We simulate gene expression under two phenotypes X and Y assuming multivariate normal distribution with different parameters. Type I error rate under the null hypothesis (FX = FY) and detection power under the DV (σX ≠ σY) and DM (µX ≠ µY) alternative hypotheses were estimated. To estimate Type I error rate, we generated 1000 non-overlapping gene sets of size p under groups X and Y of equal sample size, n1 = n2 = N/2 from the p-dimensional normal distribution N(0, Ip×p) where Ip×p is a p × p identity matrix. Type I error rate is the proportion of detected gene sets when the null hypothesis is true. Statistical significance was estimated using 1000 permutations for all methods.

To estimate detection power under different scenarios, we introduced four parameters: 1) γ, the proportion of genes in a set with true effect, 2) σ, the fold-change in variance, 3) µ the shift in mean, and 4) r the intergene correlation. Two groups of N/2 samples from p-dimensional normal distributions N(0, Sx) and N(0, Sy) represent two phenotypes were generated. To simulate the DV scenario (σX ≠ σY), the covariance matrices Sx and Sy are p × p positive definite and symmetric matrices. The diagonal elements of Sx are equal to 1 and off-diagonal elements are equal to r. Matrix Sy is defined as

$$S_{y} = \left[ {\begin{array}{*{20}c} A & B \\ C & D \\ \end{array} } \right]$$

where A is a γp × γp matrix with Aij = σ for i = j and Aij =  for i ≠ j, B and C are respectively γp × (1-γ)p and (1-γ)p × γp matrices where Bij = Cij = \(\sqrt \sigma\) r for all i and j, and D is a (1-γ)p × (1-γ)p matrix with Dij = 1 for i = j and Dij = r for i ≠ j. To simulate the DM scenario (µX ≠ µY), diagonal elements of Sx and SY are set 1 and off-diagonal elements are set to r. Since expression data is assumed to be in log-scale, the shift in mean µ is added to γp of the genes in each gene set. We generate 1000 non-overlapping gene sets. We consider the parameters γ = [0.25, 0.5, 0.75], r = [0.1, 0.5], σ = [1, …, 5], µ = [0, …, 2], p = [20, 60, 100], and N = [20, 40, 60].

Childhood B-ALL gene expression dataset: Different cytogenetic translocations in children diagnosed with B-lineage acute lymphoblastic leukemia (B-ALL) produce fusion genes with distinct molecular subtypes of B-ALL, contributing to heterogeneous gene expression differences in some gene sets including those related to glucocorticoids treatment resistance. We aim to compare prednisolone-sensitive and prednisolone-resistant phenotypes in this dataset to highlight gene sets of relevant biological association with the phenotypes. Raw Affymetrix microarray data profiling leukemia cells in children diagnosed with B-ALL [48] was obtained from the Gene Expression Omnibus (GEO) repository [49]. Samples profiling in vitro sensitive (54 samples) and resistant (20 samples) to prednisolone patients were downloaded (GSE655 and GSE656). Samples with biased median normalized unscaled standard error (NUSE) were discarded, leaving 45 sensitive and 16 resistant samples. Raw intensities were normalized using the robust multi-array average (RMA) method and multiple probes measuring the same target mRNA were summarized based on the maximum inter-quartile range (IQR) across samples.

Colon polyps dataset: This dataset of colon polyps represents bulk RNA-seq data profiled benign hyperplastic polyps (HP) and potentially malignant sessile serrated adenoma/polyps (SSP). A subset of SSP may progress to carcinoma while the rest remain in a dormant state. The differences in SSP state may give rise to heterogeneous differences between HP and SSP phenotypes when oncogenic, development, or stress gene sets are tested for differential sample variance. A subset of a colon polyps bulk RNA-seq dataset (GEO accession number GSE76987, [50]) was downloaded including 10 hyperplastic polyps (HP) and 21 sessile serrated adenoma/polyps (SSP) specimens. Preprocessing of raw reads, aligning to the hg19 human genome, and quantifying gene expression were performed following the steps in [51]. Normalized log2(1 + FPKM) values were used for downstream analysis.

Gene sets: Fifty gene sets from the Hallmark collection [52] were downloaded from the molecular signature database (MSigDB) [53]. Hallmark gene sets were derived from founder gene sets that convey specific biological state, process, or signaling pathway, and summarize the most relevant information of their original founder sets. Hallmark gene sets were selected for their biological relevance. Four prednisolone resistance gene sets from the curated collection of the MSigDB were also included to analyze the childhood B-ALL dataset. These four sets were curated from the study that generated the childhood B-ALL dataset [48] and are useful for a sanity check.

Results

Simulation results: Simulations were specifically designed to assess the proposed methods’ ability to detect true positives (detection power) and control the rate of false positives (Type I error rate) under different simulation parameters as detailed in the Methods section. Simulation results demonstrate the proposed methods’ ability to detect specific alternative hypotheses (DV or DM) and their ability to control the rate of false positives. Estimated Type I error rate for all methods at sample size N = 40, gene set size p = 60, and significance level α = 0.05, is shown in Table 1. All of the four statistics and MST order options (K = 1, 2, and 3) control Type I error rate at the specified significance level. Similar Tables for different sample size values (N = 20, 40, and 60) and gene set size values (p = 20, 60, and 100) are provided in Additional File 1 (Tables S1S8).

Table 1 Estimated Type I error rate from simulated data at sample size N = 40 and gene set size p = 60 (α = 0.05)

The detection power estimates when either the DV alternative hypothesis is true (σX ≠ σY) (Fig. 2A and C) or the DM alternative hypothesis is true (µX ≠ µY) (Fig. 2B and D) are shown in Fig. 2 for N = 40, p = 60, γ = 0.5, and r = 0.1. For the methods that were designed to detect DV (RKS, RMD, RCVM, and RAD), the null hypothesis is rejected if they detect significant DV (p-value < 0.05) and their DM counterpart methods (KS, MD, CVM, and AD) fail to detect significant DM (p-value ≥ 0.05). This was done to guard against the non-negligible detection power of DV methods when µX ≠ µY and the shift in mean is large. We examined the effect of changing the MST order on the detection power of methods that were designed to detect DV. The detection power against the variance fold-change when σX ≠ σY for the four methods that detect DV is shown in Fig. 2A. Gain in detection power is achieved when the MST order increases from 1 to 2 and to a lesser extent from 2 to 3. Higher MST orders yield negligible gain or none, hence the presented results consider K = 1, 2, and 3. This is likely due to adding additional links to the union of multiple MSTs when K > 3, that are redundant with respect to the radial ranking scheme which assigns node ranks based on the shortest distance between each node and the root node at the center of the union of multiple MSTs. Methods that were designed to detect DV show a negligible detection power against the mean shift when the alternative hypothesis µX ≠ µY is true (Fig. 2B) regardless of the MST order (K = 1, 2, and 3). To also examined the selectivity of methods designed to detect DV or DM to detect their targeted alternative hypothesis exclusively. Methods that were designed to detect DV (K = 3) showed high detection power against the variance fold-change when the alternative hypothesis σX ≠ σY is true, while methods designed to detect DM showed negligible detection power (Fig. 2C). Similar Figures for N = [20, 40, 60], p = [20, 60, 100], γ = [0.25, 0.5, 0.75], r = [0.1, 0.5], and K = 1, 2, and 3, are shown in Figures S1S18 (Additional File 1). On the other hand, methods designed to detect DM showed high detection power against the mean shift the alternative hypothesis µX ≠ µY is true, while methods designed to detect DV (K = 3) showed negligible detection power (Fig. 2D). We also examined the effects of sample size, gene set seize, intergene correlation, and the proportion of genes in a set with true effect on detection power. The detection power estimates for N = [20, 40, 60], p = [20, 60, 100], γ = [0.25, 0.5, 0.75], r = [0.1, 0.5] are shown in Figures S19S36 (Additional File 1).

Fig. 2
figure 2

Simulated data estimates of detection power against specific alternative hypotheses. Panels A and B show the power when minimum spanning tree order K = 1, 2, and 3. Panel C shows high detection power achieved by methods that used the radial ranking with K = 3 (RKS3, RMD3, RCVM3, and RAD3) and negligible power by methods that used the highdirected preorder ranking (KS, MD, CVM, and AD) against differential variance fold-change σ when σX ≠ σY. Panel D shows high detection power achieved by methods that used the high directed preorder ranking and negligible power by methods that used the radial ranking against differential mean shift µ when µX ≠ µY. Sample size N = 40, gene set size p = 60, proportion of genes with true difference γ = 0.5, and intergene correlation r = 0.1

Childhood B-ALL dataset: Children diagnosed with B-ALL have different cytogenetic translocations that produce fusion genes with distinct molecular subtypes of B-ALL. We hypothesized that these molecular subtypes will contribute to heterogeneous differences in the gene expressions of related gene sets when the prednisolone-sensitive and prednisolone-resistant phenotypes are compared, and that our methods that detect DV could capture such differences in relevant gene sets. We applied our methods that detect DV or DM to compare prednisolone-sensitive and prednisolone-resistant phenotypes. We used 10,000 permutations to estimate significance and gene sets were deemed having significant DV if a DV method returned p-value < 0.05 but not the DM counterpart method (e.g. RAD but not AD). Nine Hallmark gene sets showed significant DV but not DM between prednisolone-sensitive and prednisolone-resistant phenotypes of the childhood B-ALL dataset in at least 3 of the 4 used methods (Fig. 3). We will focus on the interpretation of these 9 gene sets by presenting studies from literature that support the biological relevance of these sets through association with prednisolone/glucocorticoid (GC) resistance, B-ALL, or inflammatory response as detailed below. Five of these 9 set sets are associated with inflammatory-response related signaling pathways, two with immune response, and one with stress, proliferation, and metabolism (Fig. 3).

Fig. 3
figure 3

Hallmark gene sets detected by methods designed to detect differential sample variance (DV) but not differential sample mean (DM) in the childhood B-lineage acute lymphoblastic leukemia (B-ALL) dataset comparing prednisolone-sensitive and prednisolone-resistant patients and in the colon polyps datasets comparing hyperplastic polys and sessile serrated adenoma/polyps. Four prednisolone resistance gene sets from the study that generated the childhood B-ALL dataset were used only with the childhood B-ALL dataset for a sanity check. The association of these gene sets with prednisolone resistance was indicated under column BP_Category. Cells with dark gray color indicate p-value < 0.05. Three annotation columns on the left side indicate the gene sets found to be significant by at least 3 of the 4 used methods for the childhood B-ALL (column DV_B_ALL) and polyps datasets (column DV_polyps), and the biological process category of each gene set (column BP_Category). DV: Differential sample variance, KS: Kolmogorov–Smirnov, MD: Mean deviation, AD: Anderson–Darling, CVM: Cramer Von-Mises, RKS: Radial Kolmogorov–Smirnov, RMD: Radial mean deviation, RAD: Radial Anderson–Darling, RCVM: Radial Cramer Von-Mises

MYC is a proto-oncogene transcription factor targeted by glucocorticoid (GC)-induced repression [54, 55]. The link between MYC suppression and GC-induced apoptosis in human leukemic cells was established long time ago [55]. MYC targets V1 was found significantly enriched with downregulated genes in childhood B-ALL patients with ETV6-RUNX1 fusion [56]. Similar findings were reported in [57], which showed that MYC targets were upregulated in mixed lineage leukemia or MLL-rearranged infant B-ALL but depleted in ETV6-rearranged patients. Another study examined the acquired resistance to DOT1L inhibition in MLL-rearranged B-ALL cells, which leads to partial loss of MLL-fusion driven gene expression. Using GSEA on RNA-seq data, the study detected a few enriched Hallmark gene sets including MYC targets v1 [58].

Inflammatory response in childhood B-ALL occurs either directly or due to the therapy-induced toxicity. A study quantified a panel of cytokines and chemokines from blood serum of children with ALL and reported pro-inflammatory status even in the absence of apparent infection [59]. Another study highlighted cytokine release syndrome (CRS) as the most common therapy-induced toxicity resulting from CART-T therapy where an abundant release of cytokines leads to excessive activation of immune cells, manifesting as a wide range of symptoms including systemic inflammatory response [60]. Polymorphic interferon-gamma (IFN-γ) alleles are associated with prednisone response in childhood B-ALL, suggesting distinct effects of IFN-γ in immunosurveillance and early response to steroid therapy [61]. A study showed that natural killer cells demonstrate downregulation of activating receptors and upregulation of inhibitory receptors with impaired IFN-γ production and cytotoxicity in childhood B-ALL [62]. A more recent study presented the reaction of ALL cell lines and patient-derived xenografts to cytokines tumor necrosis factor alpha (TNF-α) and IFN-γ displayed great heterogeneity in cell death where some samples show a dose-dependent cell death by both cytokines while others show no reaction or an increased viability [63]. It is worth noting that our methods detected DV in both IFN-γ and TNF-α signaling gene sets.

Five highlighted signaling pathways in Fig. 3 were related to immune or inflammatory responses, GC-resistance, oncogenic role, or tumor suppression. Tumor growth factor-β (TGF-β) signaling induces nuclear localization of the aminoacyl-tRNA synthetase-interacting factor 2 (AIMP2) which enhances ubiquitin-dependent degradation of the FUSE-binding protein (FBP), a transcriptional activator of MYC [64]. A retrospective study analyzed gene expression dataset of GC-resistant and GC-sensitive B-ALL infants with MLL-rearrangements and detected a gene module enriched with aminoacyl-tRNA synthetases (ARSs) in the GC-resistant phenotype [65]. The study suggested a failure to repress MYC and initiate GC-induced apoptosis due to increased cellular activity of TGF-β-induced ARSs interacting factors in the GC-resistant phenotype. Another study showed that the TGF-β/SMAD signaling pathway is an important mediator of natural killer cell dysfunction in childhood B-ALL, contributing to evasion of innate immune surveillance and resistance to anti-leukemic cytotoxicity [62]. Another study compared good and poor responders to prednisolone in childhood B-ALL with regard to the potential role of genes EMP1, CASP1, and NLRP3 in prednisolone response [66]. The study showed that genes positively correlated with EMP1 were associated with multiple signaling pathways, including TGF-β and TNF-α signaling pathway. Tumor necrosis factor-α (TNF-α) is a proinflammatory cytokine that initiates immune response in part by activating the transcription factor NF-κB. NF-κB was identified as one of the three transcription factors targeted by GC-induced repression which leads to apoptosis in hematological cells [54]. Many studies have shown the interaction between GC receptor and the p65 subunit of NF-κB [67,68,69,70,71]. Tissing et al. identified 51 transcriptionally regulated genes in childhood ALL cell lines at 8 h post-exposure to prednisolone, including 11 genes related to NF-κB signaling [72]. In vitro studies have shown that NF-κB inhibition induces apoptosis in leukemic stem cells [73]. TNF-α treatment applied to HeLaS3 and COS1 human cell lines led to NF-κB-mediated regulation of two protein isoforms of the GC receptor where the accumulation of the nuclear localized β isoform over the cytoplasmic α form was correlated with GC resistance [74]. However, NF-κB inhibition acts on normal cells and induce inflammatory molecules, particularly TNF-α, increasing sensitivity to cell death signals [75]. TNF-α induces NF-κB-dependent and NF-κB-independent survival signaling, hence promoting proliferation of leukemia cells [76, 77]. A study showed that the inhibition of TNF-α enhanced NF-κB inhibition-induced apoptosis in leukemia cells while protecting healthy hematopoietic and other tissue cells [78]. The promoter region of the estrogen receptor gene is aberrantly methylated in 86% of human hematopoietic tumors, including 8 of 9 pediatric ALL [79]. Gallagher et al. showed that the estrogen-related receptor-β (ESRRB) transcription factor cooperates with the GC receptor to mediate the GC gene expression signature in mouse and human ALL cells [80]. Bardini et al. [81] investigated the effects of bromodomain and extra-terminal (BET) inhabitation in a preclinical mouse model of MLL-rearranged infant ALL and reported in vivo impairment of the leukemic engraftment of patient-derived primary samples in mice. The study also reported that the treatment sensitizes GC-resistant MLL-rearranged cell lines to prednisolone in vitro. Transcriptional change associated with the treatment was enriched in the xenobiotic metabolism and estrogen response (early and late) [81]. Notch signaling can play oncogenic or tumor suppressor roles, depending on cell type [82]. A study highlighted the role of Notch signaling in the stromal cell-dependent protection of B-ALL cells from chemotherapy-induced apoptosis where the inhabitation of Notch signaling in the presence of hydrocortisone dramatically decreased the level of overall live B-ALL cells co-cultured with human bone marrow mesenchymal stromal cells [83]. Additional results supported the enhanced chemo-sensitivity of B-ALL cells via the inhibition of Notch signaling [84]. Multiple Notch signaling pathway genes contain CpG islands in their promoter regions and showed higher and more frequent methylation than normal controls. Among others, the Notch target gene Hes5 was hypermethylated in B-ALL and T-ALL cells compared to normal controls with more frequent and greater methylation in B-ALL compared to T-ALL [85]. Reversing the transcriptional silencing of Hes5 due to promoter methylation led to growth arrest and apoptosis in B-ALL cells [85].

The unfolded protein response (UPR) is an evolutionarily adaptive response that detects the accumulation of unfolded or misfolded proteins and adjusts the protein folding ability of the endoplasmic reticulum (ER) [86]. Studies have suggested that UPR contributes to chemotherapy resistance in B-ALL and lymphoma cells [87, 88]. UPR activates due to ER stress caused by altered cell protein homeostasis, and the activation is mediated by three main sensors: PERK, ATF6α, and IRE1α [89]. These three sensors are kept inactive through binding to the stress sensor 78-kDa glucose-regulated protein (GRP78) [90]. In response to stimuli, GRP78 binds with a higher affinity to unfolded or misfolded proteins within the ER, thereby detach from the three ER-stress sensors [91]. ER stress induction triggers IRE1α to induce a conformational change that activates the RNase which leads to a general downregulation of the ER load via unconventional splicing of the X-box binding protein 1 (XBP1) mRNA [89]. It was shown that IRE1α, XBP1, and GRP78 were upregulated in B-ALL patients at diagnosis and at relapse, and high levels of these proteins predicted a poor outcome [92, 93]. For the Philadelphia chromosome positive (Ph +) pre-B-ALL cells, XBP1 was demethylated and upregulated under the BCR-ABL1 subtype [92]. GRP78 was abundantly expressed and contributed to chemotherapy resistance of leukemic B-cell precursors [93]. These observations have advocated the use of ER stressors as potential drugs for the treatment of B-ALL [89, 92,93,94,95].

Synthetic GCs such as prednisolone are xenobiotics that could have harmful toxic effects if inadequately metabolized by the cytochrome P450 enzymes (CYP) [96]. Clearing of prednisolone from the body occurs primarily by the hepatic metabolism [97], which is part of the xenobiotic metabolism. Studies have associated host genetic polymorphism of xenobiotic metabolizing enzymes with host susceptibility and response to drugs. Krajinovic et al. investigated if CYP2E1, NQO1, and MPO variants modify risk in childhood ALL (predominantly B-ALL) [98] and reported that CYP2E1 and NQO1 variants significantly contributed to the risk, but not MPO variant alone. Interestingly, CYP2E1 and NQO1 are members of the xenobiotic metabolism gene set, but not MPO. The same group showed that variants in CYP1A1 (xenobiotic metabolism gene set member), among other genes, are significant predictors of childhood ALL risk [99], and variants of CYP1A1 and NQO1 were associated with poor prognosis in pediatric ALL [100]. Another study reported risk association for a CYP2E1 variant in both ALL and AML [101]. These studies provide support for associating the xenobiotic metabolism gene set with prednisolone resistance.

Two prednisolone resistance gene sets consisting of up- and down-regulated genes in prednisolone-resistant as compared to prednisolone-sensitive B-ALL patients [48] were found significant by all four methods that detect DM and DV. This is expected since members of these sets are significantly different between the two phenotypes, leading to significant DM. The large difference in mean caused DV methods that show non-negligible detection power to significant DM to return small p-values. However, DV is ignored here since DV is accepted only if the methods that test DM return insignificant p-values. Another two gene sets of up- and down-regulated genes in prednisolone-resistant as compared to prednisolone-sensitive ALL (both B-ALL and T-ALL) patients were found significant by all DM but not DV methods. This is also expected since only a subset of genes in each of these sets is up or down-regulated in the analyzed B-ALL dataset. Half of the genes were common between the two up-regulated gene sets in ALL and B-ALL, and only about quarter of the genes were common between the two down-regulated sets. This led to moderate DM that was detected by mean methods but not by variance methods. Results from these 4 gene sets serve as a sanity check for our methods.

Colon polyps dataset: The potentially malignant sessile serrated adenoma/polyps (SSP) account for 15–30% of colon cancers [102]. These polyps have been distinguished from the benign hyperplastic polyps (HP) by their endoscopic appearance (larger, flat and hypermucinous) and histologic characteristics (dilated crypts, horizontal crypts, and boot shaped deformities) [103,104,105,106]. Similarities between SSP and HP often led to low consensus in classification, especially for borderline phenotypes. Studies have shown that considerable proportions of samples that were classified as HPs were probably SSPs [107, 108]. We hypothesized that the heterogeneity of the SSP phenotype will lead to molecular variability in relevant gene sets that could be detected using our methods that were designed to detect significant DV. Therefore, we focus on the interpretation of gene sets detected by the majority of methods that detect DV by presenting studies from literature that link them to homeostasis of the intestinal epithelium, tumor initiation and progression, or colon cancer. We repeated the steps performed for the childhood B-ALL datasets to detect Hallmark gene sets that show significant DV by at least 3 of the 4 used methods. Our DV methods detected five gene sets (Fig. 3), including three canonical oncogenic pathways: Apoptosis, reactive oxygen species (ROS), and epithelial mesenchymal transition (EMT). The first two are related to stress and the third to development. Detecting significant DV in the three oncogenic gene sets reflect the fact that a subset of SSPs may progress to carcinoma while the rest remain in dormant state. Balanced cell death through apoptosis is necessary to maintain the homeostasis of colorectal mucosa, and defective regulation of apoptosis is one factor that lead to the progression of an adenoma to carcinoma [109]. Reactive oxygen species (ROS) play a critical role in tumorigenesis by regulating signaling pathways that drive proliferation, evasion of apoptosis, migration, and malignant progression [110, 111]. ROS promote the EMT process [112, 113], which has also been detected by our DV methods. ROS production and NF-κB activation triggered by Ras-related C3 botulinum toxin substrate 1 is critical for the initiation of CRC [114]. Zhou et al. used single-cell RNA-sequencing to explore the early molecular alterations underlying the serrated neoplasia pathway towards CRC and suggested that upregulation of SERPINB6 in the epithelia of serrated lesions can promote serrated tumorigenesis via modulating ROS levels [115]. Kostic et al. examined the influence of gut microbiota on signaling pathways in colorectal serrated neoplasia and highlighted Fusobacterium Nucleatum, a bacterium that increases the production of ROS, possibly by selectively recruiting myeloid-derived immune cells [116], which promote carcinogenesis in CRC through oxidative metabolism [117].

The EMT process is necessary to maintain the homeostasis in adult tissue and involves epithelial cells acquiring a mesenchymal phenotype with increased motility, promoting the spread of tumor cells [118]. Inflammation-activated human mesenchymal stem cells (MSCs) promote the EMT process and progression of colorectal cancer (CRC) through the CCL5/β-catenin/Slug pathway [119]. The neural cell adhesion immunoglobin-like molecule L1CAM is a target of the Wnt-β-catenin signaling that promote stemness [120], leading to the elevated transcription of markers of intestinal and colonic epithelial stem cells [121, 122]. Detection the Wnt-β-catenin signaling by our methods supports the MSC-mediated changes in the EMT process. Wnt-β-catenin signaling is essential for the homeostasis of the intestinal epithelium [123] and plays a pivotal role in tumor initiation and development through involvement in cell survival, proliferation, apoptosis and autophagy [124]. Modifications and degradation of β-catenin are key events in the Wnt signaling pathway and the development and progression of colon cancer [125]. Previous studies support the detection of the Wnt-β-catenin signaling using our DV methods by highlighting the wide variability in β-catenin leading to conflicting results on Wnt signaling activation in SSPs [126]. Detecting significant DV in the spermatogenesis gene set is an indirect result for the defective regulation in the Wnt-β-catenin signaling which plays a major role in normal reproductive tract development, male germ cell development, and spermatogenesis [127,128,129,130,131].

Discussion

This study demonstrates a new development for gene set analysis methods, namely the analysis of gene set variability between phenotypes with different test statistics. We employ MST-based node ranking approaches to generalize two univariate statistics into new multivariate gene sets analysis methods that detect differential sample variance or mean between two phenotypes. The study shows the ability of the new methods to detect specific alternative hypotheses through simulations, as well as real gene expression data.

Our simulation results show that all four methods and MST order options (K = 1, 2, and 3) control Type I error rate at the specified significant level (α = 0.05). All methods demonstrated the desired behavior by detecting specific alternative hypotheses (DM: µX ≠ µY or DV: σX ≠ σY). The detection power under different parameter settings (Additional File 1) reveals a few patterns: First, increased sample size leads to increased detection power as expected in nonparametric methods. Second, increased intergene correlation slightly decreases detection power against DV but not DM. This is expected since a uniform increase in intergene correlation reduces variability between samples regardless of the value of variance fold-change. This is not a serious concern since the scenario of having uniform and extremely high values of intergene correlation in a gene set is hardly realistic. Third, all four methods perform similarly with RKS showing slightly lower detection power than RMD, RCVM, and RAD for small sample size. This minor difference becomes negligible when sample size increases.

Two real datasets showed that the Hallmark gene sets detected by the majority of differential variance methods are associated with the two compared phenotypes with support provided in literature. The four methods showed agreements and minor disagreements in the returned significance levels since they put emphasis on different regions of the deviation between two empirical CDFs of two groups of ranked samples in the MST. For instance, 46 and 44 out of 50 Hallmark gene sets tested respectively in the childhood B-ALL and colon polyps dataset were detected by all, majority, or none of the four methods that were designed to detect DV. We demonstrate the minor differences in the Interferon Gamma Response and KRAS Signaling Up gene sets using the childhood B-ALL dataset (Fig. 4). The MSTs (Fig. 4A, D), estimated the deviation between CDFs of two phenotypes when samples are HDP-ranked (Fig. 4B, D) or radial-ranked (Fig. 4C, F) in the MST. The curves of AD (and RAD) and MD (and RMD) were respectively normalized by n1 × n2 and N = n1 + n2 to scale them to a similar range to that of the KS (and RKS) curve and allow better visualization. For the Interferon Gamma Response, we detected significant DV (Fig. 4C) but not DM (Fig. 4B) using all four methods. The RKS statistic indicated by the maximum value of the RKS curve (marked by the black arrow), the RMD statistic indicated by the dashed green line, and the RAD statistic indicated by the area under the blue curve are larger in Fig. 4C as compared to Fig. 4B. For the KRAS Signaling Up, only RKS detects significant DV, while AD and MD detect significant DM. Observations from gene sets show that large differences between two phenotypes mostly lead to agreement among the four methods, while small differences sometimes lead to disagreement between methods.

Fig. 4
figure 4

Estimated deviation by different statistics in the Interferon Gamma Response (panels A, B, and C) and the KRAS Signaling Up (panels D, E, and F) gene sets in the childhood B-ALL dataset: Minimum Spanning Tree graph (A and D), estimated deviation between CDFs of two groups of HDP-ranked samples in the MST (B and E), and estimated deviation between CDFs of two groups of radial-ranked samples in the MST (C and F)

Conclusions

We have developed novel gene set analysis methods designed to detect differential sample mean or variance between groups in gene sets. We characterized these methods using simulated gene expression data and demonstrated the usefulness of methods designed to detect differential sample variance in providing biological interpretations when biologically relevant but heterogeneous changes between groups are prevalent in specific signaling pathways. We demonstrated the ability of these methods to detect heterogeneous changes in meaningful signaling pathways associated with compared groups in two real datasets: 1) prednisolone-resistant vs. prednisolone-sensitive children diagnosed with B-ALL, and 2) benign hyperplastic polyps vs. potentially malignant sessile serrated adenoma/polyps. Such changes render some signaling pathways undetectable by most GSA methods designed to detect differential mean. The study presents two novel multivariate generalizations of the Cramer-Von Mises and Anderson–Darling tests based on sample ranking schemes in the minimum spanning tree. Software implementation of the two new methods in addition to two methods developed earlier is available from Bioconductor package GSAR. The four methods complement each other by using test statistics that put emphasis on distinct patterns of change depending on the distribution of samples in a minimum spanning tree. Using a majority vote rule or a p-value combination scheme adds additional confidence in results. The available methods are easily applicable not only to transcriptomics and proteomics datasets, but also for any omics datasets in a normalized matrix form with available collection of gene sets.

Availability of data and materials

All the datasets used in this study are publicly available and accessible from the Gene Expression Omnibus repository under accession numbers GSE655, GSE656, and GSE76987. All the performed methods in this study are available publicly with reference manual, code examples, and vignette documentation in package GSAR from the Bioconductor repository https://bioconductor.org/packages/release/bioc/html/GSAR.html.

Abbreviations

AD:

Anderson-Darling

B-ALL:

B-lineage acute lymphoblastic leukemia

CDF:

Cumulative distribution function

CVM:

Cramer-Von Mises

DM:

Differential sample mean

DV:

Differential sample variance

GC:

Glucocorticoids

GEO:

Gene expression omnibus

GSA:

Gene set analysis

GSAR:

Gene set analysis in R

GSEA:

Gene set enrichment analysis

HDP:

High directed preorder

HP:

Hyperplastic polyps

KS:

Kolmogorov-Smirnov

MD:

Mean deviation

MSigDB:

Molecular signature database

MST:

Minimum spanning tree

RAD:

Radial Anderson–Darling

RCVM:

Radial Cramer-Von Mises

RKS:

Radial Kolmogorov–Smirnov

RMA:

Robust multi-array average

RMD:

Radial mean deviation

SSP:

Sessile serrated adenoma/polyps

References

  1. Fresard L, Smail C, Ferraro NM, Teran NA, Li X, Smith KS, Bonner D, Kernohan KD, Marwaha S, Zappala Z, et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat Med. 2019;25(6):911–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Kremer LS, Bader DM, Mertes C, Kopajtich R, Pichler G, Iuso A, Haack TB, Graf E, Schwarzmayr T, Terrile C, et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat Commun. 2017;8:15824.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Cummings BB, Marshall JL, Tukiainen T, Lek M, Sandra Donkervoort A, Foley R, Bolduc V, Waddell LB, Sandaradura SA, O’Grady GL, et al. Improving genetic diagnosis in mendelian disease with transcriptome sequencing. Sci Trans Med. 2017. https://doiorg.publicaciones.saludcastillayleon.es/10.1126/scitranslmed.aal5209s.

    Article  Google Scholar 

  4. Weinstein JN, Collisson EA, Mills GB, Mills KR, Shaw BA, Ozenberger KE, Shmulevich I, Sander C, Stuart JM. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113–20. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/ng.2764.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Tang H, Wang S, Xiao G, Schiller J, Papadimitrakopoulou V, Minna J, Wistuba II, Xie Y. Comprehensive evaluation of published gene expression prognostic signatures for biomarker-based lung cancer clinical studies. Ann Oncol. 2017;28(4):733–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Liu MC, Pitcher BN, Mardis ER, Davies SR, Friedman PN, Snider JE, Vickery TL, Reed JP, DeSchryver K, Singh B, et al. PAM50 gene signatures and breast cancer prognosis with adjuvant anthracycline- and taxane-based chemotherapy: correlative analysis of C9741 (Alliance). NPJ Breast Cancer. 2016;2:15023.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Cardoso F, van’t Veer LJ, Bogaerts J, Slaets L, Viale G, Delaloge S, Pierga JY, Brain E, Causeret S, DeLorenzi M, et al. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N Engl J Med. 2016;375(8):717–29.

    Article  CAS  PubMed  Google Scholar 

  8. Srivastava G, Renfro LA, Behrens RJ, Lopatin M, Chao C, Soori GS, Dakhil SR, Mowat RB, Kuebler JP, Kim G, et al. Prospective multicenter study of the impact of oncotype DX colon cancer assay results on treatment recommendations in stage II colon cancer patients. Oncologist. 2014;19(5):492–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Maak M, Simon I, Nitsche U, Roepman P, Snel M, Glas AM, Schuster T, Keller G, Zeestraten E, Goossens I, et al. Independent validation of a prognostic genomic signature (ColoPrint) for patients with stage II colon cancer. Ann Surg. 2013;257(6):1053–8.

    Article  PubMed  Google Scholar 

  10. Wang B, Wan F, Sheng H, Zhu Y, Shi G, Zhang H, Dai B, Shen Y, Zhu Y, Ye D. Identification and validation of an 18-gene signature highly-predictive of bladder cancer metastasis. Sci Rep. 2018;8(1):374.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Smyth EC, Nyamundanda G, Cunningham D, Fontana E, Ragulan C, Tan IB, Lin SJ, Wotherspoon A, Nankivell M, Fassan M, et al. A seven-Gene Signature assay improves prognostic risk stratification of perioperative chemotherapy treated gastroesophageal cancer patients from the MAGIC trial. Ann Oncol. 2018;29(12):2356–62.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Schwartz GW, Petrovic J, Zhou Y, Faryabi RB. Differential integration of transcriptome and proteome identifies pan-cancer prognostic biomarkers. Front Genet. 2018;9:205.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Fountzilas E, Markou K, Vlachtsis K, Nikolaou A, Arapantoni-Dadioti P, Ntoula E, Tassopoulos G, Bobos M, Konstantinopoulos P, Fountzilas G, et al. Identification and validation of gene expression models that predict clinical outcome in patients with early-stage laryngeal cancer. Ann Oncol. 2012;23(8):2146–53.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Konstantinopoulos PA, Cannistra SA, Fountzilas H, Culhane A, Pillay K, Rueda B, Cramer D, Seiden M, Birrer M, Coukos G, et al. Integrated analysis of multiple microarray datasets identifies a reproducible survival predictor in ovarian cancer. PLoS ONE. 2011;6(3): e18202.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Cao B, Luo L, Feng L, Ma S, Chen T, Ren Y, Zha X, Cheng S, Zhang K, Chen C. A network-based predictive gene-expression signature for adjuvant chemotherapy benefit in stage II colorectal cancer. BMC Cancer. 2017;17(1):844.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Bertucci F, Ueno NT, Finetti P, Vermeulen P, Lucci A, Robertson FM, Marsan M, Iwamoto T, Krishnamurthy S, Masuda H, et al. Gene expression profiles of inflammatory breast cancer: correlation with response to neoadjuvant chemotherapy and metastasis-free survival. Ann Oncol. 2014;25(2):358–65.

    Article  CAS  PubMed  Google Scholar 

  17. Chen F, Zhang Y, Varambally S, Creighton CJ. Molecular correlates of metastasis by systematic pan-cancer analysis across the cancer genome atlas. Mol Cancer Res. 2019;17(2):476–87.

    Article  CAS  PubMed  Google Scholar 

  18. Alemu EY, Carl JW Jr, Corrada Bravo H, Hannenhalli S. Determinants of expression variability. Nucleic Acids Res. 2014;42(6):3503–14.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29(10):569–74.

    Article  CAS  PubMed  Google Scholar 

  20. Mar JC, Matigian NA, Mackay-Sim A, Mellick GD, Sue CM, Silburn PA, McGrath JJ, Quackenbush J, Wells CA. Variance of gene expression identifies altered network constraints in neurological disease. PLoS Genet. 2011;7(8): e1002207.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Komurov K, Ram PT. Patterns of human gene expression variance show strong associations with signaling network hierarchy. BMC Syst Biol. 2010;4:154.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, Morley M, Spielman RS. Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet. 2003;33(3):422–5.

    Article  CAS  PubMed  Google Scholar 

  23. Vinuela A, Brown AA, Buil A, Tsai PC, Davies MN, Bell JT, Dermitzakis ET, Spector TD, Small KS. Age-dependent changes in mean and variance of gene expression across tissues in a twin cohort. Hum Mol Genet. 2018;27(4):732–41.

    Article  CAS  PubMed  Google Scholar 

  24. Somel M, Khaitovich P, Bahn S, Paabo S, Lachmann M. Gene expression becomes heterogeneous with age. Curr Biol. 2006;16(10):R359-360.

    Article  CAS  PubMed  Google Scholar 

  25. Bahar R, Hartmann CH, Rodriguez KA, Denny AD, Busuttil RA, Dolle ME, Calder RB, Chisholm GB, Pollock BH, Klein CA, et al. Increased cell-to-cell variation in gene expression in ageing mouse heart. Nature. 2006;441(7096):1011–4.

    Article  CAS  PubMed  Google Scholar 

  26. Zhang F, Yao Shugart Y, Yue W, Cheng Z, Wang G, Zhou Z, Jin C, Yuan J, Liu S, Xu Y. Increased variability of genomic transcription in schizophrenia. Sci Rep. 2015;5:17995.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Ho JW, Stefani M, dos Remedios CG, Charleston MA. Differential variability analysis of gene expression and its application to human diseases. Bioinformatics. 2008;24(13):i390-398.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Ecker S, Pancaldi V, Rico D, Valencia A. Higher gene expression variability in the more aggressive subtype of chronic lymphocytic leukemia. Genome Med. 2015;7(1):8.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Gorlov IP, Byun J, Zhao H, Logothetis CJ, Gorlova OY. Beyond comparing means: the usefulness of analyzing interindividual variation in gene expression for identifying genes associated with cancer development. J Bioinform Comput Biol. 2012;10(2):1241013.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Bravo HC, Pihur V, McCall M, Irizarry RA, Leek JT. Gene expression anti-profiles as a basis for accurate universal cancer signatures. BMC Bioinformatics. 2012;13:272.

    Article  PubMed  Google Scholar 

  31. Yu K, Ganesan K, Tan LK, Laban M, Wu J, Zhao XD, Li H, Leung CH, Zhu Y, Wei CL, et al. A precisely regulated gene expression cassette potently modulates metastasis and survival in multiple solid cancers. PLoS Genet. 2008;4(7): e1000129.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Strbenac D, Mann GJ, Yang JY, Ormerod JT. Differential distribution improves gene selection stability and has competitive classification performance for patient survival. Nucleic Acids Res. 2016;44(13): e119.

    Article  PubMed  PubMed Central  Google Scholar 

  33. Dinalankara W, Bravo HC. Gene expression signatures based on variability can robustly predict tumor progression and prognosis. Cancer Inform. 2015;14:71–81.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Gorlov IP, Yang JY, Byun J, Logothetis C, Gorlova OY, Do KA, Amos C. How to get the most from microarray data: advice from reverse genomics. BMC Genomics. 2014;15:223.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Marusyk A, Almendro V, Polyak K. Intra-tumour heterogeneity: A looking glass for cancer? Nat Rev Cancer. 2012;12(5):323–34.

    Article  CAS  PubMed  Google Scholar 

  36. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene set analysis: challenges, opportunities, and future research. Front Genet. 2020;11:654.

    Article  PubMed  PubMed Central  Google Scholar 

  38. Rahmatallah Y, Emmert-Streib F, Glazko G. Gene set analysis for self-contained tests: complex null and specific alternative hypotheses. Bioinformatics. 2012;28(23):3073–80.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Afsari B, Geman D, Fertig EJ. Learning dysregulated pathways in cancers from differential variability analysis. Cancer inform. 2014;13(Suppl 5):61–7.

    PubMed  PubMed Central  Google Scholar 

  40. Hanzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 2013;14:7.

    Article  Google Scholar 

  41. Zeng Y, Wang G, Yang E, Ji G, Brinkmeyer-Langford CL, Cai JJ. Aberrant gene expression in humans. PLoS Genet. 2015;11(1): e1004942.

    Article  PubMed  PubMed Central  Google Scholar 

  42. Friedman JH, Rafsky LC. Multivariate generalizations of the Wald-Wolfowitz and Smirnov 2-sample tests. Ann Stat. 1979;7(4):697–717.

    Article  Google Scholar 

  43. Rahmatallah Y, Zybailov B, Emmert-Streib F, Glazko G. GSAR: Bioconductor package for Gene Set analysis in R. BMC Bioinform. 2017;18(1):61.

    Article  Google Scholar 

  44. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, Schinzel AC, Sandy P, Meylan E, Scholl C, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462(7269):108–12.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Anderson TW. Distribution of two-sample Cramer-von Mises criterion. Ann Math Stat. 1962;33(3):1148–59.

    Article  Google Scholar 

  46. Scholz FW, Stephens MA. K-sample Anderson-darling tests. J Am Stat Assoc. 1987;82(399):918–24.

    Google Scholar 

  47. Pettitt AN. Two-sample Anderson-darling rank statistic. Biometrika. 1976;63(1):161–8.

    Google Scholar 

  48. Holleman A, Cheok MH, den Boer ML, Yang W, Veerman AJ, Kazemier KM, Pei D, Cheng C, Pui CH, Relling MV, et al. Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. N Engl J Med. 2004;351(6):533–42.

    Article  CAS  PubMed  Google Scholar 

  49. Edgar R, Domrachev M, Lash AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Kanth P, Bronner MP, Boucher KM, Burt RW, Neklason DW, Hagedorn CH, Delker DA. Gene signature in sessile serrated polyps identifies colon cancer subtype. Cancer Prev Res (Phila). 2016;9(6):456–65.

    Article  CAS  PubMed  Google Scholar 

  51. Rahmatallah Y, Khaidakov M, Lai KK, Goyne HE, Lamps LW, Hagedorn CH, Glazko G. Platform-independent gene expression signature differentiates sessile serrated adenomas/polyps and hyperplastic polyps of the colon. BMC Med Genomics. 2017;10(1):81.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Greenstein S, Ghias K, Krett NL, Rosen ST. Mechanisms of glucocorticoid-mediated apoptosis in hematological malignancies. Clin Cancer Res. 2002;8(6):1681–94.

    CAS  PubMed  Google Scholar 

  55. Thulasi R, Harbour DV, Thompson EB. Suppression of c-myc is a critical step in glucocorticoid-induced human leukemic cell lysis. J Biol Chem. 1993;268(24):18306–12.

    Article  CAS  PubMed  Google Scholar 

  56. Wray JP, Deltcheva EM, Boiers C, Richardson Scapital Ie C, Chhetri JB, Brown J, Gagrica S, Guo Y, Illendula A, Martens JHA, et al. Regulome analysis in B-acute lymphoblastic leukemia exposes Core Binding Factor addiction as a therapeutic vulnerability. Nat Commun. 2022;13(1):7124.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Closa A, Reixachs-Sole M, Fuentes-Fayos AC, Hayer KE, Melero JL, Adriaanse FRS, Bos RS, Torres-Diz M, Hunger SP, Roberts KG, et al. A convergent malignant phenotype in B-cell acute lymphoblastic leukemia involving the splicing factor SRRM1. NAR Cancer. 2022;4(4):zcac041.

    Article  PubMed  PubMed Central  Google Scholar 

  58. Schneider P, Crump NT, Arentsen-Peters S, Smith AL, Hagelaar R, Adriaanse FRS, Bos RS, de Jong A, Nierkens S, Koopmans B, et al. Modelling acquired resistance to DOT1L inhibition exhibits the adaptive potential of KMT2A-rearranged acute lymphoblastic leukemia. Exp Hematol Oncol. 2023;12(1):81.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Perez-Figueroa E, Sanchez-Cuaxospa M, Martinez-Soto KA, Sanchez-Zauco N, Medina-Sanson A, Jimenez-Hernandez E, Torres-Nava JR, Felix-Castro JM, Gomez A, Ortega E, et al. Strong inflammatory response and Th1-polarization profile in children with acute lymphoblastic leukemia without apparent infection. Oncol Rep. 2016;35(5):2699–706.

    Article  CAS  PubMed  Google Scholar 

  60. Lejman M, Kusmierczuk K, Bednarz K, Ostapinska K, Zawitkowska J. Targeted therapy in the treatment of pediatric acute lymphoblastic leukemia-therapy and toxicity mechanisms. Int J Mol Sci. 2021;22(18):9827.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Cloppenborg T, Stanulla M, Zimmermann M, Schrappe M, Welte K, Klein C. Immunosurveillance of childhood ALL: polymorphic interferon-gamma alleles are associated with age at diagnosis and clinical risk groups. Leukemia. 2005;19(1):44–8.

    Article  CAS  PubMed  Google Scholar 

  62. Rouce RH, Shaim H, Sekine T, Weber G, Ballard B, Ku S, Barese C, Murali V, Wu MF, Liu H, et al. The TGF-beta/SMAD pathway is an important mechanism for NK cell immune evasion in childhood B-acute lymphoblastic leukemia. Leukemia. 2016;30(4):800–11.

    Article  CAS  PubMed  Google Scholar 

  63. Schober S, Rottenberger JM, Hilz J, Schmid E, Ebinger M, Feuchtinger T, Handgretinger R, Lang P, Queudeville M. Th1 cytokines in pediatric acute lymphoblastic leukemia. Cancer Immunol Immunother. 2023;72(11):3621–34.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Kim MJ, Park BJ, Kang YS, Kim HJ, Park JH, Kang JW, Lee SW, Han JM, Lee HW, Kim S. Downregulation of FUSE-binding protein and c-myc by tRNA synthetase cofactor p38 is required for lung cell differentiation. Nat Genet. 2003;34(3):330–6.

    Article  CAS  PubMed  Google Scholar 

  65. Mousavian Z, Nowzari-Dalini A, Rahmatallah Y, Masoudi-Nejad A. Differential network analysis and protein-protein interaction study reveals active protein modules in glucocorticoid resistance for infant acute lymphoblastic leukemia. Mol Med. 2019;25(1):36.

    Article  PubMed  PubMed Central  Google Scholar 

  66. Singh J, Kumari S, Arora M, Verma D, Palanichamy JK, Kumar R, Sharma G, Bakhshi S, Pushpam D, Ali MS, et al. Prognostic Relevance of Expression of EMP1, CASP1, and NLRP3 Genes in Pediatric B-Lineage Acute Lymphoblastic Leukemia. Front Oncol. 2021;11: 606370.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Hudson WH, Vera IMS, Nwachukwu JC, Weikum ER, Herbst AG, Yang Q, Bain DL, Nettles KW, Kojetin DJ, Ortlund EA. Cryptic glucocorticoid receptor-binding sites pervade genomic NF-kappaB response elements. Nat Commun. 2018;9(1):1337.

    Article  PubMed  PubMed Central  Google Scholar 

  68. Bladh LG, Liden J, Dahlman-Wright K, Reimers M, Nilsson S, Okret S. Identification of endogenous glucocorticoid repressed genes differentially regulated by a glucocorticoid receptor mutant able to separate between nuclear factor-kappaB and activator protein-1 repression. Mol Pharmacol. 2005;67(3):815–26.

    Article  CAS  PubMed  Google Scholar 

  69. Nissen RM, Yamamoto KR. The glucocorticoid receptor inhibits NFkappaB by interfering with serine-2 phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes Dev. 2000;14(18):2314–29.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Scheinman RI, Gualberto A, Jewell CM, Cidlowski JA, Baldwin AS Jr. Characterization of mechanisms involved in transrepression of NF-kappa B by activated glucocorticoid receptors. Mol Cell Biol. 1995;15(2):943–53.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Ray A, Prefontaine KE. Physical association and functional antagonism between the p65 subunit of transcription factor NF-kappa B and the glucocorticoid receptor. Proc Natl Acad Sci U S A. 1994;91(2):752–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Tissing WJ, den Boer ML, Meijerink JP, Menezes RX, Swagemakers S, van der Spek PJ, Sallan SE, Armstrong SA, Pieters R. Genomewide identification of prednisolone-responsive genes in acute lymphoblastic leukemia cells. Blood. 2007;109(9):3929–35.

    Article  CAS  PubMed  Google Scholar 

  73. Guzman ML, Neering SJ, Upchurch D, Grimes B, Howard DS, Rizzieri DA, Luger SM, Jordan CT. Nuclear factor-kappaB is constitutively activated in primitive human acute myelogenous leukemia cells. Blood. 2001;98(8):2301–7.

    Article  CAS  PubMed  Google Scholar 

  74. Webster JC, Oakley RH, Jewell CM, Cidlowski JA. Proinflammatory cytokines regulate human glucocorticoid receptor gene expression and lead to the accumulation of the dominant negative beta isoform: a mechanism for the generation of glucocorticoid resistance. Proc Natl Acad Sci U S A. 2001;98(12):6865–70.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Igarashi H, Baba Y, Nagai Y, Jimi E, Ghosh S, Kincade PW. NF-kappaB is dispensable for normal lymphocyte development in bone marrow but required for protection of progenitors from TNFalpha. Int Immunol. 2006;18(5):653–9.

    Article  CAS  PubMed  Google Scholar 

  76. Kagoya Y, Yoshimi A, Kataoka K, Nakagawa M, Kumano K, Arai S, Kobayashi H, Saito T, Iwakura Y, Kurokawa M. Positive feedback between NF-kappaB and TNF-alpha promotes leukemia-initiating cell capacity. J Clin Invest. 2014;124(2):528–42.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Lee YR, Yu HN, Noh EM, Youn HJ, Song EK, Han MK, Park CS, Kim BS, Park YS, Park BK, et al. TNF-alpha upregulates PTEN via NF-kappaB signaling pathways in human leukemic cells. Exp Mol Med. 2007;39(1):121–7.

    Article  CAS  PubMed  Google Scholar 

  78. Dong QM, Ling C, Chen X, Zhao LI. Inhibition of tumor necrosis factor-alpha enhances apoptosis induced by nuclear factor-kappaB inhibition in leukemia cells. Oncol Lett. 2015;10(6):3793–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Issa JP, Zehnbauer BA, Civin CI, Collector MI, Sharkis SJ, Davidson NE, Kaufmann SH, Baylin SB. The estrogen receptor CpG island is methylated in most hematopoietic neoplasms. Cancer Res. 1996;56(5):973–7.

    CAS  PubMed  Google Scholar 

  80. Gallagher KM, Roderick JE, Tan SH, Tan TK, Murphy L, Yu J, Li R, O’Connor KW, Zhu J, Green MR, et al. ESRRB regulates glucocorticoid gene expression in mice and patients with acute lymphoblastic leukemia. Blood Adv. 2020;4(13):3154–68.

    CAS  PubMed  PubMed Central  Google Scholar 

  81. Bardini M, Trentin L, Rizzo F, Vieri M, Savino AM, Garrido Castro P, Fazio G, Van Roon EHJ, Kerstjens M, Smithers N, et al. Antileukemic Efficacy of BET Inhibitor in a Preclinical Mouse Model of MLL-AF4(+) Infant ALL. Mol Cancer Ther. 2018;17(8):1705–16.

    Article  CAS  PubMed  Google Scholar 

  82. Nwabo Kamdje AH, Krampera M. Notch signaling in acute lymphoblastic leukemia: Any role for stromal microenvironment? Blood. 2011;118(25):6506–14.

    Article  PubMed  Google Scholar 

  83. Nwabo Kamdje AH, Mosna F, Bifari F, Lisi V, Bassi G, Malpeli G, Ricciardi M, Perbellini O, Scupoli MT, Pizzolo G, et al. Notch-3 and Notch-4 signaling rescue from apoptosis human B-ALL cells in contact with human bone marrow-derived mesenchymal stromal cells. Blood. 2011;118(2):380–9.

    Article  PubMed  Google Scholar 

  84. Takam Kamga P, Dal Collo G, Midolo M, Adamo A, Delfino P, Mercuri A, Cesaro S, Mimiola E, Bonifacio M, Andreini A, et al. Inhibition of notch signaling enhances chemosensitivity in b-cell precursor acute lymphoblastic leukemia. Cancer Res. 2019;79(3):639–49.

    Article  PubMed  Google Scholar 

  85. Kuang SQ, Fang Z, Zweidler-McKay PA, Yang H, Wei Y, Gonzalez-Cervantes EA, Boumber Y, Garcia-Manero G. Epigenetic inactivation of Notch-Hes pathway in human B-cell acute lymphoblastic leukemia. PLoS ONE. 2013;8(4): e61807.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Read A, Schröder M. The unfolded protein response: an overview. Biology. 2021;10(5):384.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Roue G, Perez-Galan P, Mozos A, Lopez-Guerra M, Xargay-Torrent S, Rosich L, Saborit-Villarroya I, Normant E, Campo E, Colomer D. The Hsp90 inhibitor IPI-504 overcomes bortezomib resistance in mantle cell lymphoma in vitro and in vivo by down-regulation of the prosurvival ER chaperone BiP/Grp78. Blood. 2011;117(4):1270–9.

    Article  CAS  PubMed  Google Scholar 

  88. Rosati E, Sabatini R, Rampino G, De Falco F, Di Ianni M, Falzetti F, Fettucciari K, Bartoli A, Screpanti I, Marconi P. Novel targets for endoplasmic reticulum stress-induced apoptosis in B-CLL. Blood. 2010;116(15):2713–23.

    Article  CAS  PubMed  Google Scholar 

  89. Martelli A, Paganelli F, Chiarini F, Evangelisti C, McCubrey J. The unfolded protein response: a novel therapeutic target in acute leukemias. Cancers. 2020;12(2):333.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Bailly C, Waring MJ. Pharmacological effectors of GRP78 chaperone in cancers. Biochem Pharmacol. 2019;163:269–78.

    Article  CAS  PubMed  Google Scholar 

  91. Bertolotti A, Zhang Y, Hendershot LM, Harding HP, Ron D. Dynamic interaction of BiP and ER stress transducers in the unfolded-protein response. Nat Cell Biol. 2000;2(6):326–32.

    Article  CAS  PubMed  Google Scholar 

  92. Kharabi Masouleh B, Geng H, Hurtz C, Chan LN, Logan AC, Chang MS, Huang C, Swaminathan S, Sun H, Paietta E, et al. Mechanistic rationale for targeting the unfolded protein response in pre-B acute lymphoblastic leukemia. Proc Natl Acad Sci U S A. 2014;111(21):E2219-2228.

    Article  PubMed  PubMed Central  Google Scholar 

  93. Uckun FM, Qazi S, Ozer Z, Garner AL, Pitt J, Ma H, Janda KD. Inducing apoptosis in chemotherapy-resistant B-lineage acute lymphoblastic leukaemia cells by targeting HSPA5, a master regulator of the anti-apoptotic unfolded protein response signalling network. Br J Haematol. 2011;153(6):741–52.

    Article  CAS  PubMed  Google Scholar 

  94. Khateb A, Ronai ZA. Unfolded protein response in leukemia: from basic understanding to therapeutic opportunities. Trends Cancer. 2020;6(11):960–73.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Féral K, Jaud M, Philippe C, Di Bella D, Pyronnet S, Rouault-Pierre K, Mazzolini L, Touriol C. ER stress and unfolded protein response in leukemia: Friend, foe, or both? Biomolecules. 2021;11(2):199.

    Article  PubMed  PubMed Central  Google Scholar 

  96. Gulliver LS. Xenobiotics and the glucocorticoid receptor. Toxicol Appl Pharmacol. 2017;319:69–79.

    Article  CAS  PubMed  Google Scholar 

  97. Schijvens AM, Ter Heine R, de Wildt SN, Schreuder MF. Pharmacology and pharmacogenetics of prednisone and prednisolone in patients with nephrotic syndrome. Pediatr Nephrol. 2019;34(3):389–403.

    Article  PubMed  Google Scholar 

  98. Krajinovic M, Sinnett H, Richer C, Labuda D, Sinnett D. Role of NQO1, MPO and CYP2E1 genetic polymorphisms in the susceptibility to childhood acute lymphoblastic leukemia. Int J Cancer. 2002;97(2):230–6.

    Article  CAS  PubMed  Google Scholar 

  99. Krajinovic M, Labuda D, Richer C, Karimi S, Sinnett D. Susceptibility to childhood acute lymphoblastic leukemia: influence of CYP1A1, CYP2D6, GSTM1, and GSTT1 genetic polymorphisms. Blood. 1999;93(5):1496–501.

    Article  CAS  PubMed  Google Scholar 

  100. Krajinovic M, Labuda D, Mathonnet G, Labuda M, Moghrabi A, Champagne J, Sinnett D. Polymorphisms in genes encoding drugs and xenobiotic metabolizing enzymes, DNA repair enzymes, and response to treatment of childhood acute lymphoblastic leukemia. Clin Cancer Res. 2002;8(3):802–10.

    CAS  PubMed  Google Scholar 

  101. Aydin-Sayitoglu M, Hatirnaz O, Erensoy N, Ozbek U. Role of CYP2D6, CYP1A1, CYP2E1, GSTT1, and GSTM1 genes in the susceptibility to acute leukemias. Am J Hematol. 2006;81(3):162–70.

    Article  CAS  PubMed  Google Scholar 

  102. Bettington M, Walker N, Clouston A, Brown I, Leggett B, Whitehall V. The serrated pathway to colorectal carcinoma: current concepts and challenges. Histopathology. 2013;62(3):367–86.

    Article  PubMed  Google Scholar 

  103. Rex DK, Ahnen DJ, Baron JA, Batts KP, Burke CA, Burt RW, Goldblum JR, Guillem JG, Kahi CJ, Kalady MF, et al. Serrated lesions of the colorectum: review and recommendations from an expert panel. Am J Gastroenterol. 2012;107(9):1315–29.

    Article  PubMed  PubMed Central  Google Scholar 

  104. Lash RH, Genta RM, Schuler CM. Sessile serrated adenomas: prevalence of dysplasia and carcinoma in 2139 patients. J Clin Pathol. 2010;63(8):681–6.

    Article  PubMed  Google Scholar 

  105. Torlakovic EE, Gomez JD, Driman DK, Parfitt JR, Wang C, Benerjee T, Snover DC. Sessile serrated adenoma (SSA) vs. traditional serrated adenoma (TSA). Am J Surg Pathol. 2008;32(1):21–9.

    Article  PubMed  Google Scholar 

  106. Torlakovic E, Skovlund E, Snover DC, Torlakovic G, Nesland JM. Morphologic reappraisal of serrated colorectal polyps. Am J Surg Pathol. 2003;27(1):65–81.

    Article  PubMed  Google Scholar 

  107. Tinmouth J, Henry P, Hsieh E, Baxter NN, Hilsden RJ, Elizabeth McGregor S, Paszat LF, Ruco A, Saskin R, Schell AJ, et al. Sessile serrated polyps at screening colonoscopy: have they been under diagnosed? Am J Gastroenterol. 2014;109(11):1698–704.

    Article  PubMed  Google Scholar 

  108. Kim SW, Cha JM, Lee JI, Joo KR, Shin HP, Kim GY, Lim SJ. A significant number of sessile serrated adenomas might not be accurately diagnosed in daily practice. Gut Liver. 2010;4(4):498–502.

    Article  PubMed  PubMed Central  Google Scholar 

  109. Orlandi G, Roncucci L, Carnevale G, Sena P. Different roles of apoptosis and autophagy in the development of human colorectal cancer. Int J Mol Sci. 2023;24(12):10201.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Sorolla MA, Hidalgo I, Sorolla A, Montal R, Pallise O, Salud A, Parisi E. Microenvironmental reactive oxygen species in colorectal cancer: involved processes and therapeutic opportunities. Cancers (Basel). 2021;13(20):5037.

    Article  CAS  PubMed  Google Scholar 

  111. Basak D, Uddin MN, Hancock J. The role of oxidative stress and its counteractive utility in colorectal cancer (CRC). Cancers (Basel). 2020;12(11):3336.

    Article  CAS  PubMed  Google Scholar 

  112. Aggarwal V, Tuli HS, Varol A, Thakral F, Yerer MB, Sak K, Varol M, Jain A, Khan MA, Sethi G. Role of reactive oxygen species in cancer progression: molecular mechanisms and recent advancements. Biomolecules. 2019;9(11):735.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. Binker MG, Binker-Cosen AA, Richards D, Oliver B, Cosen-Binker LI. EGF promotes invasion by PANC-1 cells through Rac1/ROS-dependent secretion and activation of MMP-2. Biochem Biophys Res Commun. 2009;379(2):445–50.

    Article  CAS  PubMed  Google Scholar 

  114. Myant KB, Cammareri P, McGhee EJ, Ridgway RA, Huels DJ, Cordero JB, Schwitalla S, Kalna G, Ogg EL, Athineos D, et al. ROS production and NF-kappaB activation triggered by RAC1 facilitate WNT-driven intestinal stem cell proliferation and colorectal cancer initiation. Cell Stem Cell. 2013;12(6):761–73.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Zhou YJ, Lu XF, Chen H, Wang XY, Cheng W, Zhang QW, Chen JN, Wang XY, Jin JZ, Yan FR, et al. Single-cell transcriptomics reveals early molecular and immune alterations underlying the serrated neoplasia pathway toward colorectal cancer. Cell Mol Gastroenterol Hepatol. 2023;15(2):393–424.

    Article  CAS  PubMed  Google Scholar 

  116. Kostic AD, Chun E, Robertson L, Glickman JN, Gallini CA, Michaud M, Clancy TE, Chung DC, Lochhead P, Hold GL, et al. Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe. 2013;14(2):207–15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. OuYang LY, Wu XJ, Ye SB, Zhang RX, Li ZL, Liao W, Pan ZZ, Zheng LM, Zhang XS, Wang Z, et al. Tumor-induced myeloid-derived suppressor cells promote tumor progression through oxidative metabolism in human colorectal cancer. J Transl Med. 2015;13:47.

    Article  PubMed  PubMed Central  Google Scholar 

  118. Lu J, Kornmann M, Traub B. Role of epithelial to mesenchymal transition in colorectal cancer. Int J Mol Sci. 2023;24(19):14815.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Chen K, Liu Q, Tsang LL, Ye Q, Chan HC, Sun Y, Jiang X. Human MSCs promotes colorectal cancer epithelial-mesenchymal transition and progression via CCL5/beta-catenin/Slug pathway. Cell Death Dis. 2017;8(5): e2819.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Beck B, Lapouge G, Rorive S, Drogat B, Desaedelaere K, Delafaille S, Dubois C, Salmon I, Willekens K, Marine JC, et al. Different levels of Twist1 regulate skin tumor initiation, stemness, and progression. Cell Stem Cell. 2015;16(1):67–79.

    Article  CAS  PubMed  Google Scholar 

  121. Shvab A, Haase G, Ben-Shmuel A, Gavert N, Brabletz T, Dedhar S. Ben-Ze’ev A: Induction of the intestinal stem cell signature gene SMOC-2 is required for L1-mediated colon cancer progression. Oncogene. 2016;35(5):549–57.

    Article  CAS  PubMed  Google Scholar 

  122. Ben-Shmuel A, Shvab A, Gavert N, Brabletz T. Ben-Ze’ev A: Global analysis of L1-transcriptomes identified IGFBP-2 as a target of ezrin and NF-kappaB signaling that promotes colon cancer progression. Oncogene. 2013;32(27):3220–30.

    Article  CAS  PubMed  Google Scholar 

  123. Pinto D, Gregorieff A, Begthel H, Clevers H. Canonical Wnt signals are essential for homeostasis of the intestinal epithelium. Genes Dev. 2003;17(14):1709–13.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Yuan S, Tao F, Zhang X, Zhang Y, Sun X, Wu D. Role of Wnt/beta-catenin signaling in the chemoresistance modulation of colorectal cancer. Biomed Res Int. 2020;2020:9390878.

    Article  PubMed  PubMed Central  Google Scholar 

  125. Zhao H, Ming T, Tang S, Ren S, Yang H, Liu M, Tao Q, Xu H. Wnt signaling in colorectal cancer: pathogenic role and therapeutic target. Mol Cancer. 2022;21(1):144.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Fu X, Li L, Peng Y. Wnt signalling pathway in the serrated neoplastic pathway of the colorectum: possible roles and epigenetic regulatory mechanisms. J Clin Pathol. 2012;65(8):675–9.

    Article  CAS  PubMed  Google Scholar 

  127. Kumar M, Atkins J, Cairns M, Ali A, Tanwar PS. Germ cell-specific sustained activation of Wnt signalling perturbs spermatogenesis in aged mice, possibly through non-coding RNAs. Oncotarget. 2016;7(52):85709–27.

    Article  PubMed  PubMed Central  Google Scholar 

  128. Tanwar PS, Zhang L, Teixeira JM. Adenomatous polyposis coli (APC) is essential for maintaining the integrity of the seminiferous epithelium. Mol Endocrinol. 2011;25(10):1725–39.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Tanwar PS, Zhang L, Tanaka Y, Taketo MM, Donahoe PK, Teixeira JM. Focal Mullerian duct retention in male mice with constitutively activated beta-catenin expression in the Mullerian duct mesenchyme. Proc Natl Acad Sci U S A. 2010;107(37):16142–7.

    Article  PubMed  PubMed Central  Google Scholar 

  130. Tanwar PS, Kaneko-Tarui T, Zhang L, Rani P, Taketo MM, Teixeira J. Constitutive WNT/beta-catenin signaling in murine Sertoli cells disrupts their differentiation and ability to support spermatogenesis. Biol Reprod. 2010;82(2):422–32.

    Article  CAS  PubMed  Google Scholar 

  131. Jordan BK, Shen JH, Olaso R, Ingraham HA, Vilain E. Wnt4 overexpression disrupts normal testicular vasculature and inhibits testosterone synthesis by repressing steroidogenic factor 1/beta-catenin synergy. Proc Natl Acad Sci U S A. 2003;100(19):10866–71.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

Support to YR and GG has been provided in part by the IDeA Networks of Biomedical Research Excellence (INBRE) program with a grant from the National Institute of General Medical Sciences (NIGMS), P20 GM103429, from the National Institutes of Health.

Author information

Authors and Affiliations

Authors

Contributions

YR and GG conceived the new statistical methods and computational framework. YR performed computer simulations, real dataset analyses, software implementation in package GSAR, and contributed to the discussion. GG investigated literature, analyzed results, and contributed to the discussion. YR and GG drafted the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yasir Rahmatallah.

Ethics declarations

Ethics approval and consent to participate

Not applicable as no new data were generated in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rahmatallah, Y., Glazko, G. Improving data interpretability with new differential sample variance gene set tests. BMC Bioinformatics 26, 103 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12859-025-06117-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12859-025-06117-0

Keywords