HGATLink: single-cell gene regulatory network inference via the fusion of heterogeneous graph attention networks and transformer
BMC Bioinformatics volume 26, Article number: 49 (2025)
Abstract
Background
Gene regulatory networks (GRNs) involve complex regulatory relationships between genes and play important roles in the study of various biological systems and diseases. The introduction of single-cell RNA sequencing (scRNA-seq) technology has allowed gene regulation studies to be carried out on specific cell types, providing the opportunity to accurately infer gene regulatory networks. However, the sparsity and noise of single-cell sequencing data pose challenges for gene regulatory network inference, and although many GRN inference methods have been proposed, they often fail to eliminate transitive interactions or do not adequately address the multilevel relationships and nonlinear features in the graph data.
Results
To address the above limitations, we propose a gene regulatory network inference framework named HGATLink. HGATLink combines a heterogeneous graph attention network with a simplified transformer to effectively capture complex interactions between genes in a low-dimensional space obtained via matrix factorization. This design not only enhances the ability to model complex heterogeneous graph structures and alleviates transitive interactions, but also effectively captures long-range dependencies between genes to ensure more accurate prediction.
Conclusions
Compared with 10 state-of-the-art GRN inference methods on 14 scRNA-seq datasets under two metrics, AUROC and AUPRC, HGATLink shows good stability and accuracy in gene regulatory network inference tasks.
Introduction
Gene expression regulation describes how cells control gene expression. It involves multiple components and requires mapping these complex, heterogeneous, multilayered molecules down to the gene level and capturing the interactions between transcription factors and target genes, which together form a complex network [1, 2]. By constructing and analyzing gene regulatory networks, we can understand the complexity and dynamics of these networks, reveal the regulatory mechanisms of gene expression and signaling pathways, deepen our understanding of disease progression, and identify the genes responsible for the progression of various diseases. The development of high-throughput sequencing technologies has led to a surge in the amount of data, and although regulatory relationships between some genes can be inferred, such data still do not fully reflect the actual interactions in biological systems [3]. The advent of single-cell RNA sequencing (scRNA-seq) technology has made it possible to capture cell type-specific information, but dropout events, sparsity, and noise in the data can lead to biased estimates of heterogeneity, posing a challenge to GRN inference [4].
To address these challenges, many gene regulatory network inference methods have been proposed, which we broadly classify into three categories. The first category is based on statistics and information theory. Some of these methods apply Pearson similarity [5, 6], Spearman correlation [7, 8], or transfer entropy [9, 10] to increase the interpretability of links by analyzing correlations between genes, but they are limited in that they require a priori assumptions, support only undirected analysis, or have high time complexity. Other approaches use Boolean networks or ordinary differential equations (ODEs) to infer gene regulatory networks [11,12,13]. These models have the advantage of being relatively simple to compute, but Boolean networks suffer from the same problem of undirected regulation, and SCODE [12], although it has demonstrated that gene regulatory networks can be learned through ODEs, requires a priori assumptions of linear relationships, whereas in real environments the relationships between genes are often complex and nonlinear. The second category is based on traditional machine learning; e.g., GENIE3 [14] and dynGENIE3 [15] employ random forests to assess the importance of a gene relative to other genes and decompose the network inference problem into separate feature selection problems. However, this type of approach splits the input data to iteratively construct multiple trees, which incurs expensive computational resources and is difficult to apply to large datasets [16]. In addition, Bayesian networks have been widely used as probabilistic graphical models for gene regulatory network inference, as in Score_LOPC [17] and mtDBN [18]. Score_LOPC extends the Bayesian information criterion (BIC) score and combines a dynamic Bayesian structure search method with a penalty term to infer the graph structure. mtDBN combines a tree model with a dynamic Bayesian network and effectively reduces the error when a baseline DBN is applied to nonlinear problems. Although these methods can infer gene regulatory networks, they face severe time complexity limitations when dealing with high-dimensional data.
To better address the limitations of the above methods, a third class of deep learning methods has attracted widespread attention, e.g., convolutional neural networks (CNNs) and recurrent neural networks (RNNs) [19,20,21,22,23]. Among them, CNNC [19] and 3DCEMA [20] make full use of spatial information to improve fitting via the normalized empirical probability density function (NEPDF) and 3D convolution, respectively, but face considerable time complexity. Many methods have subsequently applied graph neural networks (GNNs) to learn node embedding representations for gene regulatory network inference [24,25,26,27]. Among them, GNNLink [25] and GMFGRN [26] construct homogeneous and heterogeneous graphs, respectively, to learn gene embedding representations via first-order graph neural network aggregation of neighbor information. Although they yield superior prediction results, both methods construct samples via uniform sampling, which may cause the negative samples to lack diversity and does not guarantee the model's ability to generalize when faced with abnormal data and noise. Currently, the transformer [28] is widely used in the field of machine translation because of its encoder-decoder architecture [29], and it has shown significant advantages in recent prediction tasks. Inspired by the transformer's success in modeling sequential data in natural language processing [30], several inference methods [31,32,33] based on the transformer architecture and graph attention networks have also been proposed. Among them, GENELink [31] combines regulatory information under the supervision of graph attention networks with learned embedding space representations to learn robust and generalized gene representations, providing new ideas for constructing gene regulatory networks from single-cell data; however, its ability to exclude false-positive edges is limited. GATCL [32] implements GRN inference by using convolutional layers instead of weight matrices for graph attention network learning. STGRNS [33] is based on the transformer encoder and uses a self-attention mechanism to focus on different subvectors within gene pairs to learn gene regulatory networks; however, its max-pooling downsampling may discard the important information carried by sparse connections and oversmooth the features, which affects the accuracy of gene regulatory network inference.
Cellular heterogeneity analysis remains a significant difficulty, and the biological heterogeneity of individual cells in the real world complicates the inference of gene regulatory networks. In addition, current GRN inference methods focus only on extracting features from surrounding nodes and ignore the neighborhoods and features of long-range nodes [34]. Furthermore, dataset processing always aims to balance positive and negative samples, but in real-world settings this is frequently not the case. As a result, a more efficient inference framework is required that can produce precise predictions while accounting for cellular heterogeneity and the imbalance of positive and negative data in the real world. The main contributions of this work are summarized as follows:
- To address the imbalance of positive and negative samples in real-world settings, we combine three-fold cross-validation with a unique fold division strategy and a hard negative sampling (HNS) scheme. This combination best reflects the scarcity of positive samples in real environments while increasing the diversity of negative samples. Meanwhile, to accommodate small-sample datasets, we avoid cross-contamination between folds by dividing the data into numerical intervals, ensuring more fine-grained model training and testing and allowing the model to cope with both data imbalance and a limited number of samples.
- Considering the issue of node degree imbalance in heterogeneous networks, we revisit the heterogeneous graph networks used for graph-based GRN inference and propose a novel adaptive weight adjustment learning method that captures crucial information from sparse connections while avoiding information smoothing. Specifically, we build a heterogeneous graph attention network based on a discretised gene expression matrix; unlike the typical feature-based attention learning method, it evaluates graph structural features and employs degree information as the query and key. Max pooling, rather than the typically used sum pooling strategy, is employed for aggregation during node updating, and adaptive attention fine-tuning is used to explore the neighborhoods of long-range nodes.
- We design a new architecture that combines a simplified transformer with the heterogeneous graph attention network. In particular, the model learns latent gene features through the heterogeneous graph attention network; we then extract the feature vectors of each gene pair to build a joint feature representation. The joint features are input into the transformer with positional encoding removed, and link prediction is then performed through a series of nonlinear transformations, achieving adaptive exploration of global characteristics from local regions.
- We evaluate HGATLink's performance on 14 datasets containing the top 500 and top 1000 most highly varying genes and obtain excellent results.
Methods
Datasets
We evaluated the performance of HGATLink on the scRNA-seq datasets provided by BEELINE [35] for seven cell types. We first followed BEELINE [35] for data preprocessing, excluding genes expressed at low levels in the cells, selecting the top 500 and top 1000 genes ranked by P value and variance, and thereby expanding the 7 datasets into 14 datasets. Subsequently, we constructed gene pair data for each dataset under the cell type-specific ChIP-seq networks [36]. Under the uniform sampling scheme, the regulatory relationships in the real network are used as positive samples and an equal number of randomly selected relationships as negative samples, which may introduce bias and leave the negative samples lacking diversity [25]. We instead constructed the HNS regulatory networks by taking genes regulated by the transcription factors (TFs) specific to a cell type as positive samples, and taking genes not regulated by these TFs, as well as genes regulated by other TFs, as negative samples; detailed statistics are shown in Table 1.
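As a rough illustration of this sampling scheme, the following sketch (our reading, not the authors' released code) builds positive and hard negative TF-gene pairs from a ChIP-seq edge list; the CSV format, column names, and function name are assumptions.

```python
# Hedged sketch of the hard negative sampling (HNS) scheme described above.
import pandas as pd

def build_hns_pairs(network_csv):
    """network_csv: ChIP-seq edge list with assumed columns 'TF' and 'Target'."""
    net = pd.read_csv(network_csv)
    regulated = net.groupby("TF")["Target"].apply(set).to_dict()  # TF -> regulated genes
    all_targets = set(net["Target"])
    positives, negatives = [], []
    for tf in regulated:
        for gene in all_targets:
            if gene in regulated[tf]:
                positives.append((tf, gene, 1))   # TF regulates gene: positive sample
            else:
                # gene not regulated by this TF (it may be regulated by other TFs),
                # so it forms a hard negative for this TF
                negatives.append((tf, gene, 0))
    return positives, negatives
```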
The HGATLink framework
HGATLink is a deep learning framework that implements gene regulatory network inference by viewing the GRN inference problem as a link prediction problem. The overall model architecture is shown in Fig. 1D and consists of three parts: (1) heterogeneous graph construction, (2) a heterogeneous graph-based interaction encoder, and (3) a transformer-based link prediction module.
HGATLink workflow. A Adaptive weight-adjusted learning module. B Internal update passing within subgraphs. For example, a subgraph with edge type 1 learns updates and fine-tunes the max-pooling aggregation using adaptive weights. C Embedding matrix learning. The optimal gene embedding representation is obtained and stored by fitting the embedding matrices of genes and cells to the preprocessed gene expression matrix. D HGATLink overall architecture. In the heterogeneous graph construction module, the heterogeneous graph is constructed on the basis of the discretised gene expression matrix obtained by transforming the gene expression matrix. In the interactive encoder module of the heterogeneous graph, updates are passed internally through a graph convolutional network (GCN). Subsequently, global aggregation is performed over the whole heterogeneous graph network (denoted by AGG in Fig. 1D), the embedding matrices of genes and cells are learned through fully connected (FC) layers, and the best gene embeddings are stored. Finally, gene pairs are represented as embedding features, which serve as inputs to the transformer layer to predict regulatory relationships via nonlinear transformations
Heterogeneous graph construction process
Since directly taking gene expression data as input introduces bias, we follow GMFGRN's method of constructing a heterogeneous graph network on the basis of the gene expression matrix (Fig. 1D, Heterogeneous Graph Construction). On the basis of the given g × c gene expression matrix X, we process the nonzero elements to obtain the new gene expression matrix \(X^{\prime}\), which is expressed as follows:
Referring to 3DCEMA [20] and considering extreme values, we let \(X_{i}^{\prime} = (x_{i1}^{\prime}, x_{i2}^{\prime}, \cdots, x_{ic}^{\prime})\) denote the expression values of gene \(i\) across all cells; the expression range of gene \(i\), defined as Range, is obtained from the mean and standard deviation of gene \(i\) over all cells and is denoted as follows:
We decompose Range into k equally spaced segments, where k is a custom hyperparameter, resulting in a discretised representation of all genes, denoted as follows:
where n denotes the number of subgraphs in the heterogeneous graph, i.e., the number of edge type values.
Next, we rewrite the gene expression values in \(X^{\prime}\) to obtain \(X^{\prime\prime}\); the rewritten discretised gene expression matrix values are again shown in terms of gene \(i\), expressed as follows:
where k denotes the corresponding discrete value and n denotes the edge type value in the heterogeneous graph; through the above process, \(X^{\prime}\) is converted to \(X^{\prime\prime}\). We then construct the heterogeneous graph network, where each distinct value in the discretised gene expression matrix defines a subgraph, and the different subgraphs jointly constitute the overall heterogeneous graph network.
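Because the equations above are not reproduced here, the following sketch illustrates one plausible reading of this construction pipeline. The log transform of nonzero entries and the mean ± 2·std definition of Range are our assumptions, not the paper's definitions; only the k equally spaced segments and the discrete edge-type values follow the text.

```python
# Hedged sketch of the heterogeneous graph construction described above.
import numpy as np

def discretise_expression(X, k=15):
    """X: (g, c) gene expression matrix. Returns an integer matrix whose
    nonzero values index the k edge types (subgraphs) of the heterogeneous graph."""
    Xp = np.where(X > 0, np.log1p(X), 0.0)             # assumed nonzero-element processing
    mu = Xp.mean(axis=1, keepdims=True)
    sd = Xp.std(axis=1, keepdims=True)
    lo, hi = mu - 2 * sd, mu + 2 * sd                  # assumed Range handling extreme values
    edges = np.linspace(0, 1, k + 1)[1:-1]             # k equally spaced segments
    scaled = (Xp - lo) / np.maximum(hi - lo, 1e-8)     # position of each value within Range
    Xpp = np.digitize(np.clip(scaled, 0, 1), edges) + 1  # discrete edge-type values 1..k
    Xpp[X == 0] = 0                                    # zero entries yield no edge (assumption)
    return Xpp
```

Each distinct nonzero value of \(X^{\prime\prime}\) then defines one subgraph: gene i and cell j are connected by an edge of type \(X^{\prime\prime}_{ij}\).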
Heterogeneous graph-based interaction encoder
The gene-cell heterogeneous graph involves two main operations: message passing and message aggregation. We set a corresponding interaction encoder for each subgraph. Taking edge type 1 as an example (Fig. 1B), the message passed from cell j to gene \(i\) is represented as follows:
where \(W_{in}\) is the weight parameter matrix learned for each subgraph and \(cell_{xj}\) denotes the degree vector of cell j. This process shows that the features of the source node, after a nonlinear transformation, are delivered as messages, and all target nodes with the current cell j as the source node receive the current message.
However, directly relaying messages can significantly reduce the efficacy of GNNs, since the message-passing mechanism contaminates node representations and propagates noise [37]. Given the correlation between edge noise and node degree [37], as well as the degree imbalance in heterogeneous graph networks, we present an adaptive weight adjustment learning method (Fig. 1A). It measures graph structural properties instead of using the conventional feature-based attention learning strategy; the node features in our heterogeneous graph network contain degree information. We first build the query and key matrices from the node features, as follows:
where \(W_{k}\) is a c × d edge-type weight matrix, c is the number of cells, and d is our custom embedding dimension. Notably, the above is an example for a gene-cell graph; the actual matrix is set flexibly according to the current subgraph type, and in general its first dimension equals the feature length of the target node. \({x}_{i}\) denotes the target node feature of the current graph, \({n}_{heads}\) denotes the number of heads, and \({d}_{k}\) is the dimension of each head; since our embedding dimension is set to 256, the number of heads is set to 4 and the head dimension to 64.
Next, we normalize the query and key matrices and compute the attention coefficients, which are expressed as follows:
Unlike cosine similarity, self-attention does not capture the directional similarity of two vectors in multidimensional space; it measures similarity via the raw dot product, which tends to yield inflated values in the correlation matrix. Therefore, following a previous study [38], we set a temperature parameter t = 0.25 to adjust the correlation scores to a more reasonable range. The specific representation is as follows:
This adaptive weight adjustment method lowers losses during aggregation and enables more focused propagation. Subsequently, we adopt max-pooling aggregation, i.e., the message with the largest feature value among all neighboring nodes of the target node is used as the neighbor message (Fig. 1B). The aggregation update process of the overall heterogeneous graph is represented as follows:
where n denotes the number of subgraphs in the heterogeneous graph, i.e., the number of edge types. The above shows the updating process of gene \(i\); the same updating and aggregation are performed for cells. After updating, we obtain gene features of size g × n × d and cell features of size c × n × d, where \({W}_{out}\) is a dimensionality-transforming weight matrix. The n × d above is reduced to d through feature extraction, yielding final gene embeddings of size g × d and final cell embeddings of size c × d. This approach uses the self-attention mechanism to strengthen the target node's own feature representation, making the target node attend to its neighbors while simultaneously enhancing attention to itself.
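To make the mechanism concrete, the sketch below combines the degree-based query/key attention, the temperature rescaling with t = 0.25, and max-pooling aggregation for a single subgraph. The tensor layout, the L2 normalization step, and the direction of the temperature scaling are our assumptions, not the authors' implementation.

```python
# Hedged sketch of the adaptive weight adjustment attention with max pooling.
import torch
import torch.nn.functional as F

def adaptive_attention_maxpool(x, neigh, W_q, W_k, n_heads=4, t=0.25):
    """x: (N, d) target-node features carrying degree information.
    neigh: (N, M, d) messages from up to M neighbors per target node.
    W_q, W_k: (d, d) learned projections (assumed square, one pair per subgraph).
    Returns (N, d) aggregated target-node features."""
    N, M, d = neigh.shape
    d_k = d // n_heads                                     # per-head dimension (64 when d=256, heads=4)
    q = (x @ W_q).view(N, n_heads, 1, d_k)                 # queries from target-node (degree) features
    k = (neigh @ W_k).view(N, M, n_heads, d_k).transpose(1, 2)  # keys from neighbor messages
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)  # normalize query and key matrices
    scores = (q * k).sum(-1) * t                           # dot products rescaled by t to damp inflated values
    attn = torch.softmax(scores, dim=-1).unsqueeze(-1)     # attention coefficients over neighbors
    v = neigh.view(N, M, n_heads, d_k).transpose(1, 2)     # neighbor messages split into heads
    out = (attn * v).max(dim=2).values                     # max pooling instead of sum pooling
    return out.reshape(N, d)
```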
After learning the embedding representations, we normalize the nonzero expression data to obtain target scores and minimize the loss by fitting the inner product of the final gene embedding features and cell embedding features to these normalized expression values, which is expressed as follows:
where Z denotes the number of nonzero expression values. Through the above update aggregation and optimization process, we save the embedded features of the final gene (Fig. 1C).
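A minimal sketch of this fitting objective, assuming a mean/std normalization of the nonzero expression values (the paper's exact normalization is not shown), could look as follows:

```python
# Hedged sketch of the embedding fitting loss described above.
import torch

def embedding_fit_loss(gene_emb, cell_emb, expr):
    """gene_emb: (g, d) final gene embeddings; cell_emb: (c, d) final cell
    embeddings; expr: (g, c) gene expression matrix."""
    mask = expr != 0                                        # only nonzero expression values are fitted
    target = (expr - expr[mask].mean()) / expr[mask].std()  # assumed normalization of expression scores
    pred = gene_emb @ cell_emb.T                            # inner product for every gene-cell pair
    Z = mask.sum()                                          # Z: number of nonzero expression values
    return ((pred[mask] - target[mask]) ** 2).sum() / Z     # squared fitting error averaged over nonzero entries
```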
Link prediction based on the transformer
The main function of the link prediction module is to learn the regulatory relationship between gene pairs from the given input features, labeling pairs with regulation as 1 and pairs without regulation as 0. Considering the advantages of the transformer architecture's multi-head self-attention mechanism and its parallel computing capability for learning complex relationships and obtaining more accurate predictions [39], we choose an improved transformer as the main architecture of our prediction module.
Transformer
Transformer [28] is a streamlined network architecture that relies solely on attention mechanisms, using neither recurrence nor convolution. It is composed of two parts: an encoder and a decoder. The encoder is a stack of N identical modules, each containing a self-attention sublayer with residual connections and a position-wise feedforward fully connected sublayer with residual connections. The decoder additionally includes an attention layer that applies multi-head attention to the encoder's output. The multi-head self-attention is represented as follows:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$$

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_{1},\ldots,\mathrm{head}_{h})W^{O},\quad \mathrm{head}_{i}=\mathrm{Attention}(QW_{i}^{Q},KW_{i}^{K},VW_{i}^{V})$$
where Q, K and V are linearly transformed and subjected to scaled dot-product attention in multiple heads; the head outputs are concatenated and linearly transformed to obtain the final attention output, and global learning is achieved by capturing different feature representations in parallel across the attention heads. In addition, both the encoder and the decoder receive an additional positional encoding input identifying position information.
Link prediction
On the basis of the learned gene embedding features, we extract the feature vector of each gene pair to construct a joint feature representation. Our input format is b × 2 × d, where b denotes the batch size, 2 denotes the two embedding tensors corresponding to the TF and target of each gene pair, and d denotes the embedding dimensionality, which is set to 256 in this paper (Fig. 1D, Transformer Layer Link Prediction). Since the transformer is a convolution- and recurrence-free model that relies purely on the self-attention mechanism, positional encoding is normally needed to make full use of sequential order. However, the gene embedding features we learn already take positional information into account during propagation and aggregation over the heterogeneous graph. First, the heterogeneous graph we construct contains two types of nodes, and each type occupies a different position in the graph during propagation and aggregation. Second, we construct a heterogeneous graph with n different edge types that are aggregated along different propagation paths. Third, we introduce self-attention correlation, whereby the position of a node in the graph dynamically adjusts the information aggregation. Therefore, we use the simplified transformer with positional encoding removed as the backbone, avoiding blindly adding redundant variables, reducing errors, and ensuring that the most valuable features are learned for the most accurate predictions, which is represented as follows:
The goal of HGATLink is to optimize the difference between the predicted and labeled values. We use a combined optimization objective that applies a sigmoid function followed by a binary cross-entropy loss, and we use the Adam optimizer to update the model parameters until HGATLink converges, as shown below:

$$\mathcal{L}=-\frac{1}{n}\sum_{i=1}^{n}\left[y_{i}\log \hat{y}_{i}+\left(1-y_{i}\right)\log \left(1-\hat{y}_{i}\right)\right]$$

where n is the number of samples, \(y_{i}\) is the label, and \(\hat{y}_{i}\) is the sigmoid output of the model.
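For orientation, the sketch below shows one way such a link prediction head could look in PyTorch: a transformer applied to the b × 2 × d gene-pair input without positional encoding, followed by nonlinear transformations and a sigmoid output. For brevity it uses only an encoder stack, whereas the full HGATLink also retains a decoder (see the ablation experiments below); layer sizes and the token-concatenation choice are assumptions.

```python
# Hedged sketch of a simplified-transformer link prediction head (no positional encoding).
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    def __init__(self, d=256, n_heads=4, n_layers=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                  nn.Linear(d, 1))     # series of nonlinear transformations

    def forward(self, pairs):
        """pairs: (b, 2, d) stacked TF and target embeddings; no positional
        encoding is added, since position is already learned on the graph."""
        h = self.encoder(pairs)                        # (b, 2, d) contextualized pair tokens
        h = h.reshape(h.size(0), -1)                   # concatenate the TF and target tokens
        return torch.sigmoid(self.head(h)).squeeze(-1)  # predicted probability of regulation
```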
Experiments and results
Evaluation metric settings
To evaluate the performance of HGATLink, we adopt two evaluation metrics, AUROC and AUPRC. The AUPRC measures model performance by plotting the precision-recall curve and calculating the area under it, with values closer to 1 indicating better predictive performance. The AUROC gives a good assessment of overall model performance but can mask performance on the minority class in imbalanced datasets; combining it with the AUPRC shows the model's ability to balance precision and recall and thus reveals whether the model correctly classifies positive samples [16]. In addition, based on the constructed training datasets, we adopt a three-fold cross-validation approach, in which the test set accounts for 33.3% of the data and 20% of the remaining 66.7% training portion is further partitioned off as a validation set. It is worth mentioning that we evaluate all the test folds as a whole rather than averaging the three results. Instead of setting a fixed test set, this cross-validation approach ensures that all data are used as test data to evaluate the overall performance of the model. In addition, our customized numerical ranges ensure the independence of each fold and effectively avoid cross-contamination of small-sample data.
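The following sketch illustrates this pooled evaluation protocol, where every sample is tested exactly once and the three test folds are scored as a single set. The plain KFold split is illustrative only; the paper uses a custom interval-based fold division.

```python
# Hedged sketch of pooled three-fold cross-validation with AUROC/AUPRC scoring.
import numpy as np
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

def pooled_three_fold(X, y, fit_predict):
    """fit_predict(X_tr, y_tr, X_val, y_val, X_te) -> scores for X_te (hypothetical callback)."""
    pooled_scores, pooled_labels = [], []
    for tr_idx, te_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        # 20% of each training fold is held out for validation
        X_tr, X_val, y_tr, y_val = train_test_split(
            X[tr_idx], y[tr_idx], test_size=0.2, random_state=0)
        pooled_scores.append(fit_predict(X_tr, y_tr, X_val, y_val, X[te_idx]))
        pooled_labels.append(y[te_idx])
    s, l = np.concatenate(pooled_scores), np.concatenate(pooled_labels)
    # AUROC and AUPRC computed once over the union of all test folds, not averaged
    return roc_auc_score(l, s), average_precision_score(l, s)
```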
Training setting
To optimize HGATLink, our goal is to reduce the loss gradually and stably while avoiding overfitting. We adopt the Adam optimizer and the cross-entropy loss function, and after extensive experiments, we set the learning rate to 1e-3 and the weight decay to 1e-6. (For the large mESC dataset, we reduce the learning rate to 4e-4 to maintain stable convergence.) In addition, the batch size is set to 512, and training stops after 200 epochs or when the parameters have not been updated for 10 consecutive epochs. HGATLink is trained and optimized on a GPU (Tesla V100S-PCIE-32GB). For the implementations of the other methods, we follow their respective architectures and hyperparameter values.
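A minimal sketch of this configuration is shown below; only the hyperparameter values (learning rate 1e-3, weight decay 1e-6, batch size 512, 200 epochs, patience 10) come from the text, while the setup function itself is illustrative.

```python
# Hedged sketch of the training configuration described above.
import torch

def make_training_setup(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
    criterion = torch.nn.BCELoss()   # binary cross-entropy on sigmoid outputs
    config = {"batch_size": 512, "max_epochs": 200, "patience": 10}
    return optimizer, criterion, config
```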
Experimental results
To assess the ability of HGATLink to predict GRNs, we chose to compare it with 10 state-of-the-art GRN inference methods, the details of which are listed below:
- SCODE [12] learns gene regulatory networks and their dynamics from single-cell sequencing data of differentiating cells via ODEs.
- GENIE3 [14] uses random forests to assess the importance of a gene relative to other genes by constructing multiple decision trees, identifying the regulators of the current gene and decomposing the network inference problem into separate feature selection problems.
- CNNC [19] proposes constructing NEPDFs from gene pairs as inputs to a convolutional structure to take full advantage of spatial information.
- DGRNS [21] exploits the pseudo-temporal nature of gene expression data and accounts for the lag of TF regulation by capturing the local correlation between gene pairs through a sliding window, used as input to a gated recurrent unit (GRU) network.
- GNE [24] combines topological properties and the gene representation tensor to increase the accuracy of the inference algorithm.
- GNNLink [25] uses a first-order graph neural network to aggregate neighborhood information and infers correlations between genes via a dot product.
- GMFGRN [26] focuses on heterogeneity information, constructs a heterogeneous graph to learn gene and cell embeddings efficiently, and demonstrates that heterogeneous graph-based embedding representations reduce the false-positive rate and improve inference accuracy.
- IGEGRNS [27] learns node embedding representations through GraphSAGE and obtains the final prediction for each relationship after three convolutional layers of feature learning to achieve GRN inference.
- GENELink [31] proposes learning robust and generalized gene representations by combining regulatory information under the supervision of graph attention networks with the learned embedding space representation.
- GATCL [32] implements GRN inference by using convolutional layers instead of weight matrices for graph attention network learning.
We compare HGATLink with the above 10 GRN inference methods on the 14 preprocessed scRNA-seq datasets under the real networks; all experiments use the same fold division, and the experimental results are shown in Fig. 2. In addition, we plotted the PR curves of the top five performing models on six datasets with the top 500 highly variable genes (Supplementary Material Fig. 1).
In summary, HGATLink achieves the optimal AUROC (14/14), with a maximum improvement of 4% over the second-best method, and the optimal AUPRC on 13 of 14 datasets. This demonstrates the strong link prediction capability of HGATLink, which shows stability and robustness across datasets of different sizes and types.
Ablation experiments
To validate the effectiveness of the key modules of HGATLink, we conducted extensive ablation experiments.
Validating the importance of the transformer decoder
Currently, some models use a standalone transformer encoder to accomplish the prediction task. To validate the reasonableness of our proposed structure, we predicted the regulatory relationships between genes using the transformer encoder layer alone with the same parameter settings and evaluated the AUROC and AUPRC metrics on the seven preprocessed top 500 gene datasets. Overall, both metrics remain consistent on the mESC dataset but show a decreasing trend on the remaining 6 datasets, with a maximum decrease of 5% in AUROC and 6% in AUPRC (Supplementary Material Figs. 3 and 4). In addition, in Table 2, the statistical significance of HGATLink versus the encoder-only approach was assessed by the Wilcoxon test.
Validating whether adding positional encoding improves model accuracy
By eliminating the positional encoding, we simplify the transformer structure and confirm that the heterogeneous graph-based attention network has already learned the positional information; the model's parameters are also reduced and its structure simplified. Overall, the AUROC and AUPRC metrics decreased on all 7 datasets after positional encoding was added, with a maximum decrease of 17% in AUROC and 34% in AUPRC (Supplementary Material Figs. 5 and 6). Table 2 displays the experimental findings for both the variant without the transformer decoder and the variant with positional encoding added. We consider both the AUROC and AUPRC metrics, and the Wilcoxon test is used to compute p values to assess the statistical significance of HGATLink against these alternative variants on both measures. For the Wilcoxon test, we used the same dataset for the before-and-after comparisons to ensure a paired test, and we adopted a two-sided test for a broader hypothesis assessment.
Analysing the effect of different matrix decomposition category counts on the AUROC and AUPRC metrics
Different values of k affect the layout of the heterogeneous graph network, change the positional information, and lead to different learned features, thus affecting model performance. We set the matrix decomposition category parameter k to 10 and 15, respectively, and carried out comparison experiments (Fig. 3). When k = 15, both the AUROC and AUPRC metrics improve, with a maximum improvement of 3%; therefore, we finally set k = 15 for the matrix decomposition operation.
Validating the impact of different embedding dimensions
The size of the gene feature dimension affects overall model performance. To validate the dimension parameter we chose, we trained the model with feature dimensions of 512 and 256 and evaluated the AUROC and AUPRC metrics on the 7 preprocessed top 1000 gene datasets, as shown in Fig. 4. By comparison, an embedding dimension of 512 shows no improvement in either AUROC or AUPRC, and performance decreases on individual datasets. In addition, to clearly show the stability of the model, we added error bars to the statistical graphs, which show that an embedding dimension of 512 exhibits slight fluctuations compared with 256 and does not maintain good stability. Moreover, higher dimensions incur expensive computation and memory consumption when training the optimal gene embedding representations. In conclusion, we selected an embedding dimension of 256 as an appropriate parameter setting.
Evaluating the impact of different cross-validation approaches on model performance
We assess HGATLink using three-fold cross-validation, and Fig. 5 indicates that five-fold cross-validation increases the standard deviation by up to 1.2% while not significantly improving the AUROC and AUPRC metrics. In addition, an analysis of variance (ANOVA) gave a p value greater than 0.95, verifying that the performance of five-fold cross-validation is comparable to that of three-fold cross-validation. Considering the model's stability across datasets and the computational cost, we ultimately opted for the three-fold cross-validation approach in this paper.
Evaluating the impact of the way the dataset is divided on model stability
Considering the effect of different dataset division schemes on the model under three-fold cross-validation, Fig. 6 shows a violin plot of the results for each fold, which clearly displays the distribution of the data and the degree to which the division method influences the model. We also performed ANOVA for a more thorough comparison; * denotes the level of significant difference: *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. The experimental findings demonstrate that HGATLink exhibits high stability across a range of dataset sizes and categories. To evaluate the stability of our framework more thoroughly, we also carried out hyperparameter sensitivity analyses involving the batch size, learning rate, and number of epochs, choosing the mDC and mHSC-L datasets with the top 500 highly variable genes as examples. HGATLink shows only minor changes when the hyperparameters are altered, demonstrating the stability and robustness of our framework (Supplementary Material Fig. 2).
Validating the model's ability to learn latent features from limited positive examples
To verify that our proposed model can learn latent features from limited positive examples, in Fig. 7, we reduced the positive examples by 10-40% and report medians with 95% confidence intervals against the second-best performing method on the hHEP and mHSC-L datasets with the top 500 highly variable genes. The comparison shows that HGATLink maintains relatively high inference accuracy as positive examples decrease and better copes with the challenge of data imbalance.
In conclusion, even when there are fewer positive samples, the model we propose ensures high accuracy and superior prediction capacity. Furthermore, the method that we propose shows stability in cross-validation and performs well across a range of dataset sizes and kinds.
Discussion
This study pays due attention to the challenges posed by cellular heterogeneity to the inference of gene regulatory networks. It is highlighted that gene expression varies by cell type, and the process of building a gene regulatory network should take this fact into consideration.
In this work, we propose HGATLink, a deep learning inference framework that maps gene expression data into a discretised gene expression matrix to build a heterogeneous graph attention network. A heterogeneous graph interactive encoder carries out the graph convolution process, and the gene-cell heterogeneous graph network performs global aggregation to continually learn embedding representations. Lastly, the simplified transformer learns the regulatory relationships from the joint gene-pair features.
The experimental results clearly show that HGATLink can learn from complicated network structures and explore the long-range neighborhood associations of nodes. Furthermore, compared with the conventional feature-based attention learning method, our proposed adaptive weight adjustment learning method is more responsive to the topological positions of nodes in the network. As positive samples are progressively reduced, our framework maintains a clear advantage in inference stability.
Currently, the high dimensionality and low sample size (HDLSS) of many biological datasets also pose a challenge for gene regulatory network inference [40]. Although our proposed framework has the advantage of analyzing small-sample data and can explore relationships between long-range nodes, in the future we will further investigate how to optimize the computational efficiency of the embedded features.
In conclusion, our proposed framework provides new ideas for gene regulatory network inference studies, which are more focused on the challenges of practical applications and demonstrate the ability to accurately predict the regulatory relationships between TFs and target genes.
Conclusions
In the field of gene regulatory network research, this study presents a deep learning framework, HGATLink, which combines a heterogeneous graph attention network with a simplified transformer architecture. We verify that the proposed framework performs exceptionally well, accounting for cellular heterogeneity and alleviating transitive interactions through the adaptive weight adjustment technique. Despite these successes, we are fully aware that inferring gene regulatory networks from HDLSS data remains a significant challenge in the study of many biological systems and diseases. In the future, we will further explore deep learning frameworks to offer fresh perspectives on identifying targeted drugs for specific diseases.
Availability of data and materials
The code and data are available at https://github.com/aniani6aa/HGATLink. This study utilized the single-cell RNA datasets provided by BEELINE (https://doi.org/10.1038/s41592-019-0690-6), which include human embryonic stem cells (hESC), human mature hepatocytes (hHEP), mouse dendritic cells (mDC), mouse embryonic stem cells (mESC), and mouse hematopoietic stem cells of the erythroid (mHSC-E), granulocyte-monocyte (mHSC-GM), and lymphoid (mHSC-L) lineages.
References
Badia-i-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet. 2023;24(11):739–54.
Liu Z. Advances in gene regulatory network inference research. J Oper Res. 2021;25(3):173–82 (in Chinese).
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform. 2021;22(5):bbab009.
Wagner A, Regev A, Yosef N. Revealing the vectors of cellular identity with single-cell genomics. Nat Biotechnol. 2016;34(11):1145–60.
Coscia M. Pearson correlations on complex networks. J Complex Netw. 2021;9(6):cnab036.
Cao Z, Guan L, Yu R, Chen J. Identifying autophagy-related lncRNAs and potential ceRNA networks in NAFLD. Front Genet. 2022;13:931928.
Zhou P, Zheng G, Li Y, Wu D, Chen Y. Construction of a circRNA-miRNA-mRNA network related to macrophage infiltration in hepatocellular carcinoma. Front Genet. 2020;11:1026.
Wang J, Chen Y, Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet. 2023;19(9):e1010942.
Qiu X, Rahimzamani A, Wang L, Ren B, Mao Q, Durham T, Kannan S. Inferring causal gene regulatory networks from coupled single-cell expression dynamics using scribe. Cell Syst. 2020;10(3):265–74.
Zeng Y, He Y, Zheng R, Li M. Inferring single-cell gene regulatory network by non-redundant mutual information. Brief Bioinform. 2023;24(5):bbad326.
Trinh HC, Kwon YK. A novel constrained genetic algorithm-based Boolean network inference method from steady-state gene expression data. Bioinformatics. 2021;37:i383–91.
Matsumoto H, Kiryu H, Furusawa C, Ko MSH, Ko SBH, Gouda N, Hayashi T, Nikaido I. SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics. 2017;33:2314–21.
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics. 2020;36(19):4885–93.
Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE. 2010;5(9):e12776.
Huynh-Thu V, Geurts P. dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data. Sci Rep. 2018;8:3384.
Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. Iscience. 2024;27(4):109352.
Ajmal HB, Madden MG. Dynamic Bayesian network learning to infer sparse models from time series gene expression data. IEEE/ACM Trans Comput Biol Bioinf. 2021;19(5):2794–805.
Quesada D, Bielza C, Fontán P, Larrañaga P. Piecewise forecasting of nonlinear time series with model tree dynamic Bayesian networks. Int J Intell Syst. 2022;37(11):9108–37.
Yuan Y, Bar-Joseph Z. Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci. 2019;116(52):27151–8.
Fan Y, Ma X. Gene regulatory network inference using 3D convolutional neural network. Proc AAAI Conf Artif Intell. 2021;35(1):99–106.
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform. 2022;23(2):bbab568.
Gan Y, Hu X, Zou G, Yan C, Xu G. Inferring gene regulatory networks from single-cell transcriptomic data using bidirectional RNN. Front Oncol. 2022;12:899825.
Yuan Y, Bar-Joseph Z. Deep learning of gene relationships from single cell time-course expression data. Brief Bioinform. 2021;22(5):bbab142.
Kc K, Li R, Cui F, Yu Q, Haake AR. GNE: a deep learning framework for gene network inference by aggregating biological information. BMC Syst Biol. 2019;13:1–14.
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform. 2023;24(6):bbad414.
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform. 2024;25(2):bbad529.
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics. 2024;40:btae291.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259; 2014.
Wang C, Chen Y, Zhang S, Zhang Q. Stock market index prediction using deep transformer model. Expert Syst Appl. 2022;208:118128.
Chen G, Liu ZP. Graph attention network for link prediction of gene regulations from single-cell RNA-sequencing data. Bioinformatics. 2022;38(19):4522–9.
Liu J, Zhou S, Ma J, Zang M, Liu C, Liu T, Wang Q. Graph attention network with convolutional layer for predicting gene regulations from single-cell ribonucleic acid sequence data. Eng Appl Artif Intell. 2024;136:108938.
Xu J, Zhang A, Liu F, Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics. 2023;39(4):btad165.
Chen Z, Wu G, Gao H, Ding Y, Hong D, Zhang B. Local aggregation and global attention network for hyperspectral image classification with spectral-induced aligned superpixel segmentation. Expert Syst Appl. 2023;232:120828.
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020;17(2):147–54.
ENCODE Project Consortium; Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Weng Z, Adrian J, Kawli T, Davis CA, Dobin A, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583:699–710.
Wu J, Hooi B. DECOR: degree-corrected social graph refinement for fake news detection. In: Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining; 2023. pp. 2582–93.
Wang F, Liu H. Understanding the behaviour of contrastive loss. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition;2021. pp. 2495–2504.
Yao D, Li B, Zhan X, Zhan X, Yu L. GCNFORMER: graph convolutional network and transformer for predicting lncRNA-disease associations. BMC Bioinform. 2024;25(1):5.
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980; 2014.
Acknowledgements
The authors thank the anonymous reviewers for their valuable suggestions.
Funding
This work was supported by the following grants: Inner Mongolia Autonomous Region Science and Technology Major Project-Research and Development of Cloud Computing Application Technology (2019ZD016) and Inner Mongolia Autonomous Region Science and Technology Department-Research and Development and Application of Key Technology for Intelligent Control of Potato Industry Chain (2021ZD0005).
Author information
Contributions
Y.S: Conceptualization, Methodology, Data Preprocessing, Writing-Original Draft, Editing. J.G: Review. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not required.
Consent for publication
Not applicable.
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Sun, Y., Gao, J. HGATLink: single-cell gene regulatory network inference via the fusion of heterogeneous graph attention networks and transformer. BMC Bioinformatics 26, 49 (2025). https://doi.org/10.1186/s12859-025-06071-x
DOI: https://doi.org/10.1186/s12859-025-06071-x