Adaptive deep feature representation learning for cross-subject EEG decoding
BMC Bioinformatics volume 25, Article number: 393 (2024)
Abstract
Background:
Collecting substantial amounts of electroencephalogram (EEG) data is typically time-consuming and labor-intensive, which hinders the development of decoding models with strong generalizability, particularly when the available data are limited. Utilizing sufficient EEG data from other subjects to aid in modeling the target subject presents a potential solution, commonly referred to as domain adaptation. Most current domain adaptation techniques for EEG decoding focus primarily on learning shared feature representations through domain alignment strategies. However, since the domain shift cannot be completely removed, target EEG samples located near the edges of clusters remain susceptible to misclassification.
Methods:
We propose a novel adaptive deep feature representation (ADFR) framework to improve cross-subject EEG classification performance by learning transferable EEG feature representations. Specifically, we first minimize the distribution discrepancy between the source and target domains by employing maximum mean discrepancy (MMD) regularization, which aids in learning the shared feature representations. We then utilize instance-based discriminative feature learning (IDFL) regularization to make the learned feature representations more discriminative. Finally, entropy minimization (EM) regularization is integrated to adjust the classifier so that it passes through the low-density region between clusters. The synergistic learning among these regularizations during training enhances EEG decoding performance across subjects.
Results:
The effectiveness of the ADFR framework was evaluated on two public motor imagery (MI)-based EEG datasets: BCI Competition III dataset 4a and BCI Competition IV dataset 2a. In terms of average accuracy, ADFR achieved improvements of 3.0% and 2.1%, respectively, over the state-of-the-art methods on these datasets.
Conclusions:
The promising results highlight the effectiveness of the ADFR algorithm for EEG decoding and show its potential for practical applications.
Introduction
Brain-computer interfaces (BCIs) decode neural activity and translate it into control commands. BCIs can establish a direct communication path between the human brain and external devices [1] without relying on conventional neuromuscular pathways. Electroencephalogram (EEG) is one of the most widely used techniques in BCIs due to its non-invasiveness, high temporal resolution, and the portability of acquisition equipment, facilitating the measurement of neuroelectrical activity on the scalp. Motor imagery (MI) is an important paradigm in BCIs, with considerable potential for the rehabilitation of upper and lower limb movements [2]. Individuals with disabilities (e.g., stroke or locked-in syndrome) can modulate sensorimotor rhythms by imagining movements, and recognizing the resulting EEG signals can facilitate neural plasticity and functional recovery [3].
Advanced machine learning techniques have been applied to various challenging problems in biomedical engineering [4]. Current MI-based BCIs mainly utilize data-driven machine learning approaches to decode EEG signals. Despite the considerable success of these methods, traditional machine learning approaches necessitate sufficient labeled EEG data, making the development of subject-specific classifiers time-consuming and labor-intensive [5]. More specifically, a 20–30 min calibration is usually required before recording EEG data, which is inconvenient and fatiguing for users [6]. This presents a significant challenge to the usability and scalability of MI-based BCIs. Reducing or eliminating the calibration time is of great importance [7], particularly for disabled subjects with limited motor functions. However, insufficient EEG data might weaken the generalization capability of the decoding model. Because EEG signals corresponding to the same MI task have similar distributions, other subjects’ EEG data can be leveraged to facilitate the construction of an EEG decoding model for the target subject. However, inter-subject variability [8] often results in degraded classification performance when applying an existing EEG decoding model to a new subject. To address this issue, transfer learning [9, 10] is a feasible approach that exploits the shared knowledge between the source and target subjects to facilitate the construction of the target EEG decoding model.
To date, two primary categories of transfer learning have been systematically investigated to realize cross-subject transfer in MI-based BCIs. The first category is inductive transfer learning [11], which requires a subset of labeled target EEG data to construct the target predictive model. For example, Chen et al. [12] proposed a transfer support matrix machine for the classification of MI EEG data, which requires some labeled target EEG data. Besides, Liang et al. [13] developed an adaptive multimodel knowledge transfer matrix machine for EEG classification, which adaptively selects knowledge from multiple correlated source models through a leave-one-out cross-validation strategy using the available target training EEG data. Although effective, the above methods still require a certain quantity of labeled EEG data from the target subjects to learn the classifier, which limits the practicality of EEG decoding methods in certain scenarios. Recent studies have demonstrated the efficacy of the second category of transfer learning in MI-based BCIs, i.e., transductive transfer learning (domain adaptation, DA) [11]. In this setting, no labeled target data are required, which greatly improves the practicality of the EEG decoding methods.
In this paper, we present a novel domain adaptation framework that enables the adaptive learning of transferable EEG feature representations. The motivation of the proposed ADFR is illustrated in Fig. 1. Consider a labeled source subject \({D_s}\) and an unlabeled target subject \({D_t}\), as illustrated in Fig. 1a. Due to the substantial distribution difference between the source and target domains, the classifier f trained on source EEG data cannot completely discriminate target EEG data. Following previous studies, we first use maximum mean discrepancy (MMD) regularization to reduce the distribution discrepancy between the source and target domains. Although this domain alignment can improve the transferability of feature representations, the target EEG features located near the edges of the corresponding clusters are still likely to be misclassified, as shown in Fig. 1b. To this end, we introduce an instance-based discriminative feature learning (IDFL) regularization to enhance the discriminability of source EEG features within the shared feature space. Combined with MMD regularization, IDFL can align target EEG features of different categories with those of the source subject, thus adaptively making the features more separable, as illustrated in Fig. 1c. Although effective, cross-subject EEG feature learning usually cannot completely remove the distribution discrepancy between the source and target domains. In view of this, we utilize entropy minimization (EM) regularization to make the classifier pass through the low-density region between clusters, as shown in Fig. 1d. The synergistic learning among these three regularizations during training can enhance EEG decoding performance across subjects. Extensive experiments performed on publicly available MI-based EEG datasets demonstrate the strong performance of our ADFR framework.
The main contributions of this work are as follows.
- We present a novel deep domain adaptation framework for cross-subject EEG decoding, which can jointly adapt both features and classifier to learn deep transferable EEG feature representations.
- The proposed ADFR jointly incorporates domain alignment, deep discriminative feature learning, and low-density separation in a unified framework to enhance the transferability of feature representations.
- We conducted a comprehensive evaluation of the proposed ADFR on two public MI-based EEG datasets. The experimental results verify the superiority of our framework.
The remaining sections are organized as follows. The Related works section reviews prior studies. The Methods section presents a detailed explanation of the proposed ADFR framework and the learning algorithm. We then present extensive experimental evaluations of the proposed method and provide a thorough discussion of the results. Finally, we conclude the paper.
Related works
Several transfer learning methods [11] have been developed to achieve cross-subject transfer in MI-based BCIs. These methods can be broadly classified into three groups based on the type of transferred knowledge: instance [14, 15], feature [16,17,18,19], and classifier [20,21,22,23,24] transfer. The fundamental concept behind instance transfer methods is that certain parts of the source EEG data are correlated with the target data. For instance, Hossain et al. [14] proposed to choose partial EEG data from source subjects using an active learning strategy, which were then combined with limited target EEG data to train the decoding model. For feature transfer methods, the common practice is to leverage the source data to learn a well-suited feature representation for the target domain. Most of these methods are built on the common spatial patterns (CSP) algorithm [25] by modifying either the covariance matrix estimation method [16] or the optimization function [17]. Moreover, deep learning approaches have potential in learning domain-invariant feature representations. In [19], Jeon et al. employed a multiple-pathway deep model to learn feature representations of both the selected source EEG data and the target EEG data, and then encouraged the consistency of these feature representations by minimizing the classification error between the two domains. For the classifier transfer methods, the basic assumption is that the model parameters are shared between the source and target domains. In previous studies [20, 21], sufficient source EEG data were used to train the network, which was then fine-tuned using limited target EEG data. For example, Azab et al. [22] employed the Kullback-Leibler (KL) divergence to measure the similarity of source and target subjects and then determine the weights assigned to source subjects. Additionally, ensemble learning [23] and multi-task learning [24] techniques were also exploited to learn the source model parameters to facilitate the construction of the target model.
Although effective, the above methods still require a certain amount of labeled target EEG data to construct the target classifier. However, MI-based BCIs rely on spontaneous brain activity, which adversely impacts the construction of the target classifier when the target subject performs MI tasks improperly [26]. In practice, mislabeled EEG data may arise during the calibration session [27] when a new subject uses an MI-based BCI from scratch. This makes it challenging to establish a reliable EEG decoding model for the target subject. Recent studies have demonstrated the efficacy of unsupervised domain adaptation methods in MI-based BCIs. These methods can learn domain-invariant features without using the labels of target EEG data [28, 29]. For instance, He et al. [29] proposed to map both the source and target EEG data into the Euclidean space and minimize their distribution divergence. This method can obtain promising classification performance using only unlabeled target EEG data. Moreover, certain domain adaptation techniques, including transfer component analysis (TCA) and subspace alignment (SA), have been employed in EEG-based emotion recognition [30]. Most current domain adaptation methods used for cross-subject EEG recognition are shallow learning methods, which rely heavily on handcrafted features. In recent years, deep domain adaptation [31, 32] has gained increasing popularity for cross-subject EEG classification. For example, Hang et al. [33] proposed a deep domain adaptation network for cross-subject EEG classification. Besides, Song et al. [34] developed a domain adaptation method that uses an attention-based adaptor to facilitate the transfer of source features to the target domain for cross-subject EEG decoding. Xu et al. [35] proposed a contrastive learning-based unsupervised multi-source domain adaptation method for learning subject-independent representations in MI EEG signals.
Existing domain adaptation methods in the context of MI-based BCIs primarily emphasize the learning of shared feature representations through domain alignment strategies. However, because domain shift cannot be completely removed, target EEG samples located near the edges of clusters remain susceptible to misclassification. To address this issue, we propose a novel adaptive deep feature representation framework that adaptively learns transferable EEG feature representations by jointly adapting both features and classifier.
Fig. 2 The diagram of the proposed ADFR framework for cross-subject EEG decoding. Source and target EEG data are first passed through convolution layers to learn deep feature representations. The MMD regularization and IDFL regularization are introduced to learn transferable and discriminative EEG features. The EM regularization is then used to adjust the classifier to pass through the low-density region between clusters.
Methods
We present a comprehensive overview of the proposed ADFR framework for cross-subject MI-based EEG decoding. Figure 2 illustrates the diagram of the proposed ADFR, which integrates domain alignment, deep discriminative feature learning, and low-density separation in a unified framework. The subsequent sections give a detailed explanation of each component.
Deep EEG feature representation
Suppose the source subject \({D_s}\) consists of \({N_s}\) labeled EEG trials \({D_s} = \left\{ {\left( {{{\textbf {X}}}_i^s,{{\textbf {y}}}_i^s} \right) } \right\} _{i = 1}^{{N_s}}\), where \({{\textbf {X}}}_i^s \in {{\mathbb {R}}^{e \times t}}\) represents an EEG trial with e electrodes and t sampling points, and \({{\textbf {y}}}_i^s \in {{\mathbb {R}}^C}\) is the corresponding label. Suppose the target subject \({D_t}\) consists of \({N_t}\) unlabeled EEG trials, i.e., \({D_t} = \left\{ {{{\textbf {X}}}_i^t} \right\} _{i = 1}^{{N_t}}\). Our objective is to learn a deep network \(y = f\left( {{\textbf {X}}} \right)\) that predicts the labels of the target subject's EEG data given the labeled EEG data of the source subject.
Deep learning can automatically learn high-level EEG features and is emerging as the dominant paradigm in MI-based EEG decoding [36,37,38,39,40]. Inspired by the classical filter bank common spatial patterns (FBCSP) algorithm [41], Schirrmeister et al. developed an MI-based EEG decoding network, i.e., Shallow ConvNet [36]. As depicted in Fig. 3, Shallow ConvNet consists of three main blocks. The first block comprises two convolution layers, which are used for capturing temporal information. The second block involves a single convolution layer that performs spatial filtering. Subsequently, a squaring nonlinearity, a mean pooling, and a logarithmic activation operation are designed to emulate the operations in FBCSP. The third block is the classification layer. The loss function \({{{\mathcal {L}}}_S}\) of Shallow ConvNet is formulated as follows:
where \(f\left( {{{\textbf {X}}}_i^s} \right)\) is the prediction for the source EEG data \({{\textbf {X}}}_i^s\) and \(L\left( \cdot \right)\) denotes the cross-entropy loss.
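To make the squaring, mean-pooling, and logarithm steps described above concrete, the following minimal sketch applies them to a one-dimensional signal in plain Python. The function name, window length, stride, and the small floor eps are illustrative assumptions rather than the settings of Shallow ConvNet; the sketch only shows that the output is the logarithm of local mean power, the same kind of log-power feature that FBCSP relies on.

```python
import math

def square_meanpool_log(signal, pool=4, stride=4, eps=1e-6):
    """Square each sample, average over windows, then take the logarithm.

    pool, stride and eps are hypothetical values chosen for illustration.
    """
    squared = [v * v for v in signal]
    pooled = [sum(squared[i:i + pool]) / pool
              for i in range(0, len(squared) - pool + 1, stride)]
    # Clamp before the log so that an all-zero window does not produce -inf.
    return [math.log(max(v, eps)) for v in pooled]

# Toy signal (hypothetical): amplitude 1 in the first window, 2 in the second.
toy_signal = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
features = square_meanpool_log(toy_signal)
print(features)
```

A larger oscillation amplitude in a window yields a larger feature value, regardless of the sign of the samples.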
Domain alignment
Previous studies have demonstrated that features extracted by deep neural networks transition from general representations to task-specific representations as the network depth increases [42]. Consequently, the EEG features learned from the convolutional layers can be effectively shared with the target subject, as they capture more generic information. However, features at higher layers are more subject-specific, resulting in a significant decrease in their transferability as cross-subject variability increases. To this end, we minimize the distribution discrepancy between the deep features extracted from the source and target subjects. Herein, we employ MMD to calculate the squared distance between the means of the feature distributions in the reproducing kernel Hilbert space (RKHS) \({{\mathcal {H}}}\). After aligning the source and target feature distributions, we can effectively adapt to the target subject while retaining the subject-invariant properties encoded in the shared features. Let \({{{\textbf {H}}}^s} = \left\{ {{{\textbf {h}}}_i^s} \right\} _{i = 1}^{{N_s}}\) and \({{{\textbf {H}}}^t} = \left\{ {{{\textbf {h}}}_i^t} \right\} _{i = 1}^{{N_t}}\) represent the deep feature representations of the source and target EEG data, respectively. Then, we minimize the domain discrepancy using the squared MMD as follows:
where \(\varphi \left( \cdot \right)\) represents the nonlinear feature mapping function. \(k\left( { \cdot , \cdot } \right)\) is the kernel function derived from \(\varphi \left( \cdot \right)\), and \(k\left( {{{{\textbf {h}}}_i},{{{\textbf {h}}}_j}} \right) = \varphi {\left( {{{{\textbf {h}}}_i}} \right) ^T} \cdot \varphi \left( {{{{\textbf {h}}}_j}} \right)\).
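As an illustration of this quantity, the sketch below evaluates the usual biased empirical estimate of the squared MMD, \(\frac{1}{N_s^2}\sum _{i,j} k\left( {{\textbf {h}}}_i^s,{{\textbf {h}}}_j^s \right) + \frac{1}{N_t^2}\sum _{i,j} k\left( {{\textbf {h}}}_i^t,{{\textbf {h}}}_j^t \right) - \frac{2}{N_s N_t}\sum _{i,j} k\left( {{\textbf {h}}}_i^s,{{\textbf {h}}}_j^t \right)\), which follows from expanding the squared RKHS distance between the two feature means with the kernel identity above. The choice of this estimator, the RBF kernel with a fixed width, the function names, and the toy features are assumptions made for illustration, not the authors' implementation.

```python
import math

def rbf_kernel(x, y, sigma):
    """RBF kernel exp(-||x - y||^2 / sigma) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma)

def mean_kernel(A, B, sigma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(rbf_kernel(a, b, sigma) for a in A for b in B) / (len(A) * len(B))

def mmd2_biased(Hs, Ht, sigma=1.0):
    """Biased estimate of the squared MMD between source and target features."""
    return (mean_kernel(Hs, Hs, sigma)
            + mean_kernel(Ht, Ht, sigma)
            - 2.0 * mean_kernel(Hs, Ht, sigma))

# Toy features (hypothetical): one target set close to the source, one shifted.
Hs = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
Ht_near = [[0.05, 0.0], [0.0, 0.05], [-0.05, 0.0]]
Ht_far = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.1]]
print(mmd2_biased(Hs, Ht_near), mmd2_biased(Hs, Ht_far))
```

The double sums make the cost quadratic in the number of samples, which is the motivation for a cheaper estimate on large datasets.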
In Eq. (2), calculating the sum of similarities between pairs of all data instances presents a challenging task when dealing with large-scale datasets [42]. To reduce the computational complexity, we reformulate Eq. (2) by employing the linear-time unbiased estimate of MMD [42, 43]:
where \({{{\textbf {e}}}_i} \buildrel \Delta \over = \left( {{{\textbf {h}}}_{2i - 1}^s,{{\textbf {h}}}_{2i}^s,{{\textbf {h}}}_{2i - 1}^t,{{\textbf {h}}}_{2i}^t} \right)\) denotes the quad-tuple. \(\phi \left( {{{{\textbf {e}}}_i}} \right)\) can be calculated using the kernel function k on each quad-tuple \({{{\textbf {e}}}_i}\):
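A minimal sketch of such a linear-time estimate is given below. It assumes the standard form from the cited linear-time estimator, in which each quad-tuple contributes \(k\left( {{\textbf {h}}}_{2i - 1}^s,{{\textbf {h}}}_{2i}^s \right) + k\left( {{\textbf {h}}}_{2i - 1}^t,{{\textbf {h}}}_{2i}^t \right) - k\left( {{\textbf {h}}}_{2i - 1}^s,{{\textbf {h}}}_{2i}^t \right) - k\left( {{\textbf {h}}}_{2i}^s,{{\textbf {h}}}_{2i - 1}^t \right)\) and the contributions are averaged over the quad-tuples. The RBF kernel width, the function names, and the toy features are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma):
    """RBF kernel exp(-||x - y||^2 / sigma) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma)

def mmd2_linear(Hs, Ht, sigma=1.0):
    """Linear-time squared-MMD estimate averaged over consecutive quad-tuples."""
    n_quads = min(len(Hs), len(Ht)) // 2  # a trailing unpaired sample is ignored
    if n_quads == 0:
        raise ValueError("need at least two samples per domain")
    total = 0.0
    for i in range(n_quads):
        s1, s2 = Hs[2 * i], Hs[2 * i + 1]
        t1, t2 = Ht[2 * i], Ht[2 * i + 1]
        total += (rbf_kernel(s1, s2, sigma) + rbf_kernel(t1, t2, sigma)
                  - rbf_kernel(s1, t2, sigma) - rbf_kernel(s2, t1, sigma))
    return total / n_quads

# Toy features (hypothetical): the target set is the source set shifted by 3.
Hs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
Ht = [[3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [3.1, 3.1]]
print(mmd2_linear(Hs, Ht))
```

Each sample is touched once, so the cost grows linearly with the batch size, at the price of a noisier estimate than the quadratic-time version.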
Discriminative feature learning
Improving the intra-class compactness and inter-class separability of target EEG data helps improve the classification performance of target EEG data. However, it proves challenging in the absence of supervision information. An alternative approach is to enhance the discriminative capability of the source EEG feature representations in the shared feature space. Combined with MMD regularization, it subsequently leads to increased discriminability of the target EEG features through feature alignment. Therefore, the target EEG data can exhibit better separability in the absence of label information. Specifically, we employ a discriminative feature learning technique to enhance the intra-class compactness and inter-class separability of the source EEG features [44]. We introduce an instance-based discriminative feature learning regularization, which can be formulated as follows:
where \({{{\textbf {M}}}_{ij}} = 1\) and \({{{\textbf {M}}}_{ij}} = 0\) indicate that \({{\textbf {h}}}_i^s\) and \({{\textbf {h}}}_j^s\) belong to the same class or to different classes, respectively. From Eq. (6), we can see that \({{{\mathcal {L}}}_D}\) enforces the distance between EEG data from the same class to be no more than \({d_1}\) and the distance between EEG data from different classes to be at least \({d_2}\).
Let \({D_{ij}} = {\left\| {{{\textbf {h}}}_i^s - {{\textbf {h}}}_j^s} \right\| _2}\) denote the distance between the features \({{\textbf {h}}}_i^s\) and \({{\textbf {h}}}_j^s\). Then Eq. (5) can be reformulated as:
where the operators \(\circ\) and \({\left\| \cdot \right\| _{sum}}\) denote the element-wise multiplication and the sum of all the elements, respectively. Additionally, the tradeoff parameter \(\beta\) is used to balance the intra-class compactness and inter-class separability within the discriminative feature learning process.
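The pairwise margins described here can be sketched in plain Python as follows. The sketch assumes a per-pair penalty of \(\max {\left( 0, {D_{ij}} - {d_1} \right) ^2}\) for same-class pairs and \(\max {\left( 0, {d_2} - {D_{ij}} \right) ^2}\) for different-class pairs, summed over all ordered pairs, with \(\beta\) weighting the inter-class term. The placement of \(\beta\), the function name, and the toy features are assumptions for illustration rather than the exact formulation of the paper.

```python
import math

def idfl_loss(H, labels, d1=0.0, d2=100.0, beta=1.0):
    """Sum of pairwise margin penalties over all ordered pairs (i, j), i != j."""
    intra, inter = 0.0, 0.0
    for i, (hi, yi) in enumerate(zip(H, labels)):
        for j, (hj, yj) in enumerate(zip(H, labels)):
            if i == j:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(hi, hj)))
            if yi == yj:
                # Same class: penalize distances larger than d1.
                intra += max(0.0, dist - d1) ** 2
            else:
                # Different classes: penalize distances smaller than d2.
                inter += max(0.0, d2 - dist) ** 2
    return intra + beta * inter

# Toy features (hypothetical): two classes, 5 units apart, spread of 1 unit.
H = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
labels = [0, 0, 1, 1]
print(idfl_loss(H, labels, d1=0.5, d2=3.0))
```

In this toy case only the intra-class term is active, because every inter-class distance already exceeds the margin of 3.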
Entropy minimization
Although maximum mean discrepancy regularization and instance-based discriminative feature learning regularization can reduce the distribution discrepancy between the source and target domains, it is generally impractical to entirely eliminate the distribution discrepancy that exists across subjects, as shown in Fig. 1c. Besides, due to the absence of supervision information for the target EEG data, the learned classifier may be biased towards the source domain. However, most current domain adaptation methods in EEG decoding ignore this issue.
To address this issue, it is desirable to enable the classifier to automatically adjust itself to pass through the low-density regions and generate high-confidence predictions [45]. To improve the classification performance of the model on the target EEG data, we introduce an entropy minimization regularization that encourages the classifier to pass through the low-density regions between different clusters. Specifically, the entropy minimization regularization can be expressed as follows:
where \(f\left( {{{\textbf {X}}}_i^t} \right)\) is the prediction for the target EEG data \({{\textbf {X}}}_i^t\). Equation (8) makes the classifier f adjust itself to pass through the low-density regions of the target EEG data, thereby further improving the classification performance on the target subject.
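As a minimal sketch, and assuming that the regularizer is the average Shannon entropy of the predicted class probabilities over the target trials, i.e., \(- \frac{1}{{N_t}}\sum _i \sum _c {p_{ic}}\log {p_{ic}}\) with \({p_{ic}}\) the softmax output for class c, the quantity can be computed as shown below. The function name, the small constant eps, and the toy predictions are illustrative.

```python
import math

def mean_entropy(probs, eps=1e-12):
    """Average Shannon entropy (in nats) over rows of class probabilities."""
    total = 0.0
    for p in probs:
        # eps guards against log(0) for fully confident predictions.
        total -= sum(pc * math.log(pc + eps) for pc in p)
    return total / len(probs)

# Toy predictions (hypothetical) for two target trials and two classes.
confident = [[0.98, 0.02], [0.01, 0.99]]
uncertain = [[0.50, 0.50], [0.55, 0.45]]
print(mean_entropy(confident), mean_entropy(uncertain))
```

Predictions close to a one-hot vector have low entropy, so minimizing this term pushes target samples away from the decision boundary.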
Objective function
Overall, the objective function of the proposed adaptive deep feature representation framework integrates maximum mean discrepancy regularization, instance-based discriminative feature learning regularization, and entropy minimization regularization within a unified framework, which can be formulated as:
Here, \({{{\mathcal {L}}}_S}\) denotes the cross-entropy loss used for source EEG data. \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\) are the trade-off parameters for balancing the maximum mean discrepancy loss \({{{\mathcal {L}}}_M}\), the instance-based discriminative feature learning loss \({{{\mathcal {L}}}_D}\), and the entropy minimization loss \({{{\mathcal {L}}}_E}\), respectively.
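Reading the trade-off parameters as simple multiplicative weights on the three regularizers, which is an assumption consistent with the description above rather than a transcription of the authors' equation, the overall objective takes the form:

\[ {{\mathcal {L}}} = {{{\mathcal {L}}}_S} + {\lambda _1}{{{\mathcal {L}}}_M} + {\lambda _2}{{{\mathcal {L}}}_D} + {\lambda _3}{{{\mathcal {L}}}_E} \]

Setting \({\lambda _2} = {\lambda _3} = 0\) in this form leaves only the source classification loss and the MMD term, i.e., a purely alignment-based objective.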
Experiments and results
To evaluate the efficacy of the proposed ADFR framework for MI-based EEG decoding, we conduct comprehensive experiments on two publicly available EEG datasets, i.e., Dataset IVa of BCI Competition III and Dataset IIa of BCI Competition IV [46]. We first describe the employed EEG datasets. Then, we outline the data preprocessing steps. Subsequently, we list the comparison methods, along with their corresponding parameters. Finally, we present the experimental results and provide a detailed analysis.
EEG preparation and preprocessing
- Dataset IVa of BCI Competition III (Dataset 1): This dataset comprises 118-channel EEG signals from five subjects (denoted as aa, al, av, aw, and ay). The signals were sampled at a rate of 100 Hz. During each trial, subjects were asked to perform either a right-hand or a foot MI task in response to visual cues. For each subject, 280 trials were collected. In the experiment, we pair two subjects to form the source and target domains, which gives \(C_5^2 = 10\) domain adaptation tasks. We then exchange the source/target roles, resulting in an additional set of 10 domain adaptation tasks. Consequently, we have a total of 20 domain adaptation tasks for this dataset.
- Dataset IIa of BCI Competition IV (Dataset 2): EEG signals were acquired from 22 electrodes with a sampling rate of 250 Hz. During the experimental trials, nine subjects (denoted as S1, S2, S3, S4, S5, S6, S7, S8, and S9) were asked to perform four MI tasks, i.e., left hand, right hand, feet, and tongue. 576 trials were collected per subject. As with Dataset 1, we pair two subjects to form the source and target domains, which gives \(C_9^2 = 36\) domain adaptation tasks. By exchanging the source/target roles, we generate another set of 36 domain adaptation tasks. In total, we obtain 72 domain adaptation tasks for analysis and evaluation.
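The task counts above follow from enumerating ordered source/target pairs of distinct subjects, as the short sketch below illustrates. The subject identifiers are those given in the dataset descriptions; the function and variable names are illustrative.

```python
from itertools import combinations, permutations

dataset1_subjects = ["aa", "al", "av", "aw", "ay"]
dataset2_subjects = [f"S{i}" for i in range(1, 10)]

def adaptation_tasks(subjects):
    """All ordered (source, target) pairs of distinct subjects."""
    return list(permutations(subjects, 2))

for subjects in (dataset1_subjects, dataset2_subjects):
    unordered = len(list(combinations(subjects, 2)))
    ordered = len(adaptation_tasks(subjects))
    print(unordered, ordered)
```

Each unordered pair contributes two tasks, one for each assignment of the source and target roles.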
For both Dataset 1 and Dataset 2, the interval of [0.5, 3] seconds after the cue of each trial was used in our experiment. To preprocess the EEG signals, we applied a fifth-order Butterworth filter to bandpass filter the EEG signals between 8 Hz and 30 Hz for both datasets. This step aims to retain the frequency components associated with MI tasks.
Experimental setting
We conduct a comprehensive comparison of the proposed ADFR framework with several baseline methods and state-of-the-art domain adaptation approaches, including:
- Shallow ConvNet (EEG_ConvNet) [36]
- Subspace Alignment (SA) [30]
- Transfer Component Analysis (TCA) [47]
- Transfer Joint Matching (TJM) [48]
- Deep Domain Confusion (DDC) [49]
- Deep Correlation Alignment (D_CORAL) [50]
- Our proposed ADFR framework.
In the experiment, SA, TCA and TJM belong to shallow domain adaptation methods. For a fair comparison, we utilize the deep features learned from EEG_ConvNet as input for these comparison methods. We employ k-Nearest Neighbor (kNN) as the base classifier for these methods. Moreover, we determine the optimal value of k through a 5-fold cross-validation strategy, considering values ranging from 1 to 10. It is important to mention that EEG_ConvNet serves as the network backbone for all the comparison methods. The detailed architecture of EEG_ConvNet is illustrated in Table 1.
Additionally, DDC, D_CORAL and ADFR are deep domain adaptation methods. For these three comparison methods, we utilize the raw EEG data as input. For TCA, DDC and ADFR, we employ the Radial Basis Function (RBF) kernel \(k\left( {{x_i},{x_j}} \right) = {e^{{{ - {{\left\| {{x_i} - {x_j}} \right\| }^2}}/ \sigma }}}\) for all tasks. We set the kernel width \(\sigma\) to the median squared distance between training instances [43]. For DDC, the trade-off parameter \(\lambda\) balances the domain matching loss and the supervised loss. We gradually increase it from 0 to 1 during training through the function \(\lambda = \frac{2}{{1 + \exp \left( { - \eta p} \right) }} - 1\) [42]. Here, p denotes the training progress, which changes linearly from 0 to 1, and \(\eta = 10\). We employ the same setting for the parameter \({\lambda _1}\) in ADFR. For the instance-based discriminative loss, we set the parameters \({\lambda _2}\), \({d_1}\) and \({d_2}\) to 0.01, 0 and 100, respectively. For the entropy minimization loss, the parameter \({\lambda _3}\) is set to 0.01. For EEG_ConvNet, DDC, D_CORAL and ADFR, the learning rate is set to \(1e - 3\), and the batch size is set to 72.
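The two settings above that are given in closed form, namely the schedule for the trade-off parameter and the median heuristic for the kernel width, can be sketched as follows. The function names and the toy instances are illustrative, and the median is taken over all distinct pairs of instances, which is one common reading of the heuristic.

```python
import math
from itertools import combinations
from statistics import median

def adaptation_weight(p, eta=10.0):
    """Weight 2 / (1 + exp(-eta * p)) - 1 for training progress p in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-eta * p)) - 1.0

def median_sq_distance(X):
    """Median of the squared Euclidean distances over all pairs of instances."""
    return median(sum((a - b) ** 2 for a, b in zip(x, y))
                  for x, y in combinations(X, 2))

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, round(adaptation_weight(p), 4))

# Toy instances (hypothetical) used only to exercise the kernel-width rule.
print(median_sq_distance([[0.0], [1.0], [3.0]]))
```

The schedule starts at exactly zero, so the alignment term is switched on gradually while the source classifier is still poorly trained, and it approaches 1 towards the end of training.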
Results on dataset 1
Table 2 lists the classification performance obtained by the seven comparison methods on Dataset 1. The highest classification results for each subject are highlighted in bold. Based on these experimental results, we can make the following observations. When deep features are used as input, the shallow domain adaptation methods, i.e., SA, TCA, and TJM, surpass the baseline method EEG_ConvNet in most cases. However, in certain instances, such as when subject av serves as the source domain and subject aa as the target domain, the baseline method EEG_ConvNet outperforms the domain adaptation method TCA. We attribute this discrepancy to the fine-tuning procedure employed by EEG_ConvNet, which allows it to benefit from additional optimization steps. In general, the experimental results demonstrate that deep domain adaptation methods can obtain better classification performance than shallow domain adaptation methods. This observation confirms the advantages of integrating domain adaptation strategies with deep neural networks, resulting in improved transfer learning performance. Notably, our ADFR framework achieves the best classification performance among the comparison methods across all tasks on Dataset 1. The promising results may be attributed to the fact that ADFR not only learns shared and discriminative feature representations but also allows the model to adaptively pass through the low-density regions of the target EEG data. These results further demonstrate that domain alignment and discriminative feature learning are insufficient to fully eliminate the distribution divergence between two domains. The lack of supervisory information in the target domain can lead to the learned classifier being biased toward the source domain.
Specifically, the proposed ADFR framework achieves an average classification accuracy of 76.48%. Compared to the baseline method EEG_ConvNet and the competitive method D_CORAL, ADFR shows an absolute increase in average classification accuracy of 13.88% and 3.00%, respectively. Additionally, ADFR outperforms the shallow domain adaptation methods SA, TCA and TJM by an average of 10.38%, 8.18% and 7.23%, respectively. These results demonstrate the effectiveness of the fine-tuning procedure for promoting feature alignment and discriminative feature learning. Furthermore, in comparison to the deep domain adaptation method DDC, ADFR shows a 3.93% improvement in average accuracy. The above experimental results verify the EEG decoding ability of the proposed ADFR framework, which can jointly adapt the feature representations and classifier.
To better understand the features learned by our ADFR framework, we visualize the deep feature representations using the t-SNE embedding method [42]. Without loss of generality, the first domain adaptation task (aa/al) was selected, and the deep features obtained by the baseline method EEG_ConvNet, the deep domain adaptation methods DDC and D_CORAL, and the proposed ADFR were visualized. To enhance feature visualization, we adopt distinct colors to indicate features from different classes. As illustrated in Fig. 4, the deep features of different categories learned by EEG_ConvNet tend to mix together. By considering domain alignment, the domain adaptation methods DDC and D_CORAL demonstrate improved discriminative feature learning. However, the points located near the edges of the clusters are still prone to misclassification. Notably, the feature representations learned by the proposed ADFR are more clearly separated than those of the other comparison methods. The feature visualization results verify the benefit of the transferable feature learning scheme of ADFR, which integrates distribution divergence minimization regularization, discriminative feature learning regularization and low-density separation regularization. The synergistic learning among these regularizations during training enhances EEG decoding performance across subjects.
Results on dataset 2
Figures 5a–i illustrate the classification results of the seven comparison methods across 72 tasks on Dataset 2. From Fig. 5, it can be seen that deep domain adaptation methods generally achieve higher classification accuracies than shallow methods, particularly on the target subjects S3, S7, S8 and S9. In addition, ADFR outperforms the other deep domain adaptation methods in almost all cases. For a few domain adaptation tasks, such as S3/S4, S6/S5 and S8/S5, TJM achieves better classification performance. By jointly matching the distributions of the two domains and reweighting the source samples, TJM can effectively select the relevant source data, thereby reducing the domain differences. Overall, our proposed ADFR framework outperforms EEG_ConvNet in terms of classification accuracy across all tasks. Furthermore, compared to the domain adaptation methods SA, DDC and D_CORAL, ADFR yields the highest classification accuracy for 71 out of 72 tasks. Additionally, ADFR outperforms TJM and TCA in 69 and 68 out of 72 domain adaptation tasks on Dataset 2, respectively.
To provide a more comprehensive comparison, we present the average classification accuracy over all 72 tasks for each comparison method, as shown in Fig. 6. As observed, ADFR shows a 10.3% improvement in average classification accuracy over the baseline method EEG_ConvNet. Compared with the most competitive method, D_CORAL, ADFR shows a 2.12% improvement in average classification accuracy. These promising results verify the effectiveness of the proposed ADFR in considering distribution matching, discriminative feature learning, and low-density separation. Furthermore, the experimental results highlight that simultaneously adapting the feature representations and the classifier can significantly enhance the transferable feature learning capabilities in cross-subject EEG decoding.
Empirical analysis
To assess the statistical significance of the results, pairwise two-tailed t-tests were employed to identify significant differences between the results of our ADFR method and those of the other comparison methods. The results of the statistical tests for the 20 tasks from Dataset 1 and the 72 tasks from Dataset 2 are summarized in Table 3. A significance level of 0.05 was applied to all statistical tests, with p-values under 0.05 highlighted in bold. The results indicate that, in all cases, we can reject the null hypothesis at the 95% confidence level. The proposed ADFR framework therefore significantly outperforms the remaining methods at a significance level of 0.05.
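The test statistic behind such a comparison can be sketched in a few lines. The sketch below assumes that "pairwise" means a paired test over per-task accuracies of two methods evaluated on the same tasks; the function name and the accuracy values are hypothetical and are not results from this study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for a paired t-test on per-task accuracies."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both methods must be evaluated on the same tasks")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two tasks are required")
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-task accuracies (%), for illustration only.
method_a = [80.0, 75.0, 90.0, 85.0, 70.0]
method_b = [78.0, 72.0, 86.0, 80.0, 64.0]
t_stat, dof = paired_t_statistic(method_a, method_b)
```

The two-tailed p-value is then obtained from the Student t distribution with `dof` degrees of freedom; `scipy.stats.ttest_rel` performs the same paired test and returns the p-value directly.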
We further conducted experiments to evaluate the influence of the hyper-parameters \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\) on the classification performance of the proposed ADFR, as presented in Fig. 7. Due to space limitations, we report experiments on the domain adaptation tasks S3\(\rightarrow\)S1 and S8\(\rightarrow\)S3. In each experiment, we fixed two hyper-parameters and varied the third to observe the classification results of ADFR. In Fig. 7a, we fixed \({\lambda _2}\) and \({\lambda _3}\) at 0.01 and 0.1, and varied \({\lambda _1}\) over the set \(\left\{ {0.01,0.05,0.1,0.5,1,3,5} \right\}\). In Fig. 7b, we fixed \({\lambda _1}\) and \({\lambda _3}\) at 1 and 0.1, and varied \({\lambda _2}\) over the set \(\left\{ {0.001,0.005,0.01,0.02,0.05,0.1,1} \right\}\). In Fig. 7c, we fixed \({\lambda _1}\) and \({\lambda _2}\) at 1 and 0.01, and varied \({\lambda _3}\) over the set \(\left\{ {0.005,0.01,0.02,0.05,0.1,0.5,1} \right\}\). When \({\lambda _2}=0.01\) and \({\lambda _3}=0.1\), the test accuracies initially improve as \({\lambda _1}\) increases, demonstrating that maximum mean discrepancy regularization brings gains to the classification results. As \({\lambda _1}\) continues to increase, the average test accuracy degrades, which suggests that letting one regularization term dominate the other losses may undermine the classification performance. We can observe similar phenomena for the parameters \({\lambda _2}\) and \({\lambda _3}\). Generally, ADFR demonstrates stable classification performance across different parameter settings. These findings highlight the robustness and effectiveness of ADFR.
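The one-at-a-time protocol described above can be sketched as follows. The grids and fixed values are those reported for Fig. 7. The dictionary keys, function names, and the pairing of \({\lambda _2}\) with the IDFL term and \({\lambda _3}\) with the EM term in the weighted sum are assumptions made for illustration.

```python
# Fixed values and search grids as reported for Fig. 7 (key names are illustrative).
DEFAULTS = {"lambda1": 1.0, "lambda2": 0.01, "lambda3": 0.1}
GRIDS = {
    "lambda1": [0.01, 0.05, 0.1, 0.5, 1, 3, 5],
    "lambda2": [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 1],
    "lambda3": [0.005, 0.01, 0.02, 0.05, 0.1, 0.5, 1],
}


def one_at_a_time_configs(defaults, grids):
    """Yield (varied_name, config) pairs, changing one weight at a time."""
    for name, values in grids.items():
        for value in values:
            config = dict(defaults)  # copy so the defaults stay untouched
            config[name] = value
            yield name, config


def total_loss(cls_loss, mmd_loss, idfl_loss, em_loss, config):
    """Weighted sum of the classification loss and three regularizers.

    Assumed pairing: lambda1 -> MMD, lambda2 -> IDFL, lambda3 -> EM.
    """
    return (cls_loss
            + config["lambda1"] * mmd_loss
            + config["lambda2"] * idfl_loss
            + config["lambda3"] * em_loss)


configs = list(one_at_a_time_configs(DEFAULTS, GRIDS))
```

Each of the 21 configurations would be passed to a training and evaluation routine, which is omitted here, and the resulting accuracies plotted against the varied weight.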
Discussion
The proposed ADFR framework integrates MMD regularization, IDFL regularization, and EM regularization to ensure that the learned model fits the target EEG data as well as possible. The MMD measurement requires estimating the means of both the source and target EEG features, and these estimates may be highly inaccurate when the available data is limited. Nevertheless, the experimental results demonstrated that our proposed method can achieve superior classification performance even when using a single source subject, through the synergistic learning between the three regularizations. To further enhance EEG decoding performance, future work will aim to incorporate multiple available source subjects, as demonstrated in [31, 32], where the negative impact of inaccurate MMD measurement can be mitigated.
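The dependence of MMD on estimated feature means can be made concrete with a small sketch. It assumes a linear kernel, for which the empirical squared MMD reduces to the squared Euclidean distance between the source and target feature means; the kernel used in ADFR may differ, and the function names and toy features are hypothetical.

```python
def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(row[d] for row in features) / n for d in range(dim)]


def linear_mmd2(source, target):
    """Squared MMD under a linear kernel: ||mean(source) - mean(target)||^2."""
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((s - t) ** 2 for s, t in zip(mu_s, mu_t))


# Toy 2-D features, for illustration only.
source_feats = [[0.0, 0.0], [2.0, 2.0]]
target_feats = [[1.0, 1.0], [3.0, 3.0]]
shifted = linear_mmd2(source_feats, target_feats)
aligned = linear_mmd2(source_feats, source_feats)
```

Because both means are computed from finite samples, a small number of trials per subject makes the estimate noisy, which is the limitation noted above.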
Another issue to discuss is the use of IDFL regularization in our objective function. The discriminative feature learning strategy requires supervisory information when applied to the target subject. Existing methods typically rely either on pseudo-labels generated by the source model [51] or on a small amount of labeled calibration data from the target subject [52]. However, the source model may generate erroneous pseudo-labels due to the significant domain shift between source and target subjects. These unreliable pseudo-labels for target EEG data can disrupt model training, ultimately degrading the classification performance for target subjects. Additionally, the absence of labeled calibration EEG data from the target subject may render adaptive methods ineffective in certain scenarios. In the absence of supervisory information, our method seeks to enhance the discriminative capability of the source EEG feature representations. This improvement in source feature discriminability in turn increases the discriminability of target EEG features through the feature alignment strategy (i.e., MMD regularization).
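A pairwise loss of this kind, computed on labeled source features only, might look like the sketch below. It follows the general form of the instance-based loss in [46], in which same-class pairs are penalized when farther apart than a margin and different-class pairs when closer than a second margin. The margin values, the averaging over pairs, and the function names are assumptions and may not match the exact IDFL term used in ADFR.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def instance_discriminative_loss(features, labels, m1=1.0, m2=4.0):
    """Average pairwise loss over labeled (source) features.

    Same-class pairs are penalized when farther apart than m1;
    different-class pairs are penalized when closer than m2.
    The margins m1 and m2 are hypothetical values.
    """
    total = 0.0
    pairs = 0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            dist = euclidean(features[i], features[j])
            if labels[i] == labels[j]:
                total += max(0.0, dist - m1) ** 2
            else:
                total += max(0.0, m2 - dist) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy features, for illustration only.
well_separated = instance_discriminative_loss(
    [[0.0, 0.0], [0.0, 0.5], [3.0, 4.0], [3.0, 4.5]], [0, 0, 1, 1])
same_class_far = instance_discriminative_loss([[0.0, 0.0], [3.0, 4.0]], [0, 0])
diff_class_near = instance_discriminative_loss([[0.0, 0.0], [0.0, 1.0]], [0, 1])
```

Compact, well-separated source clusters incur no penalty, whereas scattered same-class features or overlapping classes are penalized; no target labels are needed.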
Regarding the EM regularization, it is more commonly employed in semi-supervised learning [53] and the increasingly popular test-time adaptation problems [54]. A common practice in test-time adaptation is to disregard the data used during training, primarily due to high memory requirements and concerns over privacy leakage. However, training data serve as the only source of supervision, and the absence of training data can significantly impact the effectiveness of adaptation [55]. In this study, we innovatively introduced EM regularization into domain adaptation, significantly enhancing the performance of cross-subject EEG decoding.
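As a rough illustration of what an entropy minimization term measures, the sketch below computes the mean Shannon entropy of softmax predictions on unlabeled samples. The function names and logits are hypothetical, and the exact EM term used in ADFR may be defined differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_prediction_entropy(batch_logits):
    """Average Shannon entropy (in nats) of the softmax predictions in a batch."""
    total = 0.0
    for logits in batch_logits:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(batch_logits)


# Four-class logits, for illustration only.
uncertain = mean_prediction_entropy([[0.0, 0.0, 0.0, 0.0]])
confident = mean_prediction_entropy([[10.0, 0.0, 0.0, 0.0]])
```

Predictions near a decision boundary have high entropy, so minimizing this quantity on target samples pushes the boundary away from dense regions, consistent with the low-density separation idea.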
Conclusions
In this study, we introduce a novel adaptive deep feature representation framework termed ADFR, aiming to facilitate cross-subject EEG decoding. ADFR can adaptively learn transferable EEG feature representations by simultaneously adapting the EEG feature representations and the classifier. ADFR integrates three key components: maximum mean discrepancy regularization, instance-based discriminative feature learning regularization and entropy minimization regularization. By employing maximum mean discrepancy regularization, the proposed ADFR can reduce the distribution gap between the source and target subjects. Then, the instance-based discriminative feature learning regularization makes the learned feature representation more discriminative. We further utilize the entropy minimization regularization to adjust the classifier to pass through the low-density region between clusters. The comprehensive experimental results on publicly available EEG datasets demonstrated that ADFR can yield better classification performance than the comparison methods.
The proposed ADFR demonstrates a substantial increase in classification accuracy across the majority of tasks. However, for a few subjects, the observed improvement was less pronounced. This could be attributed to the fact that directly using all the source data for domain adaptation may be ineffective, since not all source EEG data are relevant to the target subject. This finding indicates the necessity of selecting relevant source EEG data in domain adaptation scenarios to enhance the EEG decoding performance for target data. Future research may develop a selective transfer learning strategy to adaptively identify the related source EEG data, which may further enhance the target EEG decoding performance.
Availability of data and materials
The publicly available Dataset 1 and Dataset 2 can be accessed at https://www.bbci.de/competition/iii/ and https://www.bbci.de/competition/iv/.
References
Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM. Brain-computer interfaces for communication and control. Clin Neurophys. 2002;113(6):767–91.
Ang KK, Guan C. Eeg-based strategies to detect motor imagery for control and rehabilitation. IEEE Trans Neural Syst Rehabilit Eng. 2016;25(4):392–401.
Grosse-Wentrup M, Mattia D, Oweiss K. Using brain-computer interfaces to induce neural plasticity and restore function. J Neural Eng. 2011;8(2):025004.
Haq I, Mazhar T, Malik MA, Kamal MM, Ullah I, Kim T, Hamdi M, Hamam H. Lung nodules localization and report analysis from computerized tomography (ct) scan using a novel machine learning approach. Appl Sci. 2022;12(24):12614.
He B, Baxter B, Edelman BJ, Cline CC, Wenjing WY. Noninvasive brain-computer interfaces based on sensorimotor rhythms. Proc IEEE. 2015;103(6):907–25.
Krauledat M, Schröder M, Blankertz B, Müller K-R. Reducing calibration time for brain-computer interfaces: a clustering approach. Adv Neural Inf Proc Syst. 2006;19:1023.
Lotte F. Signal processing approaches to minimize or suppress calibration time in oscillatory activity-based brain-computer interfaces. Proc IEEE. 2015;103(6):871–90.
Ahn M, Jun SC. Performance variation in motor imagery brain-computer interface: a brief review. J Neurosci Methods. 2015;243:103–10.
Jayaram V, Alamgir M, Altun Y, Scholkopf B, Grosse-Wentrup M. Transfer learning in brain-computer interfaces. IEEE Computat Intell Magaz. 2016;11(1):20–31.
Saqib SM, Iqbal M, Asghar MZ, Mazhar T, Almogren A, Rehman AU, Hamam H. Cataract and glaucoma detection based on transfer learning using mobilenet. Heliyon. 2024;10(17):10.
Wan Z, Yang R, Huang M, Zeng N, Liu X. A review on transfer learning in eeg signal analysis. Neurocomputing. 2021;421:1–14.
Chen Y, Hang W, Liang S, Liu X, Li G, Wang Q, Qin J, Choi K-S. A novel transfer support matrix machine for motor imagery-based brain computer interface. Front Neurosci. 2020;14:606949.
Liang S, Hang W, Lei B, Wang J, Qin J, Choi K-S, Zhang Y. Adaptive multimodel knowledge transfer matrix machine for EEG classification. IEEE Trans Neural Netw Learn Syst. 2022.
Hossain I, Khosravi A, Nahavandhi S. Active transfer learning and selective instance transfer with active learning for motor imagery based bci. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4048–4055 (2016). IEEE
Zanini P, Congedo M, Jutten C, Said S, Berthoumieu Y. Transfer learning: a riemannian geometry framework with applications to brain-computer interfaces. IEEE Trans Biomed Eng. 2017;65(5):1107–16.
Kang H, Nam Y, Choi S. Composite common spatial pattern for subject-to-subject transfer. IEEE Signal Proc Lett. 2009;16(8):683–6.
Samek W, Kawanabe M, Müller K-R. Divergence-based framework for common spatial patterns algorithms. IEEE Rev Biomed Eng. 2013;7:50–72.
Samek W, Meinecke FC, Müller K-R. Transferring subspaces between subjects in brain-computer interfacing. IEEE Trans Biomed Eng. 2013;60(8):2289–98.
Jeon E, Ko W, Suk H-I. Domain adaptation with source selection for motor-imagery based bci. In: 2019 7th International Winter Conference on Brain-Computer Interface (BCI), pp. 1–4 (2019). IEEE
Sakhavi S, Guan C. Convolutional neural network-based transfer learning and knowledge distillation using multi-subject data in motor imagery bci. In: 2017 8th international IEEE/EMBS conference on neural engineering (NER), pp. 588–591 (2017). IEEE
Fahimi F, Zhang Z, Goh WB, Lee T-S, Ang KK, Guan C. Inter-subject transfer learning with an end-to-end deep convolutional neural network for eeg-based bci. J Neural Eng. 2019;16(2):026007.
Azab AM, Mihaylova L, Ang KK, Arvaneh M. Weighted transfer learning for improving motor imagery-based brain-computer interface. IEEE Trans Neural Syst Rehabilit Eng. 2019;27(7):1352–9.
Tu W, Sun S. A subject transfer framework for eeg classification. Neurocomputing. 2012;82:109–16.
Alamgir M, Grosse-Wentrup M, Altun Y. Multitask learning for brain-computer interfaces. In: proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings. pp. 17–24 (2010)
Ramoser H, Muller-Gerking J, Pfurtscheller G. Optimal spatial filtering of single trial eeg during imagined hand movement. IEEE Trans Rehabilitat Eng. 2000;8(4):441–6.
Ang KK, Guan C, Wang C, Phua KS, Tan AHG, Chin ZY. Calibrating eeg-based motor imagery brain-computer interface from passive movement. In: 2011 annual international conference of the IEEE engineering in medicine and biology society, pp. 4199–4202 (2011). IEEE
Hübner D, Verhoeven T, Schmid K, Müller K-R, Tangermann M, Kindermans P-J. Learning from label proportions in brain-computer interfaces: Online unsupervised learning with guarantees. PloS one. 2017;12(4):0175856.
Vidaurre C, Kawanabe M, Bünau P, Blankertz B, Müller K-R. Toward unsupervised adaptation of lda for brain-computer interfaces. IEEE Trans Biomed Eng. 2010;58(3):587–97.
He H, Wu D. Transfer learning for brain-computer interfaces: a Euclidean space data alignment approach. IEEE Trans Biomed Eng. 2019;67(2):399–410.
Lan Z, Sourina O, Wang L, Scherer R, Müller-Putz GR. Domain adaptation techniques for EEG-based emotion recognition: a comparative study on two public datasets. IEEE Trans Cognit Develop Syst. 2018;11(1):85–94.
Hong X, Zheng Q, Liu L, Chen P, Ma K, Gao Z, Zheng Y. Dynamic joint domain adaptation network for motor imagery classification. IEEE Trans Neural Syst Rehabilit Eng. 2021;29:556–65.
Zhao H, Zheng Q, Ma K, Li H, Zheng Y. Deep representation-based domain adaptation for nonstationary EEG classification. IEEE Trans Neural Netw Learn Syst. 2020;32(2):535–45.
Hang W, Feng W, Du R, Liang S, Chen Y, Wang Q, Liu X. Cross-subject EEG signal recognition using deep domain adaptation network. IEEE Access. 2019;7:128273–82.
Song Y, Zheng Q, Wang Q, Gao X, Heng P-A. Global adaptive transformer for cross-subject enhanced eeg classification. IEEE Trans Neural Syst Rehabilit Eng. 2023;31:2767–77.
Xu C, Song Y, Zheng Q, Wang Q, Heng P-A. Unsupervised multi-source domain adaptation via contrastive learning for eeg classification. Expert Syst Appl. 2025;261:125452.
Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K, Tangermann M, Hutter F, Burgard W, Ball T. Deep learning with convolutional neural networks for eeg decoding and visualization. Human Brain Mapping. 2017;38(11):5391–420.
Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ. Eegnet: a compact convolutional neural network for EEG-based brain-computer interfaces. J Neural Eng. 2018;15(5):056013.
Lu N, Li T, Ren X, Miao H. A deep learning scheme for motor imagery classification based on restricted boltzmann machines. IEEE Trans Neural Syst Rehabilit Eng. 2016;25(6):566–76.
Zhang P, Wang X, Zhang W, Chen J. Learning spatial-spectral-temporal eeg features with recurrent 3d convolutional neural networks for cross-task mental workload assessment. IEEE Trans Neural Syst Rehabilit Eng. 2018;27(1):31–42.
Sakhavi S, Guan C, Yan S. Learning temporal information for brain-computer interface using convolutional neural networks. IEEE Trans Neural Netw Learn Syst. 2018;29(11):5619–29.
Ang KK, Chin ZY, Zhang H, Guan C. Filter bank common spatial pattern (fbcsp) in brain-computer interface. In: 2008 IEEE international joint conference on neural networks (IEEE World Congress on Computational Intelligence), pp. 2390–2397 (2008). IEEE
Long M, Cao Y, Cao Z, Wang J, Jordan MI. Transferable representation learning with deep adaptation networks. IEEE Trans Patt Anal Mach Intell. 2018;41(12):3071–85.
Gretton A, Sejdinovic D, Strathmann H, Balakrishnan S, Pontil M, Fukumizu K, Sriperumbudur BK. Optimal kernel choice for large-scale two-sample tests. Adv Neural Inf Process Syst. 2012;25:10245.
Chen C, Chen Z, Jiang B, Jin X. Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation. Proc AAAI Conf Artif Intell. 2019;33:3296–303.
Ma N, Bu J, Lu L, Wen J, Zhou S, Zhang Z, Gu J, Li H, Yan X. Context-guided entropy minimization for semi-supervised domain adaptation. Neural Netw. 2022;154:270–82.
Hang W, Feng W, Liang S, Wang Q, Liu X, Choi K-S. Deep stacked support matrix machine based representation learning for motor imagery eeg classification. Comput Methods Progr Biomed. 2020;193:105466.
Zhang Y-Q, Zheng W-L, Lu B-L. Transfer components between subjects for eeg-based driving fatigue detection. In: Neural Information Processing: 22nd International Conference, ICONIP 2015, November 9-12, 2015, Proceedings, Part IV 22, pp. 61–68 (2015). Springer
Long M, Wang J, Ding G, Sun J, Yu PS. Transfer joint matching for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1410–1417 (2014)
Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014)
Sun B, Saenko K. Deep coral: Correlation alignment for deep domain adaptation. In: Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part III 14, pp. 443–450 (2016). Springer
Heremans ER, Phan H, Borzée P, Buyse B, Testelmans D, De Vos M. From unsupervised to semi-supervised adversarial domain adaptation in electroencephalography-based sleep staging. J Neural Eng. 2022;19(3):036044.
Wei F, Xu X, Jia T, Zhang D, Wu X. A multi-source transfer joint matching method for inter-subject motor imagery decoding. IEEE Trans Neural Syst Rehabilit Eng. 2023;31:1258–67.
Chen Y, Tan X, Zhao B, Chen Z, Song R, Liang J, Lu X. Boosting semi-supervised learning by exploiting all unlabeled data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7548–7557 (2023)
Li S, Wang Z, Luo H, Ding L, Wu D. T-time: Test-time information maximization ensemble for plug-and-play bcis. IEEE Trans Biomed Eng. 2023.
Kang J, Kim N, Kwon D, Ok J, Kwak S. Leveraging proxy of training data for test-time adaptation. In: proceedings of the 40th international conference on machine learning, pp. 15737–15752 (2023)
Acknowledgements
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 61902197, the Natural Science Research of Jiangsu Higher Education Institutions of China (23KJB520012), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_1073).
Author information
Contributions
S.L proposed the algorithm. L.L and W.Z coded the algorithm. W.F designed and evaluated the experiments. W.H wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liang, S., Li, L., Zu, W. et al. Adaptive deep feature representation learning for cross-subject EEG decoding. BMC Bioinformatics 25, 393 (2024). https://doi.org/10.1186/s12859-024-06024-w
DOI: https://doi.org/10.1186/s12859-024-06024-w