
Mental Task Classification for Brain Computer Interface Applications

Kouhyar Tavakolian, Engineering Science, Simon Fraser University
Faratash Vasefi, Engineering Science, Simon Fraser University
Kaveh Naziripour, Engineering Science, Simon Fraser University
Siamak Rezaei, Computer Science, University of Northern British Columbia

Abstract: In this work, the application of different machine learning techniques to the classification of mental tasks from electroencephalogram (EEG) signals is investigated. The main application of this research is the improvement of brain computer interface (BCI) systems. For this purpose, Bayesian graphical network, neural network, Bayesian quadratic, Fisher linear and hidden Markov model classifiers are applied to two known EEG datasets in the BCI field. The Bayesian network classifier is used here for the first time for the classification of EEG signals, and it achieved competitive accuracy. In addition to the classical classification accuracy criterion, mutual information is used to compare the classification results with those of other BCI groups.

Keywords: EEG, brain computer interface, Bayesian network classifier, neural networks, mutual information.

1. INTRODUCTION

Mental task classification by recognizing electroencephalographic (EEG) patterns is an important and challenging biomedical signal processing problem. Such classification can enable a patient to communicate without any overt physical movement, purely through computer processing of the patient's brain waves, as shown in the block diagram of Figure 1. The development of faster digital computers and better EEG devices has motivated many researchers to work on BCI systems [1] [2]. So far, classification accuracy has been one of the main weaknesses of the BCI systems developed, and it directly affects the decisions produced as the BCI output. This accuracy depends on the quality of the EEG signal and on the processing algorithms, which comprise preprocessing, feature extraction and feature classification. In our previous research, the effects of different feature extraction algorithms [3] and of different numbers of EEG channels [4] on classification accuracy were investigated. In the current work, the effects of different types of classifiers on classification accuracy are investigated and compared.

In the present research, the classification of mental tasks is investigated using the Purdue University EEG dataset [5] and the EEG dataset from the Department of Medical Informatics, University of Technology Graz [6]. Both are well established datasets in the BCI field and are accessible over the internet. Autoregressive (AR) and adaptive autoregressive (AAR) coefficients were extracted from the EEG windows and passed to the next stage of the BCI, the classifier. Using the same extracted features for all classifiers made it easier to compare their performance.

The main focus was on the investigation and comparison of the feed-forward neural network, Bayesian quadratic, Bayesian network, Fisher linear and hidden Markov model (HMM) classifiers in mental task classification. These classifiers are well known in the machine learning literature and were intentionally chosen to cover both linear and nonlinear methods. The Gaussian mixture model is represented as a Bayesian network, and this is the first time that such a classifier has been used for EEG signal classification [7]. We trained the Bayesian network and the hidden Markov model using the expectation maximization (EM) algorithm. Mixture models are a type of density model. They comprise a number of component functions, in our case Gaussians, which are combined to provide a multimodal density.
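As an illustration of the AR feature extraction mentioned above, the following is a minimal sketch and not the authors' implementation. The paper does not say how the AR coefficients were estimated, so an ordinary least-squares fit is assumed here; the function names `ar_coefficients` and `feature_vector`, and the choice to concatenate the coefficients of all channels into one vector, are also assumptions made for illustration.

```python
import numpy as np

def ar_coefficients(window, order=6):
    """Fit x[t] ~ a[0]*x[t-1] + ... + a[order-1]*x[t-order] by least squares."""
    x = np.asarray(window, dtype=float)
    x = x - x.mean()  # remove the DC offset before fitting
    n = x.size
    # Column k holds the signal delayed by k+1 samples.
    lagged = np.column_stack([x[order - 1 - k : n - 1 - k] for k in range(order)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs

def feature_vector(eeg_window, order=6):
    """Concatenate per-channel AR coefficients (assumed layout).

    eeg_window has shape (samples, channels).
    """
    eeg_window = np.asarray(eeg_window, dtype=float)
    return np.concatenate(
        [ar_coefficients(eeg_window[:, c], order) for c in range(eeg_window.shape[1])]
    )
```

Under these assumptions, a six-channel window with AR order six yields a 36-element feature vector that can be passed to any of the classifiers.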

[Figure 1: block diagram. Mental task EEG signals -> Preprocessing -> Feature extraction (AR or AAR) -> Classifiers (HMM, neural network, Bayesian network, Bayes quadratic classifier, Fisher linear classifier) -> Mental task recognized (BCI output)]

Figure 1. Flow of the methodology.

In the remainder of the paper, the Methods section first introduces the datasets and preprocessing and then briefly explains the five classifiers. The EEG data were classified by these classifiers, and the results are presented in the tables and the figure of the Results section. A discussion of the results and a conclusion close the paper.

2. METHODS
In this section the EEG datasets are introduced first, and then the applied machine learning algorithms are briefly explained, following the flow of Figure 1.

2.1. EEG recordings and Preprocessing
The main difference between the two EEG datasets is that the Purdue dataset was recorded during the performance of five mental tasks, while the Graz dataset contains just two mental activities, left and right hand movement. On the other hand, the Graz dataset has many more sessions than the Purdue dataset.

2.1.1 Purdue dataset: The Purdue dataset was acquired by Keirn and Aunon [5] at Purdue University from seven subjects during the performance of five different mental tasks. An elastic electrode cap was used to record from positions C3, C4, P3, P4, O1 and O2 on the scalp, as can be seen in Figure 2. Data were recorded at a sampling rate of 250 Hz with a 12-bit A/D converter. Eye blinks were detected by means of a separate channel of data recorded from two electrodes placed above and below the subject's left eye. The subjects were asked to perform five mental tasks:
(a) Baseline task. The subjects were asked to relax as much as possible.
(b) Letter task. The subjects were instructed to mentally compose a letter to a friend or relative without vocalizing.
(c) Math task. The subjects were given nontrivial multiplication problems, such as 49 times 78.
(d) Visual counting task. The subjects were asked to imagine a blackboard and to visualize numbers being written on the board sequentially.
(e) Geometric figure rotation. The subjects were asked to visualize a particular three dimensional block figure being rotated about an axis.
Data were recorded for 10 seconds during each task, and each task was repeated five times per session. In this work the algorithms were applied to the subjects with more than one session of EEG recordings, namely subjects 1, 3, 5 and 6.

The eye blinks were removed with two different methods. In the first method, a simple time filter excluded the sudden jumps in the EEG caused by eye movement, using the EOG channel of the dataset. The average of the signal was calculated over windows of length 20 milliseconds, and windows whose average exceeded twice the average of the signal over a window of length 500 milliseconds were removed. In the second method, independent component analysis (ICA) was used, which resulted in much better classification accuracy [7]. In this paper the results with the time filter are reported, because the differences between the classifiers were more distinct with this method.
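The time-filter eye-blink rejection described for the Purdue data (20 millisecond windows compared against a 500 millisecond average on the EOG channel) could be sketched as follows. This is only one plausible reading of the description: the use of the absolute EOG amplitude, non-overlapping windows, and taking the 500 millisecond window to be the block that contains each short window are all assumptions, and `blink_free_mask` is a hypothetical name.

```python
import numpy as np

def blink_free_mask(eog, fs=250, short_ms=20, long_ms=500, ratio=2.0):
    """Return a boolean mask that is False on samples flagged as eye blinks."""
    amplitude = np.abs(np.asarray(eog, dtype=float))  # assumed: absolute amplitude
    short = int(round(fs * short_ms / 1000))  # 5 samples at 250 Hz
    long_ = int(round(fs * long_ms / 1000))   # 125 samples at 250 Hz
    keep = np.ones(amplitude.size, dtype=bool)
    for start in range(0, amplitude.size, long_):
        block = amplitude[start:start + long_]
        block_mean = block.mean()
        for s in range(0, block.size, short):
            # Reject a short window whose mean exceeds ratio times the block mean.
            if block[s:s + short].mean() > ratio * block_mean:
                keep[start + s:start + s + short] = False
    return keep
```

The mask can then be applied to every EEG channel of the same recording, for example `clean = eeg[mask]` for an array of shape (samples, channels).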

Figure 2. The electrode placement for the Purdue dataset.

2.1.2 Graz dataset: The Graz dataset [6] was recorded from a normal female subject during a feedback session. The task was to control a feedback bar by means of imagery of left or right hand movements. The experiment consisted of 7 runs with 40 trials each. All runs were conducted on the same day, with breaks of several minutes in between. Three bipolar EEG channels were measured over C3, Cz and C4. The EEG was sampled at 128 Hz and was filtered between 0.5 and 30 Hz. The trials for training and testing were randomly chosen, which prevents any systematic effect due to the feedback. In this dataset the eye artifacts had already been removed.

2.2. Classifiers
In this section the five classifiers are briefly introduced. For more detailed information, see references [8] [9] [10] [11] [12] and [13] or other texts on machine learning or pattern recognition.

2.2.1 Bayesian graphical network classifier
A Bayesian network is a modeling tool that combines directed acyclic graphs with Bayesian probability. Figure 3 shows an example of a Bayesian network, which consists of a causal graph combined with an underlying probability distribution. Each node of the network in the figure corresponds to a variable, and the edges represent causal relations between these variables. The other elements of a Bayesian network are the probability distributions associated with each node. With this information the network can model the probabilities of complex causal relationships [13].

The graphical model corresponding to the Bayesian network used in this work is shown in Figure 3. Note that the square box in the figure corresponds to the input extracted features. The rectangular box corresponds to the Gaussian mixture components. The square and rectangular nodes represent discrete values, while the round node in the figure represents continuous values. The graph structure of this model can be represented by the following adjacency matrix, written row by row: 011, 001, 000. The Bayes Net Toolbox (BNT) was used for implementing the classifier [10]. The model was trained using the EM algorithm. EM starts with a randomly initialized model (here, mean and covariance) and then iteratively refines the model parameters to produce a locally optimal maximum-likelihood fit. Each iteration of the EM algorithm is composed of two steps. In the first step, each data point undergoes a soft assignment to each mixture component. In the second step, the parameters of the model are adjusted to fit the data based on the soft assignment of the previous step [8].

2.2.2 Neural Networks
For this classifier, a two-layer feed-forward neural network was implemented with 20 neurons in the hidden layer and one neuron in the output layer. This structure (using 20 neurons in the hidden layer) was found to be the optimum one in [3]. A threshold of 0.5 was set for the output neuron: output values between 0.5 and 1 were assigned to one of the tasks, and values between 0 and 0.5 to the other task. The network was trained using the error back propagation algorithm. The Neural Network toolbox of Matlab was used for this part of the research.

2.2.3 Hidden Markov Model
A hidden Markov model is a finite set of states, each of which is associated with a probability distribution that is generally multidimensional, as in this case. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution [9]. In this research, the observations were the extracted EEG features explained previously, modeled as generated by a Gaussian mixture model that is characterized by three matrices for the means, variances and mixture proportions. A transition matrix governed the moves between our two states. These parameters were all updated by the EM algorithm explained earlier, yielding two trained HMMs corresponding to the two mental tasks. To classify the test vectors given by the 5-fold cross validation scheme, their likelihood under each of these HMMs was calculated, and each test vector was assigned to the mental task whose HMM gave the higher likelihood.
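The two EM steps described in Section 2.2.1 (soft assignment of each data point to the mixture components, followed by re-estimation of the parameters) can be sketched for a plain Gaussian mixture as below. This is a NumPy illustration under stated assumptions and not the BNT implementation used in the paper: the initialization from randomly chosen data points, the small regularization term `reg`, and the function names are all hypothetical choices.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of a multivariate Gaussian evaluated at each row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (X - mean).T)  # shape (d, n)
    maha = np.sum(sol ** 2, axis=0)            # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def fit_gmm_em(X, n_components=2, n_iter=50, seed=0, reg=1e-6):
    """Fit a Gaussian mixture by EM; returns weights, means, covariances, log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, n_components, replace=False)]  # random initial means
    base_cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    covs = np.array([base_cov] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    log_lik = -np.inf
    for _ in range(n_iter):
        # E-step: soft assignment of every point to every component.
        log_r = np.stack(
            [np.log(weights[k]) + log_gaussian(X, means[k], covs[k])
             for k in range(n_components)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_r, axis=1, keepdims=True)
        resp = np.exp(log_r - log_norm)
        log_lik = float(log_norm.sum())  # likelihood of the current parameters
        # M-step: re-estimate the parameters from the soft assignments.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return weights, means, covs, log_lik
```

A class-conditional classifier in the spirit of Figure 3 would fit one such mixture per mental task and assign a test vector to the task whose mixture gives the higher log density, which is also the decision scheme described for the HMMs.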

[Figure 3: graphical model with three nodes. Node 1: class (B or M); Node 2: mixture component (1 or 2); Node 3: Gaussian (mean, covariance)]

Figure 3. Gaussian mixture model represented as a simple graphical model. B stands for the baseline task and M for the multiplication task in the Purdue dataset.
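The adjacency matrix quoted in Section 2.2.1 (rows 011, 001, 000) can be read as the edge list of the three-node model in Figure 3. The short sketch below assumes the convention that a 1 in row i, column j denotes an edge from node i to node j; the node labels are taken from the figure and shortened for illustration.

```python
import numpy as np

# Node order assumed from Figure 3: class, mixture component, Gaussian output.
node_names = ["class", "component", "gaussian"]
dag = np.array([
    [0, 1, 1],   # class -> component, class -> gaussian
    [0, 0, 1],   # component -> gaussian
    [0, 0, 0],   # the Gaussian node has no children
])

def edges(dag, names):
    """List the directed edges encoded by the adjacency matrix."""
    return [(names[i], names[j]) for i, j in zip(*np.nonzero(dag))]

def parents(dag, names, node):
    """List the parents of a node, i.e. the nonzero entries of its column."""
    j = names.index(node)
    return [names[i] for i in np.nonzero(dag[:, j])[0]]

edge_list = edges(dag, node_names)
# A strictly upper-triangular matrix means the node order is a valid
# topological order, so the graph contains no directed cycle.
is_acyclic = bool(np.array_equal(dag, np.triu(dag, k=1)))
```

Read this way, the Gaussian node depends on both the class and the selected mixture component, which is the structure of a class-conditional Gaussian mixture.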

Table 1. Bayesian graphical network (BNT), neural network, Bayes quadratic, Fisher linear and hidden Markov model classifiers compared for the classification of binary combinations of five mental tasks. The results in the table are averaged over the 10 possible binary combinations of mental tasks (classification accuracy in percent, mean ± standard deviation).

Sub.    BNT          Neural Network   Bayes        Fisher Linear   HMM
1       94.07±2.2    92.48±2.9        93.78±2.8    91.15±2.7       70.18±8.8
3       87.43±3.9    85.04±4.3        89.22±3.5    82.77±4.1       64.10±9.1
5       82.48±2.8    82.61±3.0        86.58±3.4    81.79±3.1       62.43±7.8
6       90.31±2.7    89.39±3.1        92.49±3.2    90.38±3.1       64.61±8.3
Mean    88.57±3.0    87.38±3.4        90.51±3.2    86.63±3.3       65.33±8.5
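Each entry of Table 1 averages over every binary pairing of the five Purdue tasks, evaluated with 5-fold cross validation. The bookkeeping behind that protocol can be sketched as follows; the short task labels, the random shuffling of the samples and the helper name `five_fold_indices` are assumptions made for illustration, not details given in the paper.

```python
from itertools import combinations

import numpy as np

# Short labels for the five Purdue tasks (a) to (e).
TASKS = ["baseline", "letter", "math", "counting", "rotation"]

# Every unordered pair of distinct tasks: C(5, 2) = 10 binary problems.
task_pairs = list(combinations(TASKS, 2))

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train, test) index arrays for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, folds[k]
```

For each of the ten pairs, a classifier would be trained on the `train` indices and scored on the `test` indices, and the fold accuracies averaged.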

2.2.4 Bayesian Quadratic classifier
Given a set of classes M (the mental tasks here), characterized by a set of known model parameters $\theta$, an extracted EEG feature vector X is assigned to the class (mental task) that has the highest a posteriori probability. This is shown in (1) and is known as the Bayes decision rule:

$$X \in M_k \quad \text{if} \quad P(M_k \mid X, \theta) > P(M_l \mid X, \theta), \quad \forall\, l \neq k \qquad (1)$$

To calculate the a posteriori probability, Bayes' law was used, which, under the assumption that the features are normally distributed, leads to a quadratic classifier known as the Bayes quadratic classifier [12]. The parameters are the mean and covariance of the training vectors, and the likelihoods are calculated as stated above.

2.2.5 Linear Fisher Classifier
In the case of two classes (mental tasks), a linear classifier of the form $w^T X + w_0$ assigns a negative value to a feature vector X belonging to one mental task and a positive value to one belonging to the other. The aim was to find the w that reduces the number of misclassifications, and for this purpose there are several criteria that can be optimized [11]. The approach taken by Fisher was to find the linear combination of the variables that separates the two classes as much as possible. That is, the direction is sought along which the two classes are best separated in some sense. The criterion proposed by Fisher is the ratio of between-class to within-class variance. Formally, a direction w is sought such that (2) is maximized:

$$J_F = \frac{\left| w^T (m_1 - m_2) \right|^2}{w^T S_W w} \qquad (2)$$
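As a sketch of the Fisher criterion, the direction that maximizes the ratio in (2) has the closed form w proportional to the inverse within-class covariance applied to the difference of the class means. The code below assumes that the within-class matrix is the pooled sample covariance of the two classes and that the threshold w0 is placed midway between the projected class means; neither choice is stated in the paper, and the function names are hypothetical.

```python
import numpy as np

def pooled_within_cov(X1, X2):
    """Pooled within-class sample covariance (assumed definition of S_W)."""
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

def fisher_criterion(w, m1, m2, S_w):
    """Evaluate J_F = |w^T (m1 - m2)|^2 / (w^T S_W w) for a direction w."""
    return float((w @ (m1 - m2)) ** 2 / (w @ S_w @ w))

def fit_fisher(X1, X2):
    """Return (w, w0) so that w^T x + w0 is positive on the class-1 side."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_w = pooled_within_cov(X1, X2)
    w = np.linalg.solve(S_w, m1 - m2)  # closed-form maximizer of J_F
    w0 = -0.5 * w @ (m1 + m2)          # assumed: threshold midway between means
    return w, w0
```

A feature vector x would then be assigned to the first task when `w @ x + w0` is positive and to the second task otherwise.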

In (2), $m_1$ and $m_2$ are the group means and $S_W$ is the within-class sample covariance matrix [12].

3. RESULTS
In this section the results of the classifications are presented, first for the Purdue dataset and then for the Graz dataset.

3.1. Purdue dataset results: Binary combinations of the five mental tasks were classified for subjects one, three, five and six, and the classifications were performed on the total of two or three sessions of EEG data, depending on the subject. With five mental tasks, this leads to 10 pairs of binary mental tasks. For this dataset, AR coefficients of order six were used as feature vectors. For each individual pair, using the 5-fold cross validation scheme, the classification was performed 44 times and averaged over all combinations to compute the final classification accuracy. The results for the five classifiers can be seen in Table 1. Each entry in Table 1 was calculated by averaging over all ten pairs of mental tasks. The last row of the table is the average of each classification method over all subjects.

According to Table 1, for subject one the Bayesian network is better than the Bayesian quadratic classifier. On average over all subjects, the Bayesian network is just two percent lower than the Bayes quadratic classifier and is better than the neural network, Fisher linear classifier and HMM. Moreover, the standard deviation is lower for the Bayesian network, which indicates more consistent classification by this classifier. In terms of execution time, the Bayesian network was the most time-consuming classifier, while the Fisher linear classifier and the Bayesian quadratic classifier needed the least time to be trained and to classify the extracted EEG vectors.

3.2. Graz dataset results: It is quite common to use the error rate for comparing different methods. However, the error rate takes into account only the sign of the classifier output and not its magnitude. For this reason, mutual information is used to compare the different results. In addition, other groups working on the same dataset have expressed their results in the form of mutual information, so the present results can be compared directly with theirs. For the Graz dataset, AAR coefficients of order six were used as feature vectors [6]. Nine other groups from different universities have applied their algorithms to the same dataset, and their results can be found in Table 2 together with the results of this research (the last three rows). The details of the other groups' algorithms can also be found on the BCI 2003 website. The time course of the mutual information can be seen in Figure 4.

Table 2. Summary of the results of different groups on the Graz dataset. The last three rows are the results obtained in this research. Considering the value of MI for the Bayesian network, the result of this work ranks second compared to the others.

Ranking   Group              Minimum error (%)   Maximum SNR   Maximum MI (bits)
1         C                  10.71               1.34          0.61
2         F                  15.71               0.90          0.46
3         B                  17.14               0.86          0.45
4         A                  13.57               0.85          0.44
5         G                  17.14               0.50          0.29
6         I                  23.57               0.44          0.26
7         E                  17.14               0.34          0.21
8         D                  32.14               0.14          0.09
9         H                  49.29               0.00          0.00
          Bayesian network   16.43               1.00          0.50
          Neural network     15.71               1.04          0.51
          Bayes classifier   17.14               0.71          0.38

4. DISCUSSION AND CONCLUSION
In this research, the EEG signal was classified using different machine learning techniques. The algorithms were applied to two known datasets in the BCI field, and the results were presented in the previous section. Aside from the Bayesian quadratic classifier, and except for subject six, the Bayesian network is better than the other classifiers for the Purdue dataset. For subject one it is even better than the Bayesian quadratic classifier. This comparison is made according to classification accuracy. The Bayesian network has also kept comparable results on the Graz dataset, whereas the Bayesian quadratic classifier did not achieve equally good results on the Graz dataset. In terms of the standard deviation of the error, the Bayesian network has always been better than the other classifiers, which means a more consistent classification. This improvement can also be seen in Figure 4, which gives an almost smooth curve of mutual information and classification error over time.

The Bayesian network classifier was used for the first time for such a purpose and gave good classification accuracy, as can be seen in the results. Nevertheless, its execution time was too long, which makes this classifier unsuitable for the development of online BCI systems, at least at our current processing speed. The Bayesian quadratic classifier gave the highest results of all classifiers on the Purdue dataset.

It should also be considered that the EEG signal is non-stationary, meaning that its statistics vary over time, so the model (refer to the Methods section) may not represent the signal well if it is recorded over many sessions. This can be clearly seen in the reduced results for subject five, whose data were recorded over more sessions (three) than those of the other subjects (two), and also in the results for the Graz dataset, which contains many more sessions of EEG.

The Fisher linear classifier has given accuracy comparable to that of the nonlinear methods but in considerably less time. In several BCIs developed so far, linear discriminant analysis has been implemented as the classifier [1]. The HMM classifier has had the lowest results of all the classifiers. In the present study a simple HMM structure was implemented, which might be the reason for its significantly poorer results. Merging these classifiers into a hybrid classifier will be the topic of future work.

Figure 4. Time course of mutual information (bits) and error rate (bottom panel) for the Bayesian network classifier.
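The paper does not state how the mutual information (MI) values were computed. A common estimate for a continuous classifier output relates MI to the signal-to-noise ratio (SNR) of that output as MI = 0.5 log2(1 + SNR), and the sketch below assumes this relation. As a consistency check, applying it to the SNR column of Table 2 reproduces the MI column to within about 0.01 bits; the function and variable names are hypothetical.

```python
import math

def mutual_information_bits(snr):
    """Assumed relation between SNR and mutual information, in bits."""
    return 0.5 * math.log2(1.0 + snr)

# (group, maximum SNR, reported MI in bits), copied from Table 2.
TABLE2 = [
    ("C", 1.34, 0.61), ("F", 0.90, 0.46), ("B", 0.86, 0.45),
    ("A", 0.85, 0.44), ("G", 0.50, 0.29), ("I", 0.44, 0.26),
    ("E", 0.34, 0.21), ("D", 0.14, 0.09), ("H", 0.00, 0.00),
    ("Bayesian network", 1.00, 0.50),
    ("Neural network", 1.04, 0.51),
    ("Bayes classifier", 0.71, 0.38),
]

# Largest absolute difference between the assumed formula and the table.
max_gap = max(abs(mutual_information_bits(snr) - mi) for _, snr, mi in TABLE2)
```

Because this relation is monotonic in the SNR, ranking the groups by MI or by SNR gives the same order, which is consistent with the ranking shown in Table 2.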

5. REFERENCES
[1] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan, Brain-Computer Interfaces for Communication and Control, Clinical Neurophysiology, Vol. 113, pp. 767-791, 2002.
[2] Dennis J. McFarland, William A. Sarnacki, Theresa M. Vaughan, Jonathan R. Wolpaw, Brain-computer interface (BCI) operation: signal and noise during early training sessions, Clinical Neurophysiology, Vol. 116, pp. 56-62, Jan 2005.
[3] Kouhyar Tavakolian, Investigation and Comparison of Different Mental Task Classification by Linear and Nonlinear Techniques Applied to EEG Signal, Master of Science thesis, Department of Electrical and Computer Engineering, University of Tehran, July 2003.
[4] Kouhyar Tavakolian, A. M. Nasrabadi, Siamak Rezaei, Selecting Better EEG channels for classification of Mental Tasks, in Proceedings of the IEEE International Symposium on Circuits and Systems ISCAS2004, pp. 537-540, Vancouver, Canada, May 2004.
[5] Zachary A. Keirn, Jorge I. Aunon, A New Mode of Communication between Man and His Surroundings, IEEE Trans. on BME, Vol. 37, No. 12, Dec 1990.
[6] Graz dataset: http://ida.first.fraunhofer.de/projects/bci/competition
[7] Kouhyar Tavakolian, Siamak Rezaei, Classification of Mental Tasks Using Gaussian Mixture Bayesian Network Classifiers, IEEE International Workshop on Biomedical Circuits and Systems, Singapore, Dec 2004.
[8] Todd K. Moon, Wynn C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.
[9] L. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recognition, Proc. IEEE 77(2):257-286, 1989.
[10] Bayes Net Toolbox for Matlab, written by Kevin Murphy. www.ai.mit.edu/~murphyk/Software/BNT/bnt.html
[11] Andrew Webb, Statistical Pattern Recognition, Arnold, London, 1999.
[12] Sergios Theodoridis, Konstantinos Koutroumbas, Pattern Recognition, Academic Press, 1999.
[13] Finn V. Jensen, Bayesian Networks and Decision Graphs, Springer, 2001.
[14] Nai-Jen Huan and Ramaswamy Palaniappan, Neural network classification of autoregressive features from electroencephalogram signals for brain-computer interface design, Journal of Neural Engineering, Institute of Physics Publishing, Vol. 1, pp. 142-150, 2004.
[15] Matlab software website: www.mathwork.com
