BBCI team:
Gabriel Curio Florian Losch Volker Kunzmann Frederike Holefeld Vadim Nikulin@Charite Benjamin Blankertz Michael Tangermann Claudia Sannelli Carmen Vidaurre Bastian Venthur Siamac Fazli Martijn Schreuder Matthias Treder Stefan Haufe Thorsten Dickhaus Frank Meinecke Paul von Bünau Marton Danozcy Felix Biessmann Klaus-Robert Müller@TUB
Matthias Krauledat Guido Dornhege Roman Krepki@industry
Andreas Ziehe Florin Popescu Christian Grozea Steven Lemm Motoaki Kawanabe Guido Nolte@FIRST Yakob Badower@Pico Imaging Marton Danozcy
Collaboration with: U Tübingen, Bremen, Albany, TU Graz, EPFL, Daimler, Siemens, MES, MPIs, U Tokyo, TIT, RIKEN, Bernstein Focus Neurotechnology, Bernstein Center for Computational Neuroscience Berlin, picoimaging, Columbia, CUNY
Funding by: EU, BMBF and DFG
DECODING
use ICA/NGCA projections for artifact and noise removal feature extraction and selection
BBCI paradigms
Leitmotiv: let the machines learn
- healthy subjects (BCI untrained) perform "imaginary" movements (ERD/ERS)
- instruction: imagine
- feel touch
Single channel
BBCI paradigms
Leitmotiv: let the machines learn
A: training (20 min): right/left hand imagined movements; infer the respective brain activities (ML & SP)
B: online feedback session
BBCI Set-up
Artifact removal
[cf. Müller et al. 2001, 2007, 2008, Dornhege et al. 2003, 2007, Blankertz et al. 2004, 2005, 2006, 2007, 2008]
[cf. Sugiyama & Müller 2005, Shenoy et al. 2005, Sugiyama et al. 2007]
Neurophysiological analysis
, choosing
BCI Illiteracy
For about 20% of naive users, BCI accuracy does not reach the level criterion, i.e., control is not accurate enough to operate applications.
Screening study (N=80):
- Cat I: good calibration (cb), good feedback (fb)
- Cat II: good cb, no good fb
- Cat III: no good cb
-> design a predictor!
SMR-Predictor
Calculate the power spectral density (PSD) in three Laplacian channels C3, Cz, C4 under rest condition. Model each resulting curve by $g = g_1 + g_2$, with
$g_1 = g_1(x; \lambda, k) = k_1 + k_2 / x^{\lambda}$ (estimated noise)
$g_2 = g_2(x; \mu, \sigma, k) = k_3\,\varphi(x; \mu_1, \sigma_1) + k_4\,\varphi(x; \mu_2, \sigma_2)$ (2 peaks)
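The model above can be sketched in a few lines. This is only an illustration under stated assumptions: the noise exponent (called `lam` here), the Gaussian form of the peak function, all parameter values, and the helper `peak_over_noise` (a hypothetical summary of how far the peaks rise above the noise floor) are not taken from the slides.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Normal density phi(x; mu, sigma), assumed here as the peak shape."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def noise_floor(x, lam, k1, k2):
    """g1: estimated noise, k1 + k2 / x**lam."""
    return k1 + k2 / x ** lam

def psd_model(x, lam, mu, sigma, k):
    """g = g1 + g2, where g2 is a sum of two peaks."""
    k1, k2, k3, k4 = k
    g2 = k3 * gauss_pdf(x, mu[0], sigma[0]) + k4 * gauss_pdf(x, mu[1], sigma[1])
    return noise_floor(x, lam, k1, k2) + g2

def peak_over_noise(x, lam, mu, sigma, k):
    """Hypothetical summary value: largest elevation of g above g1."""
    g = psd_model(x, lam, mu, sigma, k)
    return float(np.max(g - noise_floor(x, lam, k[0], k[1])))

# Illustrative parameters only (not fitted to any recording).
freqs = np.linspace(2.0, 35.0, 331)  # frequency axis in Hz
params = dict(lam=1.0, mu=(11.0, 22.0), sigma=(1.5, 3.0), k=(0.5, 20.0, 12.0, 6.0))
strength = peak_over_noise(freqs, **params)
flat = peak_over_noise(freqs, 1.0, (11.0, 22.0), (1.5, 3.0), (0.5, 20.0, 0.0, 0.0))
```

In practice the parameters would be fitted to the measured PSD of each channel; the fitting step is omitted here.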
SMR-Predictor (Results)
Pearson R = 0.55 (calibration), Pearson R = 0.51 (feedback)
As much as R² = 30% (calibration) and R² = 26% (feedback) of the variability in BCI accuracy can be explained by the SMR predictor in our study sample!
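The explained-variance figures follow directly from squaring the reported correlation coefficients. A short check, assuming the larger correlation belongs to the calibration data:

```python
# Reported Pearson correlations; the assignment to calibration/feedback is
# inferred from the explained-variance percentages on the slide.
correlations = {"calibration": 0.55, "feedback": 0.51}

# Explained variance in percent is the squared correlation times 100.
explained_percent = {name: round(r * r * 100) for name, r in correlations.items()}
```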
Runs 1-3 (fixed Laplace): Direct feedback with an unspecific LDA classifier. Features: log band power (alpha and beta) in the Laplacian channels C3, C4 and Cz. The classifier is adapted after each trial.
Runs 4-6: Compute CSP and select Laplacian channels from runs 1-3. Fixed CSP filters, automated Laplacian selection. The classifier is retrained after each trial.
Runs 7-8: Compute CSP from runs 4-6. Perform unsupervised adaptation of the pooled mean. Update the bias of the classifier.
[cf. Vidaurre, Blankertz, Müller et al. in preparation]
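The unsupervised step used in the last runs (adapt the pooled mean, then update the classifier bias) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exponential update rule, the learning rate `eta` and the class name `BiasAdaptiveLDA` are assumptions.

```python
import numpy as np

class BiasAdaptiveLDA:
    """Linear classifier w.x + b whose bias tracks the pooled feature mean."""

    def __init__(self, w, pooled_mean, eta=0.05):
        self.w = np.asarray(w, dtype=float)
        self.mu = np.asarray(pooled_mean, dtype=float)
        self.eta = eta  # assumed learning rate of the running mean

    @property
    def bias(self):
        # The decision boundary passes through the pooled mean.
        return -float(self.w @ self.mu)

    def decide(self, x):
        return float(self.w @ np.asarray(x, dtype=float)) + self.bias

    def adapt(self, x):
        # No label is needed: only the pooled mean is updated.
        self.mu = (1.0 - self.eta) * self.mu + self.eta * np.asarray(x, dtype=float)

# Toy example: the feature distribution drifts by +2 along the first axis.
clf = BiasAdaptiveLDA(w=[1.0, 0.0], pooled_mean=[0.0, 0.0])
bias_before = clf.bias
for _ in range(200):
    clf.adapt([2.0, 0.0])
bias_after = clf.bias
```

After the drift, the bias has moved so that the boundary is centered on the new pooled mean, while the filter weights `w` stay fixed.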
[Figure panels: Runs 1 and 2; Runs 7 and 8]
Research platform: BCI gaming + Machine Learning + Single Trial Analysis (MSM)
Popescu et al 2007
Conclusion
- BBCI: non-invasive, with high information transfer rates for the untrained
- BBCI: untrained users, calibration < 20 min, data analysis << 10 min, BCI experiment
- 5-8 letters/min mental typewriter at CeBIT 06; Brain2Robot@Medica 07; LNdW 09
- Machine learning and modern data analysis are of central importance for BCI et al.
Applications:
- Rehabilitation: TOBI EU IP
- Computational Neuroscience: Bernstein Centers Berlin
- Man-machine interaction: using BCI as a measuring device: brain@work
BCI Competitions
blanker@cs.tu-berlin.de
08-Jul-2009
Method:
classification of spatio-temporal features;
Application:
classification of single-trial ERPs in an attention-based speller
The presentation of a visual stimulus elicits a Visual Evoked Potential (VEP) in visual cortex if focused:
[Figure: ERP traces at Cz and Oz with the components P1 and N1]
Experimental Design
Classic Matrix Speller vs. attention-based Hex-o-Spell
See Poster W07 (Treder et al.) for an investigation of overt vs. covert attention and a comparison of those two speller designs.
[Figure: ERPs for target vs. nontarget stimuli on a scalp layout (AF3, AF4, F8, FT8, C5, Cz, C6, CP3, CP4, TP7, TP8, Pz, PO7, PO8, Oz; scale bars 5 µV, 200 ms), scalp maps per component, and classification error [%] per time interval]

Component | Time interval
vP1  | 115-135 ms
N1   | 135-155 ms
vN1  | 155-195 ms
P250 | 205-235 ms
vN2  | 285-325 ms
P400 | 335-395 ms
N500 | 495-535 ms
r(x) :=
Spatial Features
#01: std  #02: std  #03: std  #04: std  #05: dev  #06: std
#07: std  #08: dev  #09: std  #10: std  #11: std  #12: dev
#13: std  #14: std  #15: std  #16: std  #17: dev  #18: std
#19: std  #20: std  #21: std  #22: std  #23: dev  #24: dev
$X_f := w^\top X \in \mathbb{R}^{1 \times \#\text{time points}}$
is the result of spatial filtering: each channel of $X$ is weighted with the corresponding component of $w$ and summed up.
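A spatial filter is a weighted sum over channels, one output value per time point. A minimal sketch, assuming the data matrix is stored as channels x time points; the function name and the toy numbers are illustrative only.

```python
import numpy as np

def spatial_filter(X, w):
    """Weight each channel (row of X) with the matching entry of w and sum up.

    X: array of shape (channels, time points)
    w: array of shape (channels,)
    Returns an array of shape (1, time points).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if X.shape[0] != w.shape[0]:
        raise ValueError("w needs one weight per channel of X")
    return (w @ X)[np.newaxis, :]

# Two channels, three time points.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
w = np.array([0.5, -1.0])
Xf = spatial_filter(X, w)
```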
[Figure: one panel per component (vP1, N1, vN1, P250, vN2, P400, N500); color scale from -0.1 to 0.1]
Spatio-Temporal Features
Spatio-temporal features are typically high-dimensional (here 59 EEG channels × 7 time intervals = 413-dimensional features):
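One way to build such a feature vector is to average each channel within each time interval and concatenate the results. The sketch below uses the seven component intervals listed in this deck; the sampling grid, the random epoch and the function name are assumptions for illustration.

```python
import numpy as np

# Time intervals in ms for vP1, N1, vN1, P250, vN2, P400, N500.
INTERVALS_MS = [(115, 135), (135, 155), (155, 195), (205, 235),
                (285, 325), (335, 395), (495, 535)]

def spatio_temporal_features(epoch, times_ms, intervals=INTERVALS_MS):
    """Average each channel within each interval and flatten.

    epoch: array (channels, samples); times_ms: array (samples,)
    Returns a vector of length channels * len(intervals).
    """
    epoch = np.asarray(epoch, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    columns = []
    for start, stop in intervals:
        mask = (times_ms >= start) & (times_ms <= stop)
        if not mask.any():
            raise ValueError(f"no samples in interval {start}-{stop} ms")
        columns.append(epoch[:, mask].mean(axis=1))
    # Shape (channels, intervals), flattened channel by channel.
    return np.stack(columns, axis=1).ravel()

# Assumed toy epoch: 59 channels, one sample every 10 ms from 0 to 790 ms.
times = np.arange(0, 800, 10)
epoch = np.random.default_rng(0).standard_normal((59, times.size))
features = spatio_temporal_features(epoch, times)
```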
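[Figure: spatio-temporal feature matrix, space [channels] (Fz, FCz, Cz, CPz, Pz, Oz) vs. mean of time interval [ms] (125, 145, 175, 220, 305, 365, 515), amplitude in [µV]; classification error [%] for each single interval (vP1, N1, vN1, P250, vN2, P400, N500) and for the combined feature]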
Although information was added, classification on the concatenated feature becomes worse: overfitting.
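The covariance matrix is estimated by the sample covariance matrix $\hat\Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^\top$. If the number of samples $n$ is small relative to the dimension $d$, the estimation is error-prone. There is a systematic bias: large eigenvalues of $\hat\Sigma$ are too large.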
$\tilde\Sigma(\gamma) = (1-\gamma)\hat\Sigma + \gamma\nu I$
for a $\gamma \in [0, 1]$ and $\nu$ defined as the average eigenvalue $\mathrm{trace}(\hat\Sigma)/d$. Since $\hat\Sigma$ is positive semi-definite, we can have an eigenvalue decomposition $\hat\Sigma = V D V^\top$ with orthonormal $V$ and diagonal $D$. From
$\tilde\Sigma(\gamma) = (1-\gamma) V D V^\top + \gamma\nu I = V\left((1-\gamma) D + \gamma\nu I\right) V^\top$
we see that
- $\tilde\Sigma(\gamma)$ and $\hat\Sigma$ have the same eigenvectors (columns of $V$)
- extreme eigenvalues (large/small) are shrunk/extended towards the average $\nu$.
- $\gamma = 0$ yields LDA without shrinkage, $\gamma = 1$ assumes spherical covariance matrices.
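The effect on the eigenvalues can be checked numerically. In this sketch the shrunk matrix is (1 - gamma) times the sample covariance plus gamma times nu times the identity, where nu is the average eigenvalue; the random data (20 samples in 50 dimensions) and the value gamma = 0.5 are assumptions for illustration.

```python
import numpy as np

def shrink_covariance(S, gamma):
    """Return (1 - gamma) * S + gamma * nu * I with nu = trace(S) / d."""
    S = np.asarray(S, dtype=float)
    d = S.shape[0]
    nu = np.trace(S) / d
    return (1.0 - gamma) * S + gamma * nu * np.eye(d)

# Fewer samples than dimensions: the sample covariance is singular.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
S = np.cov(X, rowvar=False)
nu = np.trace(S) / S.shape[0]

S_half = shrink_covariance(S, 0.5)
eig_S = np.linalg.eigvalsh(S)         # ascending eigenvalues of S
eig_half = np.linalg.eigvalsh(S_half) # ascending eigenvalues after shrinkage
```

Each eigenvalue moves halfway towards `nu`, so the spectrum becomes narrower and the smallest eigenvalue is no longer zero, which makes the matrix invertible.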
Model selection
LDA with shrinkage of the empirical covariance matrix has one free parameter ($\gamma$), also called hyperparameter, that needs to be selected. There is no general way to do it. Numerous strategies with different properties exist, e.g.
- empirical Bayes shrinkage estimator
- MDL: Minimum Description Length
- model selection based on cross-validation
- ...
An easy (and also time-consuming) way is model selection based on cross-validation.
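Cross-validated selection of the shrinkage parameter can be sketched as a simple grid search. Everything specific here is an assumption for illustration: the synthetic two-class data, the grid of candidate values, the number of folds, and the function names.

```python
import numpy as np

def train_shrinkage_lda(X, y, gamma):
    """X: (n, d), y in {0, 1}. Returns weight vector w and bias b."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    S = Xc.T @ Xc / (len(X) - 2)                 # pooled sample covariance
    d = S.shape[0]
    nu = np.trace(S) / d                         # average eigenvalue
    S_shrunk = (1.0 - gamma) * S + gamma * nu * np.eye(d)
    w = np.linalg.pinv(S_shrunk) @ (mu1 - mu0)   # pinv tolerates singular S
    b = -w @ (mu0 + mu1) / 2.0
    return w, b

def cv_error(X, y, gamma, n_folds=5, seed=0):
    """Mean test error over random folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        w, b = train_shrinkage_lda(X[train], y[train], gamma)
        pred = (X[test] @ w + b > 0).astype(int)
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

def select_gamma(X, y, grid):
    """Return the grid value with the lowest cross-validated error."""
    errs = {g: cv_error(X, y, g) for g in grid}
    return min(errs, key=errs.get), errs

# Synthetic data: 80 trials, 60 dimensions, class difference in 5 of them.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 40)
X = rng.standard_normal((80, 60))
X[y == 1, :5] += 1.0
grid = (0.0, 0.01, 0.1, 0.5, 1.0)
best_gamma, errors_by_gamma = select_gamma(X, y, grid)
```

The cost grows with the grid size times the number of folds, which is the time-consuming part mentioned above.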
[Figure: three panels illustrating shrinkage, $\tilde\Sigma(\gamma) = (1-\gamma)\hat\Sigma + \gamma\nu I$, and the resulting projection vector $w$]
Trial-to-trial variation of P3
Visual alpha
[Figure: two-channel scatter plots of single-trial potentials (FCz, CPz; potential at CPz [µV]); (a) without and (b) with disturbance from Oz]
Two-channel classification of (a): 15% error, (b): 37% error. When the disturbing channel Oz is added to the data (3D): 16% error. Here, channel Oz is required for good classification although it is not discriminative itself.
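Why a non-discriminative channel can help is easy to see in a reduced two-channel model. The sketch below is an assumption-laden simplification of the slide: CPz carries the class difference plus leaked visual alpha, Oz carries only the alpha disturbance, and the variances and class difference are made-up numbers. Separability is measured as the squared Mahalanobis distance between the class means.

```python
import numpy as np

def mahalanobis_sq(delta, cov):
    """Squared Mahalanobis distance between class means for difference delta."""
    delta = np.asarray(delta, dtype=float)
    return float(delta @ np.linalg.solve(cov, delta))

# Assumed population model, channel order (CPz, Oz):
#   CPz = P3 signal + alpha disturbance, Oz = alpha disturbance only.
signal_var, alpha_var, class_diff = 1.0, 9.0, 2.0
cov = np.array([[signal_var + alpha_var, alpha_var],
                [alpha_var,              alpha_var]])
delta = np.array([class_diff, 0.0])   # Oz itself carries no class difference

sep_cpz_only = mahalanobis_sq(delta[:1], cov[:1, :1])
sep_with_oz = mahalanobis_sq(delta, cov)
lda_weights = np.linalg.solve(cov, delta)
```

With both channels, the LDA weights have opposite signs: the classifier subtracts the Oz signal from CPz and thereby cancels the disturbance, which raises the separability from 0.4 to 4.0 in this toy model.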
[Figure panels: γ = 0, γ = 0.001, γ = 0.005, γ = 0.05, γ = 0.5, γ = 1]
Recently, a method to analytically calculate the optimal shrinkage parameter was published ([1]). Thanks to Nicole Krämer for pointing the BBCI group to this method.
Aim:
get a better estimate of the true covariance matrix $\Sigma$ (especially in case $n < d$) than the sample covariance matrix $\hat\Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^\top$ by selecting a $\gamma$ in
$\tilde\Sigma(\gamma) := (1-\gamma)\hat\Sigma + \gamma\nu I.$
We denote by $(x_k)_i$ resp. $(\hat\mu)_i$ the i-th element of the vector $x_k$ resp. $\hat\mu$. Furthermore we denote by $s_{ij}$ the element in the i-th row and j-th column of $\hat\Sigma$. We define
$\ldots = \frac{n}{(n-1)^2}\,\ldots$
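The formula is cut off in this copy of the slides. As a hedged sketch, the code below implements one commonly used form of the analytic shrinkage estimate from the literature cited as [1] and [2]: the factor n/(n-1)² times the summed sample variances of the products of centered entries, divided by the squared distance between the sample covariance and the spherical target. That this is exactly the expression on the slide is an assumption, as are the normalization convention (`ddof=1`), the clipping to [0, 1], and the toy data.

```python
import numpy as np

def analytic_shrinkage_gamma(X):
    """Analytic estimate of gamma for the target nu * I. X: (n, d)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (n - 1)                 # sample covariance, entries s_ij
    nu = np.trace(S) / d                    # average eigenvalue
    # z[k, i, j] = (x_k - mu)_i * (x_k - mu)_j
    z = Xc[:, :, None] * Xc[:, None, :]
    var_z = z.var(axis=0, ddof=1)           # variance over trials k (assumed ddof)
    # sum_{i != j} s_ij^2 + sum_i (s_ii - nu)^2
    target_dist = np.sum((S - nu * np.eye(d)) ** 2)
    gamma = n / (n - 1) ** 2 * var_z.sum() / target_dist
    return float(np.clip(gamma, 0.0, 1.0))

# Toy data with unequal variances per dimension.
rng = np.random.default_rng(0)
scales = np.linspace(1.0, 5.0, 10)
gamma_small_n = analytic_shrinkage_gamma(rng.standard_normal((15, 10)) * scales)
gamma_large_n = analytic_shrinkage_gamma(rng.standard_normal((2000, 10)) * scales)
```

With few samples the estimate asks for strong shrinkage; with many samples the sample covariance is trusted and the estimate approaches zero.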
[Figure: classification error [%] of LDA and of LDA with shrinkage for each time interval (vP1 115-135 ms, N1 135-155 ms, vN1 155-195 ms, P250 205-235 ms, vN2 285-325 ms, P400 335-395 ms, N500 495-535 ms) and for the combined feature; annotation: 5%]
[Figure: accuracy [%], two panels]
Accuracy in
3.33%.
See Poster W07 (Treder et al.), and for other applications of this technique A02 (Thurlings et al.), W10 (Höhne et al.).
Method:
Classification of spectral features, namely modulations of the amplitude in specific frequency bands. In particular, Common Spatial Pattern (CSP) analysis to classify different conditions that are characterized by a modulation of the amplitude of brain rhythms ([3, 4]).
Application:
Classification of motor imagery conditions in a BCI paradigm.
[Figure: power spectra over the left and right hemisphere (e.g. channel C4) for left vs. right hand; x-axis: Frequency [Hz], 10 to 50; background spectrum ~ 1/f]
Imagining a movement of a limb causes a local blocking of the corresponding sensorimotor rhythm (SMR), see [5, 6, 7].
Conclusion
Locations C3 and C4 are good candidates to observe SMR modulations. These cover the sensorimotor areas of the right and the left hand.
Spatial Smearing
Raw EEG scalp potentials are known to be associated with a large spatial scale owing to volume conduction. In a simulation by Nunez et al. [8], only half of the contribution to one scalp electrode comes from sources within a 3 cm radius.
[Figure: spectra [dB] and r² values per channel (F3 lar, Fz lar, C1 lar, Cz lar, C2 lar, C4 lar, CP3 lar, CP1 lar, CP2 lar, CP4 lar, P5 lar, P3 lar, Pz lar, P4 lar, P6 lar); x-axis from 10 to 40]
First step: determine a suitable frequency band that shows good discrimination between the conditions.
[Figure: one panel per channel (C1 lar, Cz lar, C2 lar, C4 lar, CP3 lar, CP1 lar, CP2 lar, CP4 lar, P5 lar, P3 lar, Pz lar, P4 lar, P6 lar)]
Second step: determine a suitable time interval during which discrimination is most prominent.
Goal:
[Diagram: observed signals (EEG) are mapped by a projection (backward model, filter $V$) to discriminative signals csp:L1,2,... and csp:R1,2,...; the forward model is $V^{-1}$]
CSP Analysis
The goal of CSP:
[Figure: signals at C3 and C4 and the CSP-filtered signals csp:L and csp:R for left and right trials; time axis [s]]
$X_L, X_R \in \mathbb{R}^{T \times \text{chans}}$, $\quad \Sigma_L := X_L^\top X_L$, $\quad \Sigma_R := X_R^\top X_R$
CSP: $V^\top \Sigma_L V = D$ and $V^\top (\Sigma_L + \Sigma_R) V = I$
[Figure: scatter plot of log(var(CSP_L)) vs. log(var(CSP_R))]
Here, only two dimensions are shown. Note that applying the logarithm to the band power features makes the distribution more Gaussian and therefore enhances linear separability.
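CSP filters that jointly diagonalize the two class covariance matrices can be computed by whitening the summed covariance and then diagonalizing one class in the whitened space. This is a minimal sketch under assumptions: the data are taken to be band-pass filtered already, the covariance is computed over concatenated data rather than averaged over trials, and the synthetic signals and function names are illustrative.

```python
import numpy as np

def csp(X_left, X_right):
    """X_*: (T, chans) data of each class.

    Returns V (chans, chans) and eigenvalues D (ascending) such that
    V.T @ S_L @ V = diag(D) and V.T @ (S_L + S_R) @ V = I.
    """
    S_L = X_left.T @ X_left
    S_R = X_right.T @ X_right
    evals, U = np.linalg.eigh(S_L + S_R)
    P = U / np.sqrt(evals)              # whitening: P.T @ (S_L + S_R) @ P = I
    D, B = np.linalg.eigh(P.T @ S_L @ P)
    return P @ B, D

def log_var_features(X, V, n_per_side=1):
    """Log variance of the data projected on the first and last CSP filters."""
    W = np.hstack([V[:, :n_per_side], V[:, -n_per_side:]])
    return np.log(np.var(X @ W, axis=0))

# Synthetic 3-channel data: class 'left' is strong on channel 0,
# class 'right' is strong on channel 2.
rng = np.random.default_rng(0)
X_left = rng.standard_normal((1000, 3)) * np.array([3.0, 1.0, 1.0])
X_right = rng.standard_normal((1000, 3)) * np.array([1.0, 1.0, 3.0])

V, D = csp(X_left, X_right)
S_L, S_R = X_left.T @ X_left, X_right.T @ X_right
feat_left = log_var_features(X_left, V)
feat_right = log_var_features(X_right, V)
```

The eigenvalues in `D` lie between 0 and 1; filters at either end of the spectrum give large variance for one class and small variance for the other.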
[Figure with panels (1) to (4): spectra [dB] over frequency [Hz] (5 to 45) for left vs. right; time axis 1000 to 5000 ms; CSP filters csp1 [0.68], csp2 [0.64], csp3 [0.60], csp4 [0.71], csp5 [0.68], csp6 [0.61]; log(var(CSP_L)) features; values 28.2, 17.0, 39.3, 15.5]
Project EEG with spatial CSP filters and apply band-pass filter, calculate the variance in short windows (e.g. last 500 ms), take the logarithm, and apply the classifier weighting. One nice feature of CSP is that the length of the classification window can be changed at runtime (i.e. during feedback). For more theoretical considerations as well as practical hints see [3].
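The online processing chain described above can be sketched as two small functions. Assumptions: the buffer already holds band-pass filtered samples (spatial and temporal filtering are both linear, so their order can be swapped), and the filter matrix `V_sel`, the classifier weights `w`, bias `b`, sampling rate and test signal are all made up for illustration.

```python
import numpy as np

def last_window(buffer, fs, window_ms=500):
    """Return the most recent window_ms of a (samples, chans) buffer."""
    n = int(round(fs * window_ms / 1000.0))
    return buffer[-n:]

def online_output(window, V_sel, w, b):
    """Project on CSP filters, take log variance, apply the linear classifier."""
    projected = window @ V_sel                     # (samples, n_filters)
    features = np.log(np.var(projected, axis=0))   # log band power per filter
    return float(w @ features + b)

# Toy signal: channel 0 has twice the amplitude of channel 1.
fs = 100                                    # assumed sampling rate in Hz
t = np.arange(300) / fs
s = np.sin(2 * np.pi * 10 * t)
buffer = np.column_stack([2.0 * s, s])

V_sel = np.eye(2)                           # placeholder for selected CSP filters
w, b = np.array([1.0, -1.0]), 0.0
window = last_window(buffer, fs, window_ms=500)
output = online_output(window, V_sel, w, b)
```

Because the window length only enters through `last_window`, it can be changed between calls without retraining the filters or the classifier.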
Remark:
Block Design
Assume the task is to discriminate between mental states in different conditions. We say that an experiment has a block design if the periods for which there is no alternation between conditions are longer than the intended change of states in online operation.
block #1 | block #2 | block #3 | block #4 | block #5 | block #6 | block #7
cond #1  | cond #2  | cond #1  | cond #2  | cond #1  | cond #2  | cond #1
A problem arises if the performance is estimated for such a data set by cross validation.
[Figure: test set marked within the block sequence]
In EEG there are many slowly changing variables of background activity, therefore the single-trials are not independent. For an ordinary cross validation in a block design data set, the requirement of independence between training and test set is violated.
A Validation Test
To demonstrate the impact of block design in cross validation, we perform cross validation in the following setting. Taking an arbitrary EEG data set, we assign fake labels (regardless of what happened during the recording) like this:
nBlocksPerClass=1: class #1 | class #2
nBlocksPerClass=2: class #1 | class #2 | class #1 | class #2
nBlocksPerClass=3: class #1 | class #2 | class #1 | class #2 | class #1 | class #2
and so on.
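The fake labeling scheme can be written down directly. A small sketch, assuming trials are ordered chronologically and split into equally sized contiguous blocks; the function name is illustrative.

```python
def fake_block_labels(n_trials, n_blocks_per_class):
    """Assign labels 1 and 2 in alternating contiguous blocks.

    The recording is cut into 2 * n_blocks_per_class blocks of (nearly)
    equal length; odd blocks get class 1, even blocks class 2.
    """
    n_blocks = 2 * n_blocks_per_class
    labels = []
    for trial in range(n_trials):
        block = trial * n_blocks // n_trials   # index of the block of this trial
        labels.append(1 + block % 2)
    return labels

labels_1 = fake_block_labels(8, 1)
labels_2 = fake_block_labels(8, 2)
labels_3 = fake_block_labels(12, 3)
```

Since the labels are unrelated to what the subject did, any classification accuracy clearly above chance in cross validation reflects slow background changes rather than real discriminability.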
leave-one-block-out
validation are
- The severity of the underestimation of the true error depends on the complexity of the features and the classifier.
- Cross validation in block design data might also give the correct result, but an alternative evaluation is required.
- The situation gets worse if trials are extracted from overlapping segments.
- The most realistic validation is to train the methods on the first N-1 runs and to evaluate on the last run.
- Leave-one-block-out and leave-one-run-out have larger standard errors than cross validation.
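Both alternative evaluation schemes amount to choosing train and test indices by block or run instead of at random. A minimal sketch in plain Python; the function names and the toy block and run assignments are assumptions.

```python
def leave_one_block_out(block_ids):
    """Yield (train_indices, test_indices), holding out one whole block at a time."""
    for held_out in sorted(set(block_ids)):
        test = [i for i, b in enumerate(block_ids) if b == held_out]
        train = [i for i, b in enumerate(block_ids) if b != held_out]
        yield train, test

def train_on_first_runs(run_ids):
    """Train on all runs except the last one and evaluate on the last run."""
    last = max(run_ids)
    train = [i for i, r in enumerate(run_ids) if r != last]
    test = [i for i, r in enumerate(run_ids) if r == last]
    return train, test

# Six trials in three blocks; the same ids are reused as run numbers.
block_ids = [1, 1, 2, 2, 3, 3]
splits = list(leave_one_block_out(block_ids))
chrono_train, chrono_test = train_on_first_runs(block_ids)
```

In both schemes no trial of a held-out block or run appears in the training set, so slowly changing background activity cannot leak from training to test data within a block.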
References
[1] O. Ledoit and M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, J Multivar Anal, 88(2): 365-411, 2004.
[2] J. Schäfer and K. Strimmer, A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, Statistical Applications in Genetics and Molecular Biology, 4(1), 2005.
[3] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller, Optimizing Spatial Filters for Robust EEG Single-Trial Analysis, IEEE Signal Proc Magazine, 25(1): 41-56, 2008, URL http://dx.doi.org/10.1109/MSP.2008.4408441.
[4] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Trans Rehab Eng, 8(4): 441-446, 2000.
[5] C. Neuper and W. Klimesch, eds., Event-related Dynamics of Brain Oscillations, Elsevier, 2006.
[6] G. Pfurtscheller, C. Brunner, A. Schlögl, and F. L. da Silva, Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks, NeuroImage, 31(1): 153-159, 2006.
[7] G. Pfurtscheller and F. Lopes da Silva, Event-related EEG/MEG synchronization and desynchronization: basic principles, Clin Neurophysiol, 110(11): 1842-1857, 1999.
[8] P. L. Nunez, R. Srinivasan, A. F. Westdorp, R. S. Wijesinghe, D. M. Tucker, R. B. Silberstein, and P. J. Cadusch, EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales, Electroencephalogr Clin Neurophysiol, 103(5): 499-515, 1997.