Support vector machines (SVMs) are among the most common pattern recognition machines used in machine learning. The SVM is considered a successor to logistic regression because it is based on the principle of maximum class separability.[1]
b. Mathematical definition

Problem: Given l observations, each consisting of a vector x_i and a label y_i, i = 1, ..., l, assumed to be drawn from an unknown probability distribution P(x, y), IID (independently and identically distributed). The source that assigns the label y to a given vector x is called a trusted source (supervisor). We need to estimate how well a learned classifier will generalize.

Solution: Consider a machine, or simply a function, f(x, alpha), where alpha is the training parameter. The expectation of the test error (the risk) is

    R(alpha) = integral of (1/2) |y - f(x, alpha)| dP(x, y),

where R(alpha) cannot be calculated directly, since P(x, y) is not known. Instead, a bound is found in terms of the empirical risk measured on the training set,

    R_emp(alpha) = (1 / 2l) * sum_{i=1..l} |y_i - f(x_i, alpha)|,

chosen so that the bound approximates the true risk in the limit.

Now it can be proved that, with probability 1 - eta,

    R(alpha) <= R_emp(alpha) + sqrt( ( h * (log(2l/h) + 1) - log(eta/4) ) / l ).

We need to find h such that the right-hand side is almost equal to the left-hand side; R(alpha) can then be estimated. Here h, called the VC dimension, is an integer, so a perfect fit cannot be described; instead a best fit is obtained by varying the VC dimension.
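The confidence term on the right-hand side of the bound can be evaluated numerically. Below is a minimal sketch (the function name `vc_confidence` and the default eta are my own choices) showing that the term grows with the VC dimension h and shrinks with the sample size l:

```python
import math

def vc_confidence(l, h, eta=0.05):
    # sqrt( (h * (log(2l/h) + 1) - log(eta/4)) / l ): the term added to
    # the empirical risk to bound the true risk with probability 1 - eta
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# larger VC dimension h -> looser bound; more samples l -> tighter bound
print(vc_confidence(1000, 10))    # ~0.26
print(vc_confidence(1000, 100))   # ~0.64
print(vc_confidence(10000, 10))   # ~0.10
```

The printed values illustrate the trade-off the text describes: increasing capacity (h) loosens the bound, while more data tightens it.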
Linear SVM:
It is one of the most important models used in neuroimaging applications, since in many cases the sample size is smaller than the dimensionality of the system[3].

Mathematical basis: Let the discriminatory hyperplane be w . x + b = 0. Since the hyperplane, or discriminatory function, is linear, every training point must satisfy

    y_i (w . x_i + b) >= 1, i = 1, ..., l.

The optimum w is decided by optimizing the "margin": the distance between the two bounding hyperplanes is 2 / ||w||, so maximizing the margin is equivalent to minimizing ||w||^2 / 2 subject to the constraints above. However, doing this optimization directly is difficult, hence convex optimization tools are used to get the solution. One commonly followed approach is the method of Lagrange multipliers[4].

Lagrange multipliers[4]: Lagrange multipliers are used in constrained optimization problems.
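As a quick check of the geometry, here is a small sketch (function names and toy data are my own) that computes the margin 2 / ||w|| and verifies the constraints y_i (w . x_i + b) >= 1 on a separable toy set:

```python
import math

def margin(w):
    # geometric margin between the two bounding hyperplanes: 2 / ||w||
    return 2 / math.sqrt(sum(wi * wi for wi in w))

def satisfies_constraints(w, b, X, y):
    # linear separability constraints: y_i (w . x_i + b) >= 1 for all i
    return all(yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
               for x, yi in zip(X, y))

w, b = (1.0, 0.0), 0.0
X = [(1, 0), (2, 1), (-1, 0), (-3, 2)]
y = [1, 1, -1, -1]
print(margin(w))                         # 2.0
print(satisfies_constraints(w, b, X, y)) # True
```

Shrinking ||w|| widens the margin, which is exactly why the optimization minimizes ||w||^2 / 2.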
To see how Lagrange multipliers work, consider a function z = f(x, y), which is a surface in three dimensions. Imposing a constraint g(x, y) = c makes the curve a constrained curve restricted to two dimensions. Differentiating along the constraint curve, at an extremum the gradient of f has no component along the tangent T of the curve; since the gradient of g is also perpendicular to T, this means that grad f and grad g are parallel to each other, which can be equivalently written as

    grad f = lambda * grad g,

where lambda is the Lagrange multiplier. Together with the constraint g(x, y) = c, the stationarity conditions

    df/dx = lambda * dg/dx,    df/dy = lambda * dg/dy

give three equations in the three unknowns x, y and lambda. Solving the above three linear equations simultaneously gives the value of the Lagrange multiplier lambda; by substituting the value of lambda into the same set of equations we get the values of x and y, where (x, y) corresponds to the minimum point and f(x, y) is the minimum value.

Now, the concept of Lagrange multipliers shown above can be used to solve the problem of ||w|| minimization. Here, we write the problem in terms of the Lagrangian

    L_p = ||w||^2 / 2 - sum_{i=1..l} alpha_i [ y_i (w . x_i + b) - 1 ],

where L_p is the (primal) Lagrangian and alpha_i, i = 1, 2, ..., l, are the Lagrange multipliers, one per training vector. The solution of this equation is obtained by quadratic programming. The points for which alpha_i > 0 are called support vectors; these points lie on the two hyperplanes H1 and H2. The points on these hyperplanes are important because even if the points that are not on them (points far away from the discriminatory function) are removed, the discriminatory function does not change; however, if the points on the hyperplanes are removed, the discriminatory function changes. Hence, in feature selection it is important to make sure that the feature matrix does not exclude the support vectors.
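The worked procedure described above (three stationarity/constraint equations solved simultaneously) can be made concrete. The objective f(x, y) = x^2 + y^2 and constraint g(x, y) = x + y - 1 = 0 below are hypothetical choices for illustration:

```python
# minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0
# Stationarity: grad f = lam * grad g  ->  2x = lam, 2y = lam,
# plus the constraint x + y = 1: three equations, three unknowns.
lam = 1.0          # from 2x = 2y = lam and x + y = 1
x = y = lam / 2.0  # x = y = 1/2

# check: gradients are parallel and the constraint holds
assert abs(2 * x - lam) < 1e-12 and abs(2 * y - lam) < 1e-12
assert abs(x + y - 1) < 1e-12

f_min = x ** 2 + y ** 2   # minimum value: 0.5
```

The same pattern, stationarity plus feasibility, is what quadratic programming enforces at scale for the SVM Lagrangian.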
The two dotted lines are the hyperplanes H1 and H2 on which the support vectors lie; margin maximization yields the support vectors and the classifier.[5]

The method discussed above is theoretical, i.e., an analytical way of solving for the solution. However, once the sample size becomes large it is hard to solve the equations: as seen from the example, even a two-variable problem requires the simultaneous solution of 3 equations, and there is no guarantee that the equations are linear. Hence an algorithmic, iterative method is required to solve for the solution.

Algorithms to solve the Lagrangian[3,16]:

1. Sequential Minimal Optimization (SMO): This algorithm is used for the SVM solution and uses the Karush-Kuhn-Tucker (KKT) conditions to solve the equations. Steps:
1. Pick a multiplier alpha_1 and check whether it violates the KKT conditions; if it does, retain it, else pick another alpha_1 that violates the KKT conditions.
2. Pick a second multiplier alpha_2 and optimize the objective over the pair (alpha_1, alpha_2).
3. Go back to step 1 if not converged; convergence is checked by the KKT conditions.

KKT conditions[16]:
For the linear SVM the conditions are: stationarity (dL_p/dw = 0, giving w = sum_i alpha_i y_i x_i, and dL_p/db = 0, giving sum_i alpha_i y_i = 0), primal feasibility (y_i (w . x_i + b) - 1 >= 0), dual feasibility (alpha_i >= 0), and complementary slackness (alpha_i [ y_i (w . x_i + b) - 1 ] = 0). These four conditions together constitute the Karush-Kuhn-Tucker conditions used for optimization problems. This algorithm is also called coordinate ascent, as only two of the variables are used in the update rule.
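The four conditions can be checked mechanically for a candidate solution. The sketch below (helper name and toy data are my own) verifies them for a two-point problem whose optimum is w = (1, 0), b = 0, alpha = (0.5, 0.5):

```python
def kkt_satisfied(alphas, X, y, b, tol=1e-9):
    """Check the four KKT conditions for a trained linear SVM."""
    n_dim = len(X[0])
    # stationarity: w = sum_i alpha_i * y_i * x_i
    w = [sum(a * yi * x[d] for a, x, yi in zip(alphas, X, y))
         for d in range(n_dim)]
    margins = [yi * (sum(wd * xd for wd, xd in zip(w, x)) + b) - 1
               for x, yi in zip(X, y)]
    return (all(a >= -tol for a in alphas)                          # dual feasibility
            and abs(sum(a * yi for a, yi in zip(alphas, y))) < tol  # sum alpha_i y_i = 0
            and all(m >= -tol for m in margins)                     # primal feasibility
            and all(abs(a * m) < tol
                    for a, m in zip(alphas, margins)))              # complementary slackness

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1.0, -1.0]
print(kkt_satisfied([0.5, 0.5], X, y, b=0.0))  # True: optimal point
print(kkt_satisfied([1.0, 0.5], X, y, b=0.0))  # False: sum alpha_i y_i != 0
```

This is exactly the test SMO applies to decide which multipliers still need updating.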
2. Sigmoid: The sigmoid is a kernel used for probabilistic applications, for example in gene prediction problems[8]. The sigmoid kernel in one dimension has the form

    K(x, z) = tanh(kappa * x * z + c).

The logistic sigmoid function, 1 / (1 + e^(-t)), estimates a probability; its range varies over (0, 1).

3. Gaussian: Gaussian kernels are used in nature-related problems, because most natural phenomena have a Gaussian decay function. A two-dimensional Gaussian kernel is of the form

    K(x, z) = exp( -||x - z||^2 / (2 * sigma^2) ).
Discriminatory features are normally marked by integers; for example, if white matter is encoded as zero, then grey matter is encoded as one. In the case of scaling, the data is scaled to follow a standard normal distribution, i.e.,

    z = (X - mu) / sigma,

where X is the value of the feature and mu and sigma are the mean and standard deviation of the distribution, respectively. By normalization a standard reference is obtained[10].
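The scaling step can be sketched in a few lines (the function name `z_score` is my own):

```python
import math

def z_score(values):
    """Scale one feature column to zero mean and unit standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

scaled = z_score([2, 4, 4, 4, 5, 5, 7, 9])
# after scaling: mean is 0 and variance is 1
```

In practice the mean and standard deviation must be estimated on the training set only and then reused on the test set, so that no test information leaks into training.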
b. Model Selection: By model selection, kernel selection is implied. In biomedical applications linear kernels are normally selected because the feature vector is much larger than the sample size: feature vectors typically have about 3000 pixels in the case of MRI images, and a sample size of comparable magnitude is not feasible, so a linear SVM is a good model[14].

c. Validation: Two types of validation methods can be carried out: simple validation or cross validation. In simple validation the data is divided into training data (50%) and test data (50%); the model is trained and tested and the accuracy is reported. In cross validation, a method called v-fold validation is carried out: the database is partitioned into v subsets; for each of the (v-1)-subset combinations the model is trained and the remaining subset is used to evaluate performance; the combination of subsets that gives the greatest accuracy is chosen and that accuracy is reported as the accuracy of the model[11]. In medical applications accuracy alone is not a good criterion; sensitivity and specificity are used as well. Sensitivity is the percentage of diseased individuals marked as diseased, and specificity is the percentage of healthy controls marked as healthy by the algorithm[14].
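The two evaluation measures, and the v-fold partition, can be sketched directly (function names and the simple round-robin fold assignment are my own choices):

```python
def sensitivity_specificity(y_true, y_pred):
    """y = 1 for diseased, 0 for healthy control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)   # (sensitivity, specificity)

def v_fold_indices(n, v):
    """Partition sample indices 0..n-1 into v subsets for cross validation."""
    return [list(range(i, n, v)) for i in range(v)]

sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 1])
folds = v_fold_indices(10, 5)   # 5 folds of 2 samples each
```

Each fold in turn serves as the held-out test subset while the model is trained on the remaining (v-1) folds.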
1. Feature Extraction: There are various methods to extract features. The features are mainly of two types, namely structural features and functional features. Structural features are obtained from MRI, and functional features are obtained from fMRI. Structural features can be obtained in many ways. One of the crudest methods is to take the voxels containing the volume of interest and encode the intensity values of the voxels into a single-column matrix[12]. Methods that encode the pattern of grey and white matter also give good results: in a given image, if grey matter is encoded as zero, white matter could be encoded as one; this encoded image is again arranged as a single-column matrix and used as a feature[13]. Through structural features a good classification accuracy of about 76-88% is possible for classifying PDAT from HC[14]. Functional features can be obtained by encoding fMRI data, and the accuracy varies between 71-85%; fMRI is popularly used in functional feature encoding. However, a combination of both structural and functional features gives better accuracy; even 100% accuracy is possible[15].

Set of Features:
Biomarkers from MRI:
- Structural MRI: mean diffusivity; grey matter, white matter and CSF percentages; shape and size of the hippocampus; cortical thickness
- Functional MRI: functional connectivity; spatial coherence; functional activation; spatial connectivity

Biomarkers from CT and PET:
- CT scan: voxel electron density; medial temporal lobe width; percentage volume of CSF
- PET scan: blood flow; glucose uptake; amyloid binding; regional activation; functional connectivity
1. Obtain the image from the MRI scanner.
2. Segment the image into white matter, grey matter and CSF regions.
3. Register the grey matter by marking it as 1 and marking the other regions as 0.
4. Follow this procedure on all the samples to obtain the dataset, which is trained using SVM.
5. Carry out validation to obtain the sensitivity and specificity values.
The feature matrix is simply the segmented images arranged as columns:

    F = [ f_1 f_2 ... f_l ],

where f_k represents the kth sample and its mark (label) forms the corresponding entry of the training matrix of the support vector machine.
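Assembling that matrix is a simple flatten-and-stack operation; here is a sketch on toy 2x2 masks (function names and data are my own):

```python
def to_feature_column(segmented_image):
    # flatten a 2-D segmented image (e.g. grey matter = 1, other = 0)
    # into a single-column feature vector
    return [voxel for row in segmented_image for voxel in row]

def feature_matrix(samples):
    # stack one flattened column per subject; entry [i][k] is voxel i
    # of the kth sample
    cols = [to_feature_column(s) for s in samples]
    return [list(row) for row in zip(*cols)]

masks = [[[1, 0], [0, 1]], [[0, 0], [1, 1]]]   # two toy grey-matter masks
F = feature_matrix(masks)                      # 4 voxels x 2 subjects
```

Real MRI volumes are 3-D, so the flattening would run over slices as well, but the column-per-subject layout is the same.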
STAND score method: In the first method all the voxels were used; in this method only the voxels having more than 50% CSF are used to form the feature matrix, and the other voxels are discarded[19]. This reduces the dimension of the feature space in order to optimize the classification; however, the results did not improve compared to the first method. This method uses feature reduction to increase classification efficiency.

Atlas-based methods (voxel atlas): Voxels are segmented into anatomical regions using a labeled atlas, into a total of 116 regions. Here a software package called COMPARE is used instead of SPM5[20]. Based on this, each voxel carries a map index from 0 to 115. This method has reported substantial improvement in the results.
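The voxel-selection step described above can be sketched as a simple mask over the feature columns (function name, argument layout and threshold default are my own assumptions):

```python
def select_voxels(feature_cols, csf_fraction, threshold=0.5):
    """Keep only voxels whose CSF tissue fraction exceeds the threshold.

    feature_cols: list of per-subject feature vectors (one value per voxel)
    csf_fraction: CSF fraction per voxel, shared across subjects
    """
    keep = [i for i, f in enumerate(csf_fraction) if f > threshold]
    return [[col[i] for i in keep] for col in feature_cols]

# toy example: 3 voxels, 2 subjects; voxels 0 and 2 pass the 50% cut
reduced = select_voxels([[1, 2, 3], [4, 5, 6]], [0.6, 0.2, 0.9])
```

The reduced columns then replace the full columns in the feature matrix fed to the SVM.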
2. Vertex-based methods (using cortical thickness): Here cortical thickness is the feature used in classification; it is a direct marker of atrophy. The cortical thickness becomes the input to the support vector machine. The various methods either measure cortical thickness directly or use a thickness atlas: the atlas has 68 gyral-based ROIs, and the cortical thickness is computed in each ROI. In another method, an ROI is determined and the cortical thickness pattern in the ROI becomes the feature matrix[21]. A better method than just finding overall cortical thickness is to measure cortical thickness in different regions. Studies have shown that the entorhinal cortex, lateral middle temporal gyrus, inferior parietal cortex and medial orbitofrontal cortex show a significant change in thickness between AD and non-AD patients, while the rostral midfrontal cortex and retrosplenial cortex show no significant difference in thickness[37,39].

3. Hippocampus segmentation methods: The volume and shape of the hippocampus are used as features fed into the SVM to train the support vectors. Even though this is the most commonly used clinical method, it yields a lower accuracy in SVM modeling, of about 68%[22].
1. Get the fMRI data from the machine.
2. Identify about 100 ROIs from the brain anatomy and calculate the correlation matrix for the ROIs.
3. Identify the ROI pairs with correlations lesser than T1 and those greater than T2. From these sets of values the DCI (<T1) and the ICI (>T2) are computed.
4. Train the SVM using the DCI and ICI values.
After the 100 ROIs are identified, a Pearson correlation matrix is formed, which is a 100x100 matrix. The data is quantized as a histogram, and the indices Decreased Connectivity Index (DCI) and Increased Connectivity Index (ICI) are computed.

DCI and ICI values are calculated for every sample, so the feature space becomes two dimensional; hence a 2D SVM classifier can be used, and some methods even use Fisher Discriminant Analysis (FDA) to classify.
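One plausible reading of the two indices is a count of ROI pairs below and above the two thresholds; the sketch below is an assumption-laden illustration (the function name, the counting definition and the toy matrix are mine, not taken from [25]):

```python
def connectivity_indices(corr, t1, t2):
    """Hypothetical sketch: count ROI pairs with low / high correlation.

    DCI counts pairs with correlation below t1 (decreased connectivity);
    ICI counts pairs with correlation above t2 (increased connectivity).
    Only the upper triangle is scanned so each pair is counted once.
    """
    n = len(corr)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dci = sum(1 for i, j in pairs if corr[i][j] < t1)
    ici = sum(1 for i, j in pairs if corr[i][j] > t2)
    return dci, ici

corr = [[1.0, 0.1, 0.9],
        [0.1, 1.0, 0.5],
        [0.9, 0.5, 1.0]]
dci, ici = connectivity_indices(corr, t1=0.2, t2=0.8)   # (1, 1)
```

Each subject then contributes one (DCI, ICI) point to the two-dimensional feature space mentioned above.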
The mean diffusivity is computed for every voxel; hence the mean diffusivity becomes a feature to be classified[27].
CT Scan features:
A CT scan uses X-rays and gives volumetric data; with the help of rendering and reconstruction techniques, CT scan images are obtained. It uses ionizing radiation, so it is not as safe a method as MRI.

1. Ventricular brain ratio: The ventricular brain ratio is the ratio of the ventricular volume to the brain volume (intracranial volume). It was found that the percentage change of the ventricular brain ratio was 2% in healthy controls and 9% in Alzheimer's disease patients; hence the ventricular brain ratio is an important biomarker for AD. Since a CT scan gives contrast between soft tissue and fluid, the ventricular volume is measured from this contrast difference[28].

2. Percentage volume of CSF: The percentage of CSF can be found from a CT scan, since CT differentiates soft tissues from fluids by giving the images different contrasts.
Age is directly related to the percentage of cerebrospinal fluid, so care has to be taken while interpreting the results: the healthy population and the AD cohort taken for a study must always be of the same age[29]. Studies have shown a significant difference in CSF percentage in AD patients in comparison with healthy subjects: the percentage of CSF in healthy adults was 12.59%, while in AD patients it was 18.2%[30].

3. Medial temporal lobe width: From a CT scan or MRI, four medial temporal lobe features can be measured:
1. A - largest vertical height of the hippocampal formation
2. B - largest horizontal width between the hippocampus and the brain stem
3. C - largest vertical width of the choroid fissure
4. D - width of the temporal horn

Atrophy can be assessed by measuring these four quantities and graded 1, 2, 3 or 4, based on the increase of C and B and the decrease of D and A[31]. To support this hypothesis a study was conducted in which the medial temporal width of AD patients and healthy controls was found from CT scans; it was found that the width was smaller in AD patients by 16%[32].
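Both volumetric CT features above reduce to voxel counting on a labeled segmentation; a minimal sketch (label codes, function name and toy slice are my own assumptions):

```python
def volume_fraction(labels, target):
    # fraction of intracranial voxels carrying the target tissue label
    flat = [v for row in labels for v in row]
    return flat.count(target) / len(flat)

# toy 4x4 slice: 0 = brain tissue, 1 = ventricle, 2 = other CSF
slice_ = [[0, 0, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 2, 2],
          [0, 0, 0, 0]]

vbr = volume_fraction(slice_, 1)                 # ventricular brain ratio
csf_pct = 100 * (volume_fraction(slice_, 1)
                 + volume_fraction(slice_, 2))   # percentage volume of CSF
```

A real implementation would sum the counts over all slices of the volume and scale by the voxel size, but the ratios themselves are dimensionless.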
4. Functional connectivity: Functional connectivity analysis is done on resting-state PET scans and is similar to the functional connectivity analysis of an fMRI scan. An independent component analysis is applied to get independent components, and principal component analysis is applied on the data to reduce the number of components; an SVM classification is finally applied to get the differentiating features[36].

5. Regional activation: Information processing, such as the encoding of object information and its recognition, happens differently in AD and non-AD patients. In these tasks, shapes are shown one by one and the subject has to identify each as an old or new object. The experiment was done with a radioactive tracer injected into the blood. PET images are taken over time and the relative cerebral blood flow (rCBF) is obtained using the SPM software. The rCBF activation information is analyzed statistically using correlation measures[40].
From the two spectrograms obtained, the difference is calculated; the difference spectrogram so obtained indicates the level of GSH. GSH is a direct indicator of oxidative stress, and oxidative stress is a main indicator of Alzheimer's disease[41].

SVM hypothesis:
Solving this equation using a convex optimization toolbox in MATLAB, we can find the W matrix, which describes the three n-dimensional hyperplanes used to separate the data into four groups. The algorithm in MATLAB uses cross-validation and expectation maximization. The support vector machine is trained on a large dataset to give class-discriminatory features. The vector X could be the entire spectrograph of n samples, or only the concentration of GSH could be used.
To validate the data, multiple features can be helpful.

2. pH in the left hippocampus: The pH level in the left hippocampus is a good indication of AD, hence it can be a great biomarker[14].
The pH is computed from the chemical shift difference between PCr and the resonant peak of Pi. It was observed that the pH in the left hippocampus shifts towards the alkaline region in AD patients[42]. Thus pH is a biomarker that can be used in a pattern recognition algorithm. Since the system would have three markers, a neural network could be trained to recognize the patterns of Alzheimer's disease subject to the three biomarkers.

3. Cho/Cr ratio: In AD patients the Cho/Cr ratio obtained from MRS is lower than in control subjects; Cho stands for choline-containing compounds. The measure is not taken as the absolute quantity of Cho, as there are great variations in Cho concentration among individuals; the ratio, however, is a biomarker usable in such studies[43].

4. NAA concentration: The NAA concentration is a marker for AD; however, absolute quantization gives no results, so quantization by ratio is chosen. NAA is N-acetyl aspartic acid. The ratio commonly used is NAA/Cr; however, NAA/Cr has a lower sensitivity compared to NAA/myo-Inositol, so the latter method is more popular and gives better accuracy than NAA/Cr[44].

5. Whole-spectrum classification by SVM: To do a whole-brain MRS spectral study with the help of SVM, a voxel of interest is chosen to get the spectrum; once the spectrum is obtained, the same procedure is repeated over all samples. The spectral amplitude values can be represented by a column matrix, and this column matrix becomes the training data for the SVM, from which an optimal classifier may be found. Alternatively, the data can be reduced to metabolite concentrations by ICA, which reduces the feature space; the SVM algorithm is then run on the reduced feature space and an optimal classifier can be obtained to distinguish between AD and non-AD patients[45].
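The ratio-based MRS features described above can be assembled into a small feature vector per subject; a sketch (function name, dictionary keys and toy concentrations are my own, standing in for fitted peak amplitudes):

```python
def mrs_ratio_features(conc):
    """Build ratio-based features from metabolite concentrations.

    conc: dict of absolute concentrations, e.g. from MRS peak fitting.
    Ratios are used because absolute levels vary widely between
    individuals, whereas ratios to Cr or myo-Inositol (mI) are stable.
    """
    return {
        "NAA/Cr": conc["NAA"] / conc["Cr"],
        "NAA/mI": conc["NAA"] / conc["mI"],
        "Cho/Cr": conc["Cho"] / conc["Cr"],
    }

feats = mrs_ratio_features({"NAA": 10.0, "Cr": 5.0, "mI": 4.0, "Cho": 2.0})
```

Each subject's dictionary of ratios would become one column of the SVM training matrix, exactly as with the imaging features earlier in the chapter.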
References:
[1] Vapnik, V.N., 2000. The Nature of Statistical Learning Theory. Springer, NY.
[2] Nello Cristianini, John Shawe-Taylor, 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
[3] Andrew Ng, class notes.
[4] Lagrange Multipliers, Com S 477/577, Nov 18, 2008.
[5] Christopher J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition. Kluwer Academic Publishers, Boston.
[6] http://nlp.stanford.edu/IR-book/html/htmledition/nonlinear-svms-1.html, accessed on May 23, 2012 at 3:34pm IST.
[7] A. Blum and P. Langley, Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2):245-271, December 1997.
[8] Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2009. An Introduction to Information Retrieval. Cambridge University Press, Cambridge, England.
[9] Chih-Wei Hsu and Chih-Jen Lin, A Comparison of Methods for Multiclass Support Vector Machines. IEEE Transactions on Neural Networks, Vol. 13, No. 2, March 2002.
[10] Cortes, C., Vapnik, V., Support-Vector Networks. Machine Learning, 20, 273-297 (1995). Kluwer Academic Publishers, Boston.
[11] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, A Practical Guide to Support Vector Classification.
[12] Matthew J. Clarkson, M. Jorge Cardoso, Gerard R. Ridgway, Marc Modat, Kelvin K. Leung, Jonathan D. Rohrer, Nick C. Fox, Sébastien Ourselin, A comparison of voxel and surface based cortical thickness estimation methods. NeuroImage, Volume 57, Issue 3, 1 August 2011, Pages 856-865, ISSN 1053-8119.
[13] Daliri, Mohammad, Automated Diagnosis of Alzheimer Disease using the Scale-Invariant Feature Transforms in Magnetic Resonance Images. Journal of Medical Systems, Volume 36, Issue 2, 1 April 2012, Pages 995-1000. Springer Netherlands, ISSN 0148-5598.
[14] Graziella Orrù, William Pettersson-Yeo, Andre F. Marquand, Giuseppe Sartori, Andrea Mechelli, Using Support Vector Machine to identify imaging biomarkers of neurological and psychiatric disease: A critical review. Neuroscience and Biobehavioral Reviews, Volume 36, Pages 1140-1152.
[15] Fan, Y., Resnick, S.M., Wu, X., Davatzikos, C., 2008. Structural and functional biomarkers of prodromal Alzheimer's disease: a high-dimensional pattern classification study. NeuroImage 41, 277-285.
[16] Dimitri P. Bertsekas, Nonlinear Programming. Athena Scientific, 2nd edition, September 1999.
[17] Cuingnet, R., et al., Automatic classification of patients with Alzheimer's disease from structural MRI: A comparison of ten methods using the ADNI database. NeuroImage (2010).
[18] Fan, Y., Shen, D., Davatzikos, C., 2005. Classification of structural images via high-dimensional image warping, robust feature extraction, and SVM. Proceedings of the 8th International Conference on Medical Image Computing and Computer-Assisted Intervention 8 (Pt 1), pp. 1-8.
[19] Vemuri, P., Gunter, J.L., Senjem, M.L., Whitwell, J.L., Kantarci, K., Knopman, D.S., Boeve, B.F., Petersen, R.C., Jack Jr., C.R., 2008. Alzheimer's disease diagnosis in individual subjects using structural MR images: validation studies. NeuroImage 39 (3), 1186-1197.
[20] Fan, Y., Shen, D., Gur, R.C., Davatzikos, C., 2007. COMPARE: classification of morphological patterns using adaptive regional elements. IEEE Trans. Med. Imaging 26 (1), 93-105.
[21] Klöppel, S., Stonnington, C.M., Chu, C., Draganski, B., Scahill, R.I., Rohrer, J.D., Fox, N.C., Jack Jr., C.R., Ashburner, J., Frackowiak, R.S.J., 2008. Automatic classification of MR scans in Alzheimer's disease. Brain 131 (3), 681-689.
[22] Gerardin, E., Chételat, G., Chupin, M., Cuingnet, R., Desgranges, B., Kim, H.-S., Niethammer, M., Dubois, B., Lehéricy, S., Garnero, L., Eustache, F., Colliot, O., 2009. Multidimensional classification of hippocampal shape features discriminates Alzheimer's disease and mild cognitive impairment from normal aging. NeuroImage 47 (4), 1476-1486.
[23] K.J. Friston, P. Fletcher, O. Josephs, A. Holmes, M.D. Rugg, R. Turner, Event-Related fMRI: Characterizing Differential Responses. NeuroImage, Volume 7, Issue 1, January 1998, Pages 30-40, ISSN 1053-8119.
[24] Karl J. Friston and Christian Büchel, Functional Connectivity: Eigenimages and multivariate analyses. Lecture notes accessed from the Queen's University website.
[25] Serge A.R. et al., "Altered resting state networks in mild cognitive impairment and mild Alzheimer's disease: An fMRI study". Human Brain Mapping, Volume 26, Issue 4, Pages 231-239, December 2005.
[26] Denis Le Bihan, Jean-François Mangin, Cyril Poupon, Chris A. Clark, Sabina Pappata, Nicolas Molko, Hughes Chabriat, "Diffusion Tensor Imaging: Concepts and Applications". Journal of Magnetic Resonance Imaging, Volume 13, Pages 534-546, 2001.
[27] David A. Medina and Moises Gaviria, "Diffusion tensor imaging investigations in Alzheimer's disease: the resurgence of white matter compromise in the cortical dysfunction of the aging brain". Neuropsychiatric Disease and Treatment, Volume 4, Issue 4, Pages 737-742, August 2008.
[28] de Leon, M.J., George, A.E., Reisberg, B., Ferris, S.H., Kluger, A., Stylopoulos, L.A., Miller, J.D., La Regina, M.E., Chen, C., Cohen, J., "Alzheimer's disease: longitudinal CT studies of ventricular change". AJR Am J Roentgenol, Volume 152, Issue 6, Pages 1257-1262, June 1989.
[29] Hiroshi Wanifuchi et al., "Age and its relation with CSF percentage". Journal of Neurosurgery, Volume 97, Issue 3, Pages 601-610.
[30] Gordon J. Harris, Edward H. Rhew, Thomas Noga, Godfrey R. Pearlson, "User-friendly method for rapid brain and CSF volume calculation using transaxial MRI images". Psychiatry Research: Neuroimaging, Volume 40, Pages 61-68, June 1991.
[31] Ph. Scheltens, D. Leys, F. Barkhof, D. Huglo, H.C. Weinstein, P. Vermersch, M. Kuiper, M. Steinling, E.Ch. Wolters, J. Valk, "Atrophy of medial temporal lobes on MRI in "probable" Alzheimer's disease and normal ageing: diagnostic value and neuropsychological correlates". Journal of Neurology, Neurosurgery, and Psychiatry, Volume 55, Pages 967-972, 1992.
[32] Barber, R., Gholkar, A., Scheltens, P., Ballard, C., McKeith, I.G., O'Brien, J.T., "MRI volumetric correlates of white matter lesions in dementia with Lewy bodies and Alzheimer's disease". Int J Geriatr Psychiatry, Volume 15, Issue 10, Pages 911-916, October 2010.
[33] Agneta Nordberg, "PET imaging of amyloid in Alzheimer's disease". Lancet Neurol, Volume 3, Pages 519-527.
[34] Harr et al.
[35] David C. Alsop, John A. Detre, Murray Grossman, "Assessment of Cerebral Blood Flow in Alzheimer's Disease by Spin-Labeled Magnetic Resonance Imaging". Annals of Neurology, Volume 47, Issue 1, January 2000.
[36] Paule-Joanne Toussaint, Vincent Perlbarg, Pierre Bellec, Serge Desarnaud, Lucette Lacomblez, Julien Doyon, Marie-Odile Habert, Habib Benali, "Resting state FDG-PET functional connectivity as an early biomarker of Alzheimer's disease using conjoint univariate and independent component analyses". NeuroImage (2012).
[37] Christine Fennema-Notestine, Donald J. Hagler Jr., Linda K. McEvoy, Adam S. Fleisher, Elaine H. Wu, David S. Karow, Anders M. Dale, "Structural MRI Biomarkers for Preclinical and Mild Alzheimer's Disease". Human Brain Mapping, Volume 30, Pages 3238-3253 (2009).
[38] B.C. Dickerson, T.R. Stoub, R.C. Shah, R.A. Sperling, R.J. Killiany, M.S. Albert, B.T. Hyman, D. Blacker, L. deToledo-Morrell, "Alzheimer-signature MRI biomarker predicts AD dementia in cognitively normal adults". Neurology, Volume 76, Pages 1395-1402, April 2011.
[39] Miroslaw Brys, Lidia Glodzik, Lisa Mosconi, Remigiusz Switalski, Susan De Santi, Elizabeth Pirraglia, Kenneth Rich, Byeong C. Kim, Pankaj Mehta, Ray Zinkowski, Domenico Pratico, Anders Wallin, Henryk Zetterberg, Wai H. Tsui, Henry Rusinek, Kaj Blennow, Mony J. de Leon, "Magnetic Resonance Imaging Improves Cerebrospinal Fluid Biomarkers in the Early Detection of Alzheimer's Disease". Journal of Alzheimer's Disease, Volume 16, Issue 2, Pages 351-362, February 2009.
[40] N. Scarmeas, K.E. Anderson, J. Hilton, A. Park, C. Habeck, J. Flynn, B. Tycko, Y. Stern, "APOE-dependent PET patterns of brain activation in Alzheimer disease". Neurology, Volume 63, Pages 913-915, 2004.
[41] Pravat K. Mandal, Manjari Tripathi, Sreedevi Sugunan, "Brain oxidative stress: Detection and mapping of anti-oxidant marker Glutathione in different brain regions of healthy male/female, MCI and Alzheimer patients using non-invasive magnetic resonance spectroscopy". Biochemical and Biophysical Research Communications, Pages 43-48.
[42] Pravat K. Mandal, Himanshu Akolkar, and Manjari Tripathi, "Mapping of Hippocampal pH and Neurochemicals from in vivo Multi-Voxel 31P Study in Healthy Normal Young Male/Female, Mild Cognitive Impairment, and Alzheimer's Disease". Journal of Alzheimer's Disease, Volume 29 (2012), Pages 1-12.
[43] Jonathan M. Schott, Chris Frost, David G. MacManus, Fowzia Ibrahim, Adam D. Waldman, Nick C. Fox, "Short echo time proton magnetic resonance spectroscopy in Alzheimer's disease: a longitudinal multiple time point study". doi:10.1093/brain/awq208.
[44] John Q. Trojanowski, "Searching for the Biomarkers of Alzheimer's: The ability to detect this widespread but elusive condition through simple, noninvasive means has been a dream of neurologists for years. Can a landmark study make it a reality?". Practical Neurology, Volume 3, Pages 30-34, 2004.
[45] Jian Ma, "MRS classification based on independent component analysis and support vector machines". Hybrid Intelligent Systems, November 2005.