
THE USE OF MULTIVARIABLE STATISTICS IN THE STUDY OF VEHICLE DYNAMICS


Univ. Prof. eng. Ion COPAE PhD Military Technical Academy, Bucharest, email: ioncopae@hotmail.com

Abstract. The paper presents the main aspects that define geostatistics (spatial statistics or multivariable statistics), which deals with large sets of data distributed not only in time, as in classical statistics, but also in space. Multivariable statistics began to be applied to vehicles with the emergence of electronic control, in which the on-board computer processes large sets of data received from the transducers built into the vehicle. Geostatistics relies on specific concepts and algorithms, such as descriptive statistics. In this way, aspects of spatial correlation are revealed, using also the distance, which is the defining element of geostatistics. The paper presents cluster analysis, which establishes what the data have in common, and discriminant analysis, which establishes the differences between the studied data. Principal component analysis, factor analysis and data classification procedures are also discussed. The paper also contains examples of these concepts using data received from the on-board computer.

Geostatistics, also known as spatial statistics or multivariable statistics, is the branch of statistics that deals with large sets of data distributed not only in time, as in classical statistics, but also in space [1]. Multivariable statistics began to be applied to vehicles with the emergence of electronic control, in which the on-board computer processes large sets of data received from the transducers built into the vehicle [2]. Geostatistics relies on the specific concepts and algorithms shown in figure 1, such as descriptive statistics, which provides graphical or tabular visualization of the results. In this way, aspects of spatial correlation are revealed, using also the distance, which is the defining element of geostatistics. Geostatistics also relies on cluster analysis [3], which establishes what the data have in common, and on discriminant analysis [4], which establishes the differences between the studied data. Because it works with large sets of data on many variables, geostatistics uses multiple regressions and recurrences rather than the single- or two-variable relations used in statistical inference. Because humans cannot perceive variations in a space with more than three dimensions, geostatistics uses transformations that replace the initial multivariable tables with equivalent ones, so that the graphical representations are at most three-dimensional; this is the result of principal component analysis [5]. In order to draw conclusions from the analyzed data, geostatistics also relies on factor analysis [6], as well as on classification methods related to decision theory.

Figure 1. Geostatistics Areas

Figure 2 shows a graphic form specific to multivariable analysis which, in this case, makes it possible to observe the dependence between different functional elements of an engine: from top to bottom and from left to right, the throttle position, the engine speed, the injection time and the engine torque, with their histograms on the main diagonal. The commonly used name of this plot, gplotmatrix, itself shows that geostatistics relies on matrix algebra: in the data matrix, each row holds the values recorded for the same observation and each column holds the values of one variable.
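The scatter-plot matrix is a graphical view of pairwise dependence between variables. A minimal numerical counterpart, assuming the data matrix is arranged with one observation per row and one variable per column, is the correlation matrix computed below with NumPy. The numbers and the variable names are invented for illustration; they are not the paper's measurements.

```python
import numpy as np

# Hypothetical data matrix: one row per sampling instant, one column per variable
# (throttle position [%], engine speed [rpm], injection time [ms], torque [N*m]).
# These values are invented for illustration only.
X = np.array([
    [10.0, 1200.0, 2.1,  60.0],
    [20.0, 1800.0, 2.8,  95.0],
    [35.0, 2400.0, 3.6, 130.0],
    [50.0, 3000.0, 4.5, 160.0],
    [65.0, 3500.0, 5.1, 175.0],
])
names = ["throttle", "speed", "inj_time", "torque"]

# rowvar=False: each column is a variable, each row is an observation
R = np.corrcoef(X, rowvar=False)

for name, row in zip(names, R):
    print(f"{name:>9}: " + " ".join(f"{r:6.3f}" for r in row))
```

Each off-diagonal entry summarizes, as a single number, the linear trend visible in the corresponding panel of the scatter-plot matrix.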

Figure 2. Multivariable Analysis - Specific Graphic (gplotmatrix)

As stated above, distance is the main specific element of geostatistics, which is why this dimension must be incorporated in all multivariable analysis procedures.
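As a small illustration of distance as the basic ingredient, the sketch below computes the Euclidean distance between every pair of observations. Standardizing the columns first is an assumption made here, not a step stated in the paper; it keeps a variable measured in rpm from dominating one measured in percent. The function name and the data are hypothetical.

```python
import numpy as np

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between the rows (observations) of X."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, p)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical observations: (throttle [%], engine speed [rpm], torque [N*m])
X = np.array([
    [10.0, 1200.0,  60.0],
    [12.0, 1250.0,  63.0],
    [50.0, 3000.0, 160.0],
])

# Standardize each column (zero mean, unit standard deviation) so that
# engine speed does not dominate the distance only because of its scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
D = euclidean_distance_matrix(Z)
print(np.round(D, 3))
```

The first two observations come out close to each other and far from the third, which is the kind of information that cluster analysis then exploits.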

A second essential element of geostatistics is cluster analysis, that is, the grouping of data with the same characteristics. For experimental data, having the same characteristics means having the same distance to the mean value. By the same distance we understand a value accepted with a certain confidence level; we therefore work with many values lying within confidence intervals, as in statistical inference. This is the purpose of the T² test in geostatistics. Cluster analysis is the first step towards principal component analysis. As an example, figure 3a shows two data series that share common elements in certain time intervals and differ in others; figure 3b shows the cluster grouping and the coordinates of the cluster centers. Figure 3c shows a boxplot, which displays the quartiles in a way similar to statistical inference. Figure 3d shows a graphic specific to multivariable analysis, the dendrogram, which is obtained from cluster analysis. The ordinate axis of the dendrogram gives the distance between the centers of two clusters (here obtained with the k-means algorithm), and the clusters are joined in order of increasing distance. For example, the dendrogram shows that the smallest distance is the one between the centers of the 1st and 4th clusters, followed by the distance between the 2nd and 3rd clusters; the next smallest distance is between the common center of the 2nd and 3rd clusters and the 5th cluster, and so on.
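The merging rule described for the dendrogram (repeatedly join the two groups whose centers are closest, then replace them by their common center) can be sketched as a centroid-linkage agglomeration. This is a minimal sketch, not the author's implementation: the five 2-D centers are hypothetical, chosen so that the merge order mirrors the one described for figure 3d, and weighting each merged center by its group size is an assumption (with real data the weights would be the number of points in each cluster).

```python
import numpy as np

def merge_order(centers, sizes):
    """Agglomerate cluster centers in order of increasing centroid distance.

    Returns (label_a, label_b, distance) tuples: which groups are joined
    and at what height, i.e. the information a dendrogram draws.
    """
    groups = {str(i + 1): (np.asarray(c, dtype=float), s)
              for i, (c, s) in enumerate(zip(centers, sizes))}
    merges = []
    while len(groups) > 1:
        labels = list(groups)
        best = None
        # find the closest pair of current group centers
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                a, b = labels[i], labels[j]
                d = np.linalg.norm(groups[a][0] - groups[b][0])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        (ca, na), (cb, nb) = groups.pop(a), groups.pop(b)
        # common center = size-weighted mean of the two merged centers
        groups[f"({a}+{b})"] = ((na * ca + nb * cb) / (na + nb), na + nb)
        merges.append((a, b, d))
    return merges

# Hypothetical cluster centers (e.g. produced by k-means), one unit weight each
centers = [(0.0, 0.0), (5.0, 5.0), (5.0, 6.0), (0.5, 0.0), (5.0, 8.0)]
merges = merge_order(centers, sizes=[1, 1, 1, 1, 1])
for a, b, d in merges:
    print(f"join {a} and {b} at distance {d:.3f}")
```

For production use, SciPy's `scipy.cluster.hierarchy.linkage` with `method="centroid"` builds the same kind of merge table from raw observations.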

Figure 3. Cluster Analysis

By using the Euclidean distance we obtain the five clusters from figure 4. In the analysis, geostatistics uses either the cluster centers, the closest points of each cluster, or the furthest ones. The centers are used most often, which leads to the algorithm called k-means, where k is the number of clusters and "means" indicates that a mean is used, in this case the arithmetic mean. As mentioned before, cluster analysis also underlies the determination of the principal components, a method that appeared because humans cannot observe the variations of elements in a space with more than three dimensions. Consequently, we use at most three variables, often only two, equivalent to the much larger number of initial ones, and continue the study of the system's operation with these components. Each principal component is a linear combination of the initial variables, chosen so as to maximize the variance along each principal axis; all principal components are mutually orthogonal, so they do not introduce redundant information. Figure 4a shows an example of the analysis using the experimental data for the throttle position, the engine speed and the engine torque. The graph shows that the two principal components account for 99.2% of the variance of the three initial variables: the first principal component, PC1, contributes 76.5% and the second, PC2, 22.7%.
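The k-means procedure described at the start of this paragraph alternates two steps: assign each point to the nearest center (Euclidean distance), then move each center to the arithmetic mean of its points. A minimal sketch of this standard Lloyd iteration follows; the function name, the random initialization and the toy data are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance and arithmetic means."""
    rng = np.random.default_rng(seed)
    # start from k distinct observations chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # distance from every point to every center, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical 2-D data forming two well-separated groups
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels)
print(np.round(centers, 3))
```

The result depends on the initial centers, so in practice the algorithm is usually restarted from several initializations and the solution with the smallest within-cluster spread is kept.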

Figure 4. Principal Component Analysis

In figure 4b another initial variable was added, the ignition advance. In this case the contribution of the two principal components decreases to 90.1%: 59.9% for the first and 30.2% for the second. This is because the added variable affects the spatial correlation negatively, lowering the correlation coefficients. Discriminant analysis takes a different approach: it shows not what the analyzed data have in common, but what differentiates them. This analysis is also based on clusters, called classes in this case, and the principal components become vectors. Both types of analysis use classification algorithms, concepts drawn from decision theory, to group the data. The difference between principal component analysis and discriminant analysis is that the former does not know the clusters in advance, whereas the latter establishes them in advance, based on distance, and only then assigns the data to the predefined classes.
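The component shares quoted for figures 4a and 4b are the eigenvalues of the covariance (or correlation) matrix, sorted in decreasing order and divided by their sum. The sketch below computes them; standardizing the variables first (i.e. working with the correlation matrix) is an assumption, since the paper does not state which matrix was used, and the data are synthetic: three hypothetical signals driven by one common "load" factor.

```python
import numpy as np

def explained_variance(X, standardize=True):
    """Share of total variance carried by each principal component, plus the axes."""
    Z = X - X.mean(axis=0)
    if standardize:
        Z = Z / Z.std(axis=0, ddof=1)
    C = np.cov(Z, rowvar=False)              # correlation matrix if standardized
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric matrix -> orthonormal axes
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Hypothetical strongly correlated signals sharing one common factor
rng = np.random.default_rng(1)
load = rng.uniform(0.0, 1.0, 200)
X = np.column_stack([load + 0.05 * rng.normal(size=200) for _ in range(3)])

ratios, V = explained_variance(X)
print("share per component [%]:", np.round(100 * ratios, 1))
print("PC1 + PC2 [%]:", round(100 * ratios[:2].sum(), 1))
```

The columns of `V` are the orthogonal principal axes; projecting the centered data onto the first two columns gives the two-dimensional representation used for plotting.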

Figure 5 shows an example of discriminant analysis on a sample formed by several experimental tests with engine torque values; applying discriminant analysis yields ten classes, numbered in the graph. Figure 6 presents the result of this analysis as the percentage of the available data that were not assigned to their class (misclassified). The graph shows good classification, with errors between 0.16% and 1.4%.
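The classification and misclassification-error steps can be sketched with a standard linear discriminant analysis: class means, a pooled within-class covariance, and a linear score per class. The paper does not state which discriminant rule it uses, so this Gaussian LDA is an assumption; the function names are hypothetical, the data are synthetic (three 2-D classes rather than the paper's ten torque classes), and the error is computed on the training data itself (resubstitution).

```python
import numpy as np

def fit_lda(X, y):
    """Class means, inverse pooled within-class covariance and class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # pooled within-class covariance: summed class scatter / (n - K)
    scatter = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    cov = scatter / (len(X) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), priors

def predict_lda(model, X):
    classes, means, icov, priors = model
    # linear discriminant score: g_k(x) = x' S^-1 m_k - 0.5 m_k' S^-1 m_k + ln(pi_k)
    scores = (X @ icov @ means.T
              - 0.5 * np.sum(means @ icov * means, axis=1)
              + np.log(priors))
    return classes[scores.argmax(axis=1)]

def misclassification_error(y_true, y_pred):
    """Fraction of observations assigned to the wrong class."""
    return np.mean(y_true != y_pred)

# Hypothetical data: three classes of 50 points each around known means
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([m + 0.5 * rng.normal(size=(50, 2)) for m in true_means])
y = np.repeat(np.arange(3), 50)

model = fit_lda(X, y)
err = misclassification_error(y, predict_lda(model, X))
print(f"misclassification error: {100 * err:.2f}%")
```

A resubstitution error is optimistic; evaluating on held-out tests (or by cross-validation) gives a fairer estimate of how well new data would be classified.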

Figure 5. Principal Components (Classes)

Figure 6. Misclassification Error

Finally, geostatistics also uses multiple regression [7], which makes it possible to establish mathematical models of vehicle and engine dynamics, as in the example in figure 7. The graph shows the experimental dynamic series, the series obtained from the mathematical model and the residual (the difference between the two); it also gives the coefficients of the multiple regression for the linear mathematical model frequently used in the literature of this field.
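A linear multiple regression of this kind can be fitted by ordinary least squares. The sketch below assumes, for illustration only, a model of the form torque = b0 + b1·throttle + b2·speed; the variable choice, the coefficients used to generate the data and the noise level are all hypothetical, not the model of figure 7.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
# Hypothetical recorded series (invented for illustration)
throttle = rng.uniform(0.0, 100.0, n)                  # [%]
speed = rng.uniform(800.0, 4000.0, n)                  # [rpm]
torque = 5.0 + 2.0 * throttle + 0.01 * speed + rng.normal(0.0, 0.5, n)  # [N*m]

# Design matrix with an intercept column: torque ~ b0 + b1*throttle + b2*speed
A = np.column_stack([np.ones(n), throttle, speed])
coef, *_ = np.linalg.lstsq(A, torque, rcond=None)

model = A @ coef              # series reproduced by the fitted model
residuals = torque - model    # measured minus modelled
print("b0, b1, b2 =", np.round(coef, 4))
print("RMS residual =", np.sqrt(np.mean(residuals ** 2)))
```

Plotting `torque`, `model` and `residuals` against time reproduces the three curves that a figure of this type displays; a residual without visible structure suggests that the linear model captures the dependence.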

Figure 7. Multiple Regression

In conclusion, the application of multivariable statistics in the study of vehicle dynamics makes it possible to identify functional connections between the different elements and to establish mathematical models that describe these relationships by means of multiple regression.

Bibliography
1. H. Wackernagel, Multivariate Geostatistics, Springer-Verlag, Berlin, 1995
2. I. Copae, Car Dynamics. Theory and Experimental Research, Military Technical Academy, Bucharest, 2003
3. P. Arabie, L. J. Hubert and G. De Soete, Clustering and Classification, World Scientific Publishing, Singapore, 1996
4. G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley & Sons, New York, 1992
5. I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986
6. A. Basilevsky, Statistical Factor Analysis and Related Methods, Wiley, New York, 1994
7. L. Fahrmeir and G. Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models, Springer-Verlag, New York, 1994
