New Versions of Principal Component Analysis for Image Enhancement and Classification

Qiuming Cheng
Department of Earth and Atmospheric Science, Department of Geography, York University
4700 Keele Street, Toronto, Ont. M3J 1P3, Canada

Abstract – Two new versions of principal component analysis (PCA) are introduced. The first applies a spatial weighting factor to the samples on the basis of their location properties. The second involves a new definition of a high-order correlation coefficient. The former can enhance the effect of important samples and reduce the influence of less important samples, whereas the latter can enhance the influence of high or low sample values. The principles of the methods are introduced with a case study on the identification of Au/Cu-associated alteration zones from Landsat TM images.

I. INTRODUCTION

Principal component analysis (PCA) has become a standard statistical approach in image processing for two basic purposes: to reduce a number of correlated images to a small number of independent principal components that represent most of the variability of the data carried by the multiple band images; and to increase the interpretability of the components as combinations of the multiple images. The foundation of PCA is the correlation matrix, which measures the interrelationships among multiple variables. When PCA is applied to spatial data in GIS and to image processing, a number of properties can be taken into account to increase the effectiveness of the method. In this paper, two newly modified versions of PCA are introduced [1]. The first version modifies the correlation matrix by weighting the samples based on the properties of the sample locations. The second version introduces high-order statistics into the construction of the correlation matrix. The former can enhance the importance of samples at special locations, and the latter can enhance the samples with high or low values in the calculation of the principal components. These two versions of PCA can be useful for identifying special entities as well as for general image enhancement.

II. SPATIALLY WEIGHTED PCA

As an ordinary statistical method, PCA requires random samples with multiple attributes. Selecting a subset of samples to represent a particular area is often necessary. For example, if the study is mainly concerned with characterizing certain rock types, restricting the analysis to pixels within those rocks might be needed to ensure that the main principal components reflect the rock patterns. Many other examples involving the selection of a subset of samples in conducting PCA can be found in the literature. Most commercial GIS and image processing packages provide two standard means of selecting pixels for image processing: one is to set a rectangular window with an extent so that the processing applies only to the pixels within the window; the other is to set a polygon mask so that only the pixels within the mask are included in the image processing. Constructing a mask in GIS is very flexible; for example, a rock formation or another sort of pattern can be selected by querying a map and then defined as a mask. However, both methods can be considered as using a binary mask with two values, 1 and 0. The former can be treated as a special case of the latter in which the mask has a regular shape. There is no doubt that the constraints these two methods add to remove irrelevant samples from the analysis are essential for some applications. In the current paper this idea is extended to a more general form that affects the contributions of samples by setting a spatial weighting factor. A brief introduction to the principle of the method is given below. Denote A, B, C as the images to be processed and their values at location (i, j) as Aij, Bij, Cij, respectively. A weighting factor is defined as an image W with values from 0 to 1, 0 ≤ Wij ≤ 1 [2]. The weighted correlation coefficient between images A and B is defined as [2]

$$R(A, B) = \frac{\sum W_{ij} (A_{ij} - \bar{A})(B_{ij} - \bar{B})}{\sqrt{\sum W_{ij} (A_{ij} - \bar{A})^2 \sum W_{ij} (B_{ij} - \bar{B})^2}} \qquad (1)$$

where the letters A and B with a bar on top stand for the mean values of Aij and Bij. The effect of applying the weight W to the correlation coefficient can be seen from the following properties:

1. R(A, B) is symmetrical,
2. -1 ≤ R(A, B) ≤ 1,
3. R(A, B) = 1 or -1 iff A and B have a positive or negative linear relationship with each other.
4. If Wij = constant, R(A, B) reduces to the ordinary correlation coefficient.
5. If Wij is a binary image with two values 1 and 0, then W is equivalent to the ordinary mask.

The first three properties ensure that definition (1) meets the basic properties required of a correlation coefficient. The last two properties demonstrate that definition (1) is a generalized form in comparison with the ordinary correlation coefficient and the correlation coefficient with a binary mask applied. The ordinary mask treatment becomes a special case of the spatial

0-7803-7536-X/02/$17.00 (C) 2002 IEEE 3372


weighting approach. Similarly, the eigenvalues and eigenvectors can be calculated, and the loadings of each image on all components and the scores of all images on each component can be calculated and mapped. This method has been implemented in the newly developed GeoDAS GIS [2]. Using GeoDAS one can define a weighting image with values representing the relative importance of pixel locations. The values of the weighting image can be, for example, the distance from ore deposits, the density of ore deposits, the distance from contacts or faults, or the concentration values of trace elements. The scores of images on a principal component can be taken as a weighting factor as well. The spatially weighted PCA enhances the influence of pixels with large weights (weights close to 1) and reduces the effect of pixels with small weights (weights close to 0). This method has been used to analyze geochemical anomalies for mineral exploration [2].

III. HIGH-ORDER CORRELATION COEFFICIENT

In general, spatial patterns with self-similarity or self-affinity can be characterized by means of non-linear approaches such as multifractal modelling. Non-linearity of the high-order statistical properties with changing scale or resolution is commonly observed in various types of remotely sensed images. Most statistics based on means and standard deviations are lower-order moment statistics. From a multifractal point of view, lower-order moment statistics can only characterize the properties dominated by values around the mean value. The behaviour of high or low values along the two tails needs to be characterized with high-order (both positive and negative) moment statistics. The statistical regularities of the anomalous or extreme values (in both directions) often show non-linearity with respect to changing measuring scales. The definition of the high-order correlation coefficient was proposed to enhance the influence of the anomalous and extreme values on the measurement of correlation [1]. The definition can be expressed as

$$R(A^{q_1}, B^{q_2}) = \frac{\sum (A_{ij}^{q_1} - \overline{A^{q_1}})(B_{ij}^{q_2} - \overline{B^{q_2}})}{\sqrt{\sum (A_{ij}^{q_1} - \overline{A^{q_1}})^2 \sum (B_{ij}^{q_2} - \overline{B^{q_2}})^2}}$$

where A^q and B^q are the pixel values raised to the power q. R(A^q1, B^q2) is the high-order correlation coefficient, which can be calculated by applying the ordinary correlation coefficient equation to the power-transformed image values. The orders q1 and q2 can take any values, including negative and non-integer values. It has been proved that applying the q-th power transformation to the image values either enhances the influence of high values for positive q (>>1) or enhances the influence of low values for negative q (<<0). When q1 = q2 = 1 the high-order correlation coefficient becomes the ordinary correlation coefficient. The high-order correlation coefficients with mixed negative and positive orders (q1 q2 = -1) are similar to taking the "ratios" of images. Applying a ratio transformation to images is common in remote sensing image processing. It can be seen that the ordinary correlation coefficient and the ratio transformation both correspond to special cases of the high-order correlation coefficient. In other words, the high-order correlation coefficient is a more generalized form of the ordinary correlation coefficient and the commonly used ratio transformation. Whether ordinary (low-order) or high-order statistics should be used depends on the objectives of the individual application. For some applications, such as the identification of alteration zones for mineral exploration or of impact zones for environmental assessment from remotely sensed images, the useful information might be carried by weak or strong signals hidden in the average or background values caused by normal features acting as noise. The use of high-order statistics can enhance the contribution of high or low values in the analysis. The high-order correlation coefficients for multiple images can be used as the correlation matrix for conducting PCA. The results will be affected by the choice of the orders of the power transformation. This method is demonstrated with a case study on the identification of alteration zones from Landsat TM images of the Mitchell-Sulphurets mineral district, northwestern British Columbia.

IV. IDENTIFICATION OF ALTERATION ZONES FROM LANDSAT TM IMAGES

The data used to demonstrate the application of PCA with the high-order correlation coefficient are the seven-band Landsat TM images, received on 9 September 1985, covering the Mitchell-Sulphurets mineral district, northwestern British Columbia [3]. These images have been studied for alteration identification [1][3] and for non-linear modeling [1]. The dataset consists of seven TM images (bands 1 to 7) with 30-meter resolution for bands 1 to 5 and 7 (120 meters for band 6), each of which contains 496 columns × 777 rows of pixels covering an area of about 350 km2. PCA is applied to bands 1 to 5 and 7 only, to maintain a uniform resolution. Fig. 1 shows the TM band 5 image and its scale-independence property. Self-similar patterns are seen in Fig. 1 as the resolution is changed from 30 meters to 480 meters by re-sampling the image. Fig. 2 shows the power-law relationship between the high-order correlation coefficients (R) calculated for bands 2 and 5 and for bands 5 and 7, respectively, and the resolution (ε) at which the correlation coefficients were calculated. Among the various combinations, R with q1 = 1 and q2 = 1 provides good correlation between bands 5 and 7, whereas R with q1 = -0.5 and q2 = 0.5 gives the best result for bands 2 and

5. Considering the optimum correlations among all images, the values q = -2 for bands 1 to 3 and q = 3 for bands 4, 5 and 7 were used to calculate the correlation coefficient matrix for the six images. PCA was then applied to the high-order correlation matrix, and the scores on the first three principal components were mapped and compared with other geological and alteration information available in the area for geological interpretation. The patterns shown by the scores on the first and second components clearly represent the alteration zones. The first component reflects the entire outcropping altered area, and the second component mainly highlights the alteration zones with NNE-SSW orientation, probably corresponding to the more intensive potassium alteration zone with gold/copper mineralization. A colour composite map of the first three components was constructed and is shown in Fig. 3. Unlike ordinary PCA, the results obtained by the new version of PCA highlight not only the main alteration zones but also the differences within the alteration zones.

Figure 1. Landsat TM band 5 from the Mitchell-Sulphurets mineral district, northwestern British Columbia, received on 9 September 1985 (Rencz et al., 1994). Images re-sampled to reduce the resolution. (Panel labels: TM Band 5, ε = 30 m, 60, 120, 240, 480.)

Figure 2. Plot showing the relationship between R and measuring scale (ε). Log transforms are base 10. (Horizontal axis: Log ε; vertical axis: Log R(q1, q2, ε); curves for TM2 and TM5 and for TM5 and TM7 at several (q1, q2) pairs.)

Figure 3. Composite map created from three principal components. Purple patterns: alteration zones; light blue: intensive alteration zones; pink: wetland and water; black: snow and ice.

V. CONCLUSIONS

By incorporating spatial information about the locations of samples into the calculation of the correlation matrix, PCA can enhance the effect of important samples and reduce the influence of less important samples. This is beyond the capability of the ordinary masking approach provided by some commercial GIS and image processing systems. With the aid of GIS and the support of diverse data layers, the spatially weighted PCA will become a standard image processing technique.

Constructing the high-order correlation coefficient index introduced in the current paper provides a flexible means of optimizing correlations between images. It can be applied to enhance the influence of the high or low values of images in the calculation of the principal components. As a more generalized method in comparison with the ratio transformation, PCA with a high-order correlation matrix will provide a powerful tool for general image enhancement.

REFERENCES

1. Q. Cheng, "Multifractality and spatial statistics," Computers & Geosciences, v. 25, 1999, p. 949-961.
2. Q. Cheng, "GeoDAS Phase I: User's guide & Exercise manual," unpublished notes, York University, 2000, 298 p.
3. A. Rencz, J. Harris, B.B. Ballantyne, "Landsat TM imagery for alteration identification," Current Research 1994E, Geological Survey of Canada, 1994, p. 277-282.
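
The spatially weighted correlation coefficient of Eq. (1) in Section II, and the PCA built on it, can be illustrated with a minimal NumPy sketch. This is not the GeoDAS implementation. The function names, the synthetic random bands and the random weight image are hypothetical. The sketch also assumes that the bar in Eq. (1) denotes the weighted mean, which is the choice that makes a binary weight image reproduce ordinary masking (property 5).

```python
import numpy as np

def weighted_correlation_matrix(bands, weights):
    """Weighted correlation matrix following the form of Eq. (1).

    bands   : array of shape (n_bands, rows, cols)
    weights : array of shape (rows, cols) with values in [0, 1]
    Assumption: the means are weighted means (not stated in the paper).
    """
    x = bands.reshape(bands.shape[0], -1).astype(float)  # (n_bands, n_pixels)
    w = weights.reshape(-1).astype(float)
    mean = (x * w).sum(axis=1) / w.sum()                  # weighted mean per band
    d = x - mean[:, None]
    cov = (d * w) @ d.T      # sum_ij W_ij (A_ij - mean_A)(B_ij - mean_B)
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)

def pca_from_correlation(corr):
    """Eigenvalues (descending) and eigenvectors (as columns) of a correlation matrix."""
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic stand-in data (hypothetical, not the Landsat TM scene).
rng = np.random.default_rng(0)
bands = rng.random((3, 20, 30))
bands[1] = 0.7 * bands[0] + 0.3 * bands[1]   # make two bands correlated
weights = rng.random((20, 30))

corr_w = weighted_correlation_matrix(bands, weights)
eigenvalues, eigenvectors = pca_from_correlation(corr_w)

# Property 4: constant weights give the ordinary correlation matrix.
corr_const = weighted_correlation_matrix(bands, np.full((20, 30), 0.5))
corr_ordinary = np.corrcoef(bands.reshape(3, -1))

# Property 5: binary weights are equivalent to masking the pixels.
mask = weights > 0.5
corr_mask = weighted_correlation_matrix(bands, mask.astype(float))
corr_masked_pixels = np.corrcoef(bands[:, mask])
```

Comparing `corr_const` with `corr_ordinary`, and `corr_mask` with `corr_masked_pixels`, is a quick way to check properties 4 and 5 numerically on any test data.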

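
The high-order correlation matrix of Sections III and IV can be sketched in the same way: each band is raised to its own power q and the ordinary correlation coefficient is applied to the transformed values. The code below is a sketch under stated assumptions, not the author's implementation. The function name and the synthetic positive-valued bands are hypothetical. Only the orders (q = -2 for bands 1 to 3, q = 3 for bands 4, 5 and 7) are taken from the case study. The positivity check is an added safeguard, because negative or non-integer powers are undefined or infinite for zero and negative pixel values.

```python
import numpy as np

def high_order_correlation_matrix(bands, orders):
    """Correlation matrix of power-transformed bands.

    bands  : array of shape (n_bands, rows, cols), strictly positive values
    orders : one power q per band (may be negative or non-integer)
    """
    x = bands.reshape(bands.shape[0], -1).astype(float)
    if np.any(x <= 0):
        raise ValueError("pixel values must be positive for negative or non-integer orders")
    q = np.asarray(orders, dtype=float)[:, None]
    powered = x ** q              # each band raised to its own order
    return np.corrcoef(powered)   # ordinary correlation on the transformed values

# Synthetic stand-in for six bands (hypothetical data, not the TM scene).
rng = np.random.default_rng(1)
bands = rng.uniform(1.0, 255.0, size=(6, 40, 50))
orders = [-2, -2, -2, 3, 3, 3]    # orders quoted in the case study

corr_high = high_order_correlation_matrix(bands, orders)
corr_first_order = high_order_correlation_matrix(bands, [1] * 6)  # q = 1 everywhere

# PCA on the high-order correlation matrix.
eigenvalues, eigenvectors = np.linalg.eigh(corr_high)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
```

With all orders set to 1 the function returns the ordinary correlation matrix, which matches the statement that the high-order coefficient reduces to the ordinary one when q1 = q2 = 1.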