A Modified Metric To Compute Distance

Pattern Recognition, Vol. 25, No. 7, pp.
667-677, 1992 Printed in Great Britain
0031-3203/92 $5.00 + .00 Pergamon Press Ltd (~ 1992 Pattern Recognition Society
A MODIFIED METRIC TO COMPUTE DISTANCE

D. CHAUDHURI, C. A. MURTHY and B. B. CHAUDHURIt Electronics and Communication Sciences Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Calcutta 700035, India
(Received 31 August 1990; in revised form 18 June 1991; received for publication 13 November 1991)
Abstract--Euclidean distance is used in many practical problems. This paper proposes a new metric which is close to the Euclidean distance and also computationally more efficient. This metric is helpful when the dimension of the data set is large. Bounds of a measure of merit of the new metric as well as of the City-block and Chessboard metrics with respect to the Euclidean metric are analyticallyestablished. The utility of this metric is shown on a randomly generated data set in the context of clustering. Euclidean distance City-block distance Chessboard distance Minimal spanning tree Clustering Image processing Pattern recognition
1. I N T R O D U C T I O N
Distance is a very important concept used widely in applied science problems such as pattern recognition and image processing.0-5) It is desirable that the distance is a metric. Three special cases of the Lp (Minkowsky) metric, namely City-block distance dc, Euclidean distance dE and Chessboard distance dM, are popular. For two n-dimensional points, X = (x~, X2. . . . . Xn) and Y = (Yl, Y2 . . . . . y~), the LP metric is defined as
n
where dc, dR and dM correspond to p = 1, 2, and ~, respectively. Since the conventional data space is Euclidean, it is natural to use Euclidean distance in such a space. But computation of Euclidean distance is expensive in a high dimensional space, especially when such computation is to be performed on a large amount of data. A n example is the processing of multispectral imagery which contains more than a million pixels per image frame and many such frames are to be processed. In statistical pattern recognition(13) methods also, the distance is to be computed iteratively on a large amount of data. In order to reduce the computation, City-block and Chessboard distances are often used in image processing(12) and related problems. While these distances are computationally more efficient, they deviate markedly from the Euclidean framework and cause the accuracy of the final result to suffer. It is, therefore, desirable to find a distance function that is as close to Euclidean as possible and yet requires much less computation effort. Also, it is t Author to whom correspondence should be addressed. 667
desirable that this new distance is a metric. The purpose of the present paper is to propose such a metric. While various other metrics are defined~6/or distance functions on a digital grid are widely investigated, 17-9) no work similar to the present study is known to the present authors. The advantage of the metric proposed here is that its applicability is not restricted to digital space. It is a simple combination of City-block and Chessboard distances requiring very little computation effort. The new distance dN is proposed and its metricity is established in Section 2. Various properties of dN including its upper bound with respect to Euclidean distance are established in Section 3. An application of dN to cluster analysis of artificial data is presented in Section 4. Finally, a generalized class of metrics is proposed on the basis of the results in this paper.
2. M A T H E M A T I C A L F O R M U L A T I O N
Definition 1. Consider a bounded set S C_ Rn so that Int (S) 4: O where "Int" denotes interior.(1) A metric d in S is a mapping d : S S ~ [0, oo) satisfying the following properties: d(X, Y ) >- O V X , Y@ S d(X, Y ) = OC:>X = Y d(X, Y ) = d(Y, X)
and (la) (lb) (lc)
d(X, Y) <- d(X, Z) + d(Z, Y ) V X, Y, Z E S.

(ld) The last relation is called a triangular inequality.
Definition 2. For two points, X = (Xl, x2 . . . . . xn) and Y = (Yl, Y2. . . . , yn), let IXi - - Yil be the maximum for
668
D. CHAUDHURIet al.
i = ixa-. The proposed distance is defined as
dN(X, Y) = [ x i ~ - Yixel
i.e.
ICi - hi[ ~ ~ Ici[ Jr"~ [bi[

i=1
i=1
i=l
1
n -
~
i~ix.y
Ixi-Yi[
(2)
i=l [ai[ <- ~ Icil + i=l i=1
~ Ibi[.
(I3)
where [a] means the integral part of "a", i.e. the largest integer -< a.
Thus inequality (I1) holds with the help of relations (I2) and (I3). So (ld) is satisfied by dN. Hence it is a metric. Note that the inequalities are numbered with "I" to distinguish them from equations.
Theorem 1. dN is a metric. Proof. Let X = (xl, x2, . . . , xn), Y = (Yl, Y2 . . . . .

Yn) and Z = (zl, z2 . . . . . zn) be three n-dimensional points in S. Let [xi - Yil = )a,l, [Y, - zil -- Ib, t and
Ix,- z,l = Ic, I.

It can be shown easily that dN satisfies (la)-(lc). Property (ld) is true for n = 1. Without loss of generality let n->2. Now
3. P R O P E R T I E S O F T H E P R O P O S E D M E T R I C
dN(X,Z) + dN(Z, Y) -> dN(X,Y)

*~ Ico~l +
n --
As stated in Section 1, dN should satisfy the following desirable properties: (i) dN is computationally more efficient than dE; (ii) dN should be close to dE than dc and dM. Computational aspects are discussed at the end of this section. About the second property, it may be verified that dc over-estimates and dM under-estimates the Euclidean distance, i.e. dc -> dE > dM. To satisfy the desirable property (ii), the new distance also should satisfy dc-> dN-----dM so that it can be closer to dE. It is readily seen that dN given by equation (2) satisfies this inequality. Our next step is to find how close dN is to dE. To compare any metric d with respect to dE, let us define
An(d) =
i~ixz
lea
+ [bizy[
1
n[-~]
~ Ib, l
i=l
i~iz Y
-> la,~l +
n -i-~ixy
la,I
{Jd(X, Y ) - d E ( X ,
dE(X,Y)
Y)I
.X, Y E S
(3)
+ i Ict + i I ,l
i=l i=1
and gn(d)= sup An(d). It is desirable that gn(d) should be as small as possible.
->(n--[n~22]--l)
laixvl+~lail.
i=l
(II)
The following interesting results can be proved.
Now Ici - bi[ <-I%zl + Ib izr[, i.e. max

i
Theorem 2. If dN> dE then
Ic, -
b,I -
max i
{ICixz[ dr"Ibi~,l}
JV~1+ ~-+-~T) -I
-g'(dN)-- [ ~ ( 1 + 4(n-) ~T~
F ]/
4 ( n - 1)\
forn=2,4,6 ....
i.e.
la,A-< Icixzl + Ib,z,I.

Observe that
1) _ 1 forn=3,5,7 ....
(4)
Theorem 3.
g , ( d c ) = X/n - 1 for all n.
SO
(5)
Theorem 4.
gn(dM) = 1 - ~ n n-2 (I2)
foralln.
(6)
Again Ici - b i I <- [ci I + [bi l, i.e.
Theorems 2--4 provide the exact upper bounds of deviation of the new metric under the condition dN> dE as well as the City-block and Chessboard
A modified metric to compute distance

4.00 3.6e
669
m gnCdN )
0 Un(d c)
~, Qn(dM )
3.20
A
2.80
2.40
0
0 0 0 0 0 0
|
.
v "I[3 C
2.00
1.60 1.20
0
0.80
0.40
0
0
e El
,~
B.
,
[]
I
[]
I
r~
I
r~
l
r~
l g
B
l
ra
, i . w ) | i i | i
4 6 n (Dimension)
10 .....
12
14
16
lg
20
Fig. 1. g,(ds), g,(dc) and g,(dM).
metric from the Euclidean metric. They are compared in Fig. 1.
metric somewhat under-estimates the Euclidean metric.
Lemma 1. If n = 2 then dN(X, Y) >-dE(X, Y)

foreveryX, Y E S . (I4) Since by lemma 1 dN ~z dE for n = 2, theorem 2 also provides the exact upper bound of deviation of the new metric from the Euclidean distance.
Theorem 6. For sufficiently large n, i.e. when n ~ oo, the limiting value of gn(ds) lies between 0 and 1, i.e.
0 -< lira gn(dN) --< 1. (I8)
Theorem 7. For sufficiently large n, dN(X, Y) is closer to dE(X, Y) than both dc(X, Y) and riM(X, Y) for every X, Y E S, i.e. IdN(X, Y) - dE(X, Y)I <--Idc(X,Y) - dE(X, Y)I
(19) and
Theorem 5a. If dE > dN then

g.(dN) --< 1 1 n--2 forn --~ 3. (15)
Ids(X, y) - dE(X, y)I <--IdE(X,Y) - dM(X, y)[. Theorem 5b. If dE > d N then
g,(dN)>-- 1 I X/n I X/n" (n - 1) (I10) All the theorems and lemmas are proved in the Appendix.
(n
(16)
Theorem 5 states that for dE > tiN, the upper bound of deviation of the new metric from the Euclidean metric lies between the right-hand side of inequalities (I5) and (I6).
3.1. Computational aspect

We observe that for each pair of points in R" the following computations are needed for Euclidean distance: (i) n multiplications, (ii) (n - 1) additions and (iii) one square root. On the other hand, computation of dN(X, Y) needs: (i) (n - 1) comparisons, (ii) (n + 2) additions and (iii) two divisions. Since multiplication needs more computation than comparison, dN(X, Y) is more efficient than
Lemma 2. For sufficiently large n

dN(X, Y) > dE(X, Y) (17) With Lebesgue measure zero, i.e. we almost always have dN(X, Y) <- dE(X, Y). Lemma 2 states that in higher dimension, the new
dE(X, r).
670
D. CHAUDHURI et al.
Table 1. The data set considered for clustering Data point 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Dimension 8 9 0.204 0.513 0.306 0.164 0.450 0.243 0.451 0.833 0.501 0.171 0.791 0.158 0.404 0.996 0.151 0.215 0.576 0.882 0.484 0.139 1.212 1.532 1.211 1.681 1.615 1.904 1.105 1.267 1.694 1.310 1.970 1.336 1.888 1.669 1.370 1.680 1.764 1.634 1.653 1.561 1.527 1.375 1.522 1.337 1.715 1.744 1.403 1.355 1.306 1.581 1.860 1.862 1.500 1.914 1.310 1.678 1.425 1.799 1.275 1.387 0.812 0.429 0.794 0.784 0.975 0.363 0.456 0.635 0.222 0.782 0.002 0.224 0.947 0.533 0.153 0.522 0.651 0.857 0.020 0.689 1.171 1.617 1.116 1.722 1.662 1.430 1.009 1.656 1.102 1.989 1.363 1.553 1.062 1.793 1.119 1.541 1.486 1.558 1.570 1.156 1.246 1.543 1.076 1.221 1.202 1.050 1.165 1.528 1.538 1.353 1.781 1.567 1.106 1.286 1.419 1.549 1.538 1.763 1.901 1.978
1 0.000 0.934 0.958 0.599 0.179 0.412 0.103 0.516 0.717 0.734 0.379 0.938 0.171 0.876 0.629 0.182 0.761 0.353 0.347 0.130 1.372 1.035 1.211 1.725 1.828 1.380 1.776 1.508 1.168 1.345 1.040 1.459 1.703 1.208 1.661 1.373 1.578 1.578 1.559 1.051 0.603 0.809 0.665 0.519 0.100 0.102 0.555 0.556 0.516 0.517 0.723 0.765 0.552 0.163 0.254 0.443 0.552 0.810 0.626 0.491
4 0.020 0.642 0.240 0.726 0.699 0.445 0.420 0.362 0.998 0.668 0.687 0.134 0.190 0.407 0.077 0.350 0.465 0.341 0.108 0.606 1.010 1.702 1.414 1.096 1.262 1.625 1.179 1.195 1.576 1.645 1.609 1.057 1.576 1.082 1.712 1.975 1.418 1.067 1.615 1.263 0.226 0.961 0.685 0.392 0.959 0.909 0.693 0.157 0.936 0.293 0.275 0.599 0.297 0.817 0.942 0.444 0.622 0.987 0.887 0.810
5 0.074 0.356 0.090 0.736 0.957 0.652 0.902 0.507 0.622 0.848 0.525 0.200 0.136 0.980 0.313 0.494 0.320 0.840 0.071 0.921 1.988 1.871 1,953 1,815 1,696 1,229 1,756 1.052 1.766 1.269 1.352 1.844 1.315 1.699 1.013 1.821 1.058 1.666 1.372 1.670 0.625 0.998 0.778 0.560 0.136 0.886 0.794 0.626 0.803 0.221 0.580 0.736 0.292 0.866 0.919 0.821 0.599 0.090 0.637 0.984
6 0.267 0.362 0.377 0.886 0.448 0.906 0.635 0.786 0.743 0.073 0.967 0.999 0.101 0.218 0.187 0.811 0.736 0.971 0.456 0.069 1.836 1.909 1.988 1.025 1.823 1.751 1.923 1.559 1.409 1.808 1.630 1.552 1.699 1.460 1.670 1.150 1.589 1.392 1.694 1.657 0.713 0.344 0.502 0.837 0.186 0.133 0.529 0.338 0.396 0.686 0.007 0.023 0.085 0.839 0.035 0.926 0.991 0.654 0.840 0.612
10 0.041 0.952 0.008 0.222 0.798 0.990 0.677 0.318 0.824 0.156 0.889 0.922 0.047 0.237 0.556 0.199 0.725 0.211 0.760 0.879 1.115 1.911 1.793 1.207 1.435 1.445 1.109 1.534 1.369 1.139 1.447 1.295 1.376 1.741 1.381 1.128 1.043 1.647 1.543 1.888 1.731 1.882 1.759 1.293 1.774 1.610 1.365 1.973 1.473 1.894 1.943 1.646 1.133 1.491 1.728 1.195 1.401 1.390 1.934 1.382
11 0.935 0.855 0.906 0.282 0.011 0.675 0.958 0.188 0.843 0.895 0.316 0.513 0.761 0.624 0.957 0.495 0.491 0.550 0.383 0.076 1.155 1.919 1.719 1.743 1.654 1.805 1.573 1.297 1.293 1.936 1.415 1.794 1.698 1.308 1.221 1.899 1.879 1.856 1.130 1.922 0.170 0.410 0.872 0.774 0,825 0.203 0.707 0,090 0,993 0.186 0,629 0.775 0.840 0.368 0.593 0.231 0.559 0.476 0.495 0.490
12 0.240 0.560 0.361 0.693 0.887 0.138 0.662 0.270 0.241 0.963 0.895 0.777 0.141 0.612 0.742 0.177 0.421 0.401 0.461 0.545 1.894 1.312 1.175 1.594 1.011 1.822 1.458 1.979 1.436 1.364 1.471 1.110 1.804 1.179 1.891 1.237 1.888 1.316 1.887 1.543 0.444 0.515 0.398 1.007 1.982 1.734 1.952 1.782 1.700 1.065 0.286 0.840 0.845 0.794 0.007 0.630 0.749 0.345 0.559 0.503
13 0.031 0.666 0.012 0.616 0.222 0.756 0.346 0.930 0.956 0.725 0.522 0.045 0.001 0.059 0.837 0.610 0.108 0.451 0.316 0.591 1.966 1.602 1.576 1.881 1.175 1.691 1.592 1.194 1.985 1.763 1.093 1.514 1.543 1.298 1.360 1.336 1.420 1.189 1.153 1.959 0.137 0.406 0.539 0.072 0.470 0.574 0.35l 0.885 0.264 0.718 0.048 0.062 0.514 0.445 0.704 0.701 0.462 0.782 0.901 0.603
14 0.023 0.960 0.824 0.461 0.348 0.294 0.119 0.145 0.569 0.687 0.079 0.282 0.739 0.845 0.348 0.063 0.859 0.100 0.748 0.641 1.748 1.805 1.883 1.941 1.958 1.744 1.424 1.357 1.983 1.306 1.314 1.090 1.021 1.175 1.141 1.878 1.528 1.287 1.937 1.867 0.822 0.800 0.652 0.370 0.981 0.835 0.537 0.270 0.288 0.723 0.716 0.812 0.479 0.530 0.166 0.534 0.028 0.587 0.375 0.092
15 0.857 0.766 0.835 0.221 0.091 0.958 0.598 0.504 0.809 0.593 0.774 0.285 0.420 0.539 0.553 0.888 0.180 0.540 0.644 0.523 1.794 1.409 1.112 1.717 1.166 1.246 1.221 1.397 1.031 1.966 1.047 1.919 1.232 1.373 1.606 1.247 1.388 1.024 1.248 1.567 0.701 0.145 0.064 0.571 0.655 0.845 0.065 0.658 0.352 0.872 0.869 0.310 0.246 0.171 0.656 0.893 0.010 0.485 0.145 0.124
0.001 0.005 0.891 0.944 0.851 0.484 0.076 .0.069 0.088 0.915 0.656 0.224 0.999 0.068 0.713 0.629 0.768 0.152 0.123 0.129 0.931 0.177 0.659 0.511 0.461 0.223 0.470 0.940 0.928 0.905 0.117 0.068 0.576 0.608 0.496 0.801 0.216 0.175 0.986 0.746 1.523 1.786 1.058 1.038 1.586 1.615 1.342 1.529 1.517 1.563 1.784 1.280 1.448 1.702 1.061 1.791 1.432 1.077 1.788 1.622 1.547 1.923 1.327 1.833 1.947 1.349 1.161 1.088 1.607 1.696 1.783 1.337 1.246 1.272 1.973 1.637 1.141 1.813 1.075 1.989 0.511 0.637 0.543 0.974 0.689 0.148 0.533 0.532 0.458 0.846 0.719 0.397 0.728 0.374 0.747 0.480 0.178 0.424 0.933 0.949 0.493 0.452 0.775 0.762 0.522 0.165 0.764 0.115 0.987 0.637 0.754 0.538 0.274 0.682 0.767 0.315 0.397 0.743 0.642 0.431
0.934 0.961 0.449 0.689 0.081 0.566 0.694 0.151 0.865 0.805 0.083 0.192 0.386 0.493 0.306 0.419 0.534 0.270 0.098 0.127 1.122 1.619 1.350 1.818 1.670 1.444 1.736 1.883 1.562 1.430 1.606 1.718 1.363 1.469 1.900 1.504 1.011 1.360 1.816 1.912 0.657 1.079 1.006 1.978 1.899 1.824 1.028 1.400 1.144 1.126 0.820 0.511 0.877 0.244 0.938 0.169 0.557 0.114 0.305 0.816

4. APPLICATION
31 37 ~/~'] 0 3~3335 9 24 30
671
In this section an example is provided to show the applicability of the new distance measure in the context of clustering. This example also shows that the City-block and Chessboard distances are not better approximations to Euclidean distances than the new distance. Data have been artificially generated in a 15dimensional space. Three sets A = [0, 1] 15, B = [1, 2] 15 and C = [0, 1] 7 [1, 2] 3 [0, 1] 5 are chosen so that C overlaps with A and B partially. Twenty points are randomly selected from each set. The 60 points are numbered 1, 2 , . . . , 60 where the points numbered 1-20 are in A, 21-40 are in B and 41-60 are in C. The points are shown in Table 1. The procedure to classify the above 60 points is based on the minimal spanning tree (MST). (4) There are various ways in which clusters are detected using MST. Here, one of the methods is stated below. (1) Draw the MST of 60 points where the edge weight is considered to be the distance between the corresponding points. (2) Remove those two edges from MST whose edge weights are maximum. The points in the resulting three trees give the three clusters.
~.0
3 ~ 5 B
48
"~,,,s'..... 4 s
57 59 60
50 51 /*1 /,4
12 15'I" 17
14
52
19 20
10 ,..... 7 4
II
I 36
29 26
Fig. 3. The MST using the Euclidean distance.

37
35
30
22
28 23 27 32 25 38 31 39
Scale ~--~ : 1
40
24
3/, 57
/,2
59 (-5 ~6
47
/,9
60
5~. 55 5 56
53
Here, we have considered four distances dc, dE, dM and dN and a MST is drawn for each of them (Figs 2-5, respectively). Note that the MST using the Euclidean distance is closer to the MST using the new distance than the MST using the Chessboard distance, since the MST using the new distance differs from the MST using the Euclidean distance at 14 places whereas the MST using the Chessboard distance differs from the MST using the Euclidean distance at 29 places. Also Table 2 gives the classification results under the four distances. Observe that dE, dM and d N classify all points correctly. But City-block distance does not provide the required classification. Thus the new distance is indeed close to the Euclidean distance.
5. DISCUSSION
58
' 10 t, 9 6
~,l
51 ~ 15
14 18
12
52
In this paper, an approximation to Euclidean distance is suggested for higher dimensions and it is compared with the City-block and Chessboard distances. But the existence of an even better approximation to Euclidean distance is to be studied. A class of metrics can be defined on the basis of the results of this paper as
,=2
Fig. 2. The MST using the City-block distance.
p(X, Y) = al + -
(7)
672
D. CHAUDHURI et al.
20 Stole I I-- ]
3 SCOlel !:1
/
21, ~7
30
36---'~,
29~\
" 1
59
~~ss
34
"N~""~ 22
t8
5~,
~$3
~56 ~4.9
42 "58
20
15
54 ~~ 0 I ,
191 , ~ 1 7 7
56
It,
12~--.--~I1
I\
~ ~ 2
1
I) ./
0
Fig. 5. The MST using the new distance.

1
Fig. 4. The MST using the Chessboard distance.
where X = (xl, x2 . . . . . xn) and Y = (Yl, Y2. . . . . Yn) are two n-dimensional points in S, [xi - Yil = ai for all i, a l > ai for i = 2, 3 . . . . . n, f , > 0 for all n and {f.} is a sequence in n. It can be easily shown that for every n and for every positive f., the distance measure in equation (7) is indeed a metric. The new metric stated in this paper is a particular case of equation (7). Thus equation (7) gives a generalized system of metrics. The utility of the above relation for various values
of {f,} is to be studied in practical problems. Currently we are investigating this topic in problems related to clustering, distance transforms, etc.
6. SUMMARY
This paper deals with the problem of finding a new distance measure for large dimensional data so that the proposed distance is computationally more efficient than the Euclidean distances and it is closer to the same than the City-block and Chessboard distances which are computationally more efficient than the Euclidean distance.
Table 2. Three dusters formed by using various distances Distance measure 1. New distance The clusters A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20} B = {21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40} C = {41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60} A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20} B = {21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40} C = {41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60}
2. Euclidean distance
3. Chessboard distance A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20} B = {21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40} C = {41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60} 4. City-block distance A = {1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60}
B = {5,
20}
C = {21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40}
A modified metric to compute distance Let the City-block, Chessboard, Euclidean and the new distances between the points X = (x~, x2, . . . . x,) and Y = (Yl, Y2. . . . . y~) in a b o u n d e d set S C_ R n be denoted by dc(X, Y), dM(X, Y), dE(X, Y) and dN(X, Y), respectively. T h e n
673
A n example is provided to show the utility of the new distance in the context of clustering on a 15dimensional data set. Finally a generalized system of metrics is defined as
d c ( S , Y) = ~
i=l
IXi
- Yil
- yi[
p(X, Y) = a 1 -t- - -
i=2
f.
dM(X, Y) = Max
i
Ix,
dE(X, Y)
--
(X i -- yi) 2
i=1
dN(X, Y ) = l x i ~ - y i ~ l
+
n -
where [xi - yil = ai for all i, al >- ai for all i, f, > 0 for all n and {f,} is a measure in n. It can be easily shown that for any positive sequence {f,} the generalized distance measure is indeed a metric. The new metric stated in this paper is a particular case of the generalized system of metrics.
Acknowledgement--The authors wish to thank Prof. D.
i~ixy
Ix~ -- Yil
Dutta Majumder and Mr N. Chatterjee for their interest in this work. The authors also acknowledge Mr J. Gupta for typing the manuscript and Mr S. Chakraborty for his drawings.
REFERENCES
where Ixi - Yil is m a x i m u m for i = ixy and [a] indicates the integral part of "a", i.e. the largest integer -< a. It is very easy to see that dN is computationally more efficient than dE. It is shown that (i) dN(X, Y) is a metric. (ii) For sufficiently large n IdN(X, Y) - dE(X, and tdN( X, Y) - dE(X, (iii) g"(dN) = Sup {" will be I~/( 4(n- 1)~_ 1+(n+2)2 ] 1 4(n--1)~_ 1+(n+3)2 ] 1 forn=2,4,6 ....
r)l
-< IdE(X, Y ) - d u ( X , Y)I
Y)l -< Idc(X, Y) d--~, "~
dE(X, Y)t.
rld~(X, r ) - dE(X,Y)I
_/
:X, Y e S }
g'(dN)= (~(
forn=3,5,7 ....
if d N > d E. Otherwise
1
1 ~/n
1
~/n
( n - 1)
{n_[~_.2]}
1. G. Borgefors, Distance transformations in arbitrary dimensions, Comput. Vision Graphics Image Process. 27, 321-345 (1984). 2. B. Kleiner and J. A. Hartigan, Representing points in many dimensions by trees and castles, J. Am. Statist. Ass. 76, 260-276 (1981). 3. M. Yamarhita and T. Ibaraki, Distances defined by neighbourhood sequences, Pattern Recognition 19, 337-346 (1989). 4. C. T. Zahn, Graph theoretic methods for detecting and describing gestalt clusters, 1EEE Trans. Comput. No. 20, 68--86 (1971). 5. M. R. Anderbug, Clustering Analysis for Application. Academic Press, New York (1971). 6. R. M. Cormack, A review of classification, R. Statist. Soc. Series A 134(3), 321-367 (1971). 7. A. Rosenfeld and J. L. Pfaltz, Distance functions on digital pictures, Pattern Recognition 1, 33--61 (1968). 8. R. A. Melter and I. Tomescu, Path generated digital metrics, Pattern Recognition Lett. 1,151-154 (1983). 9. P. E. Danielsson, Euclidean distance mapping, Comput. Graphics Image Process. 14, 227-248 (1980). 10. T. M. Apostol, Mathematical Analysis, pp. 151-152. Addison-Wesley, Reading, Massachusetts (1957). 11. P. Billingsley, Probability and Measure. Wiley, New York (1979). 12. J. A. Richards, Remote Sensing Digital Image Analysis--an Introduction. Springer, Berlin (1986). 13. P. A. Devijver and J. Kittler, Pattern Recognition--a Statistical Approach. Prentice-Hall, London (1982).
APPENDIX
Proof of theorem 2. In order to find gn(dN) we shall use the <--g.(dN) <-- 1 and following theorem. (m)
0 - lim g . ( d y ) <- 1.
n~
(iv) Min {Sup ~ "ldc - dE I'[, Sup ~ IdM-7-dz [ ~ [ dE J I. dE JJ
> sup{%: !}.
Theorem 2.1. Let f have continuous second-order partial derivatives on an open set S in E,. Let X0 be a point of S for which Dlf(Xo) . . . . . D,f(Xo) = 0. Assume that the determinant A = det {Di.jf(Xo)} ~ O. Let A0 = 1 and A,-k be the determinant obtained from A by deleting the last k rows and columns. If the n + 1 numbers A0, A~. . . . . A, are all positive then f has a local minimum at X0. If these numbers are alternatively positive and negative, then f has a local maximum at X0. Now, let al -> ai, i = 2, 3, . . . , n.
674
D. CI4AUDHURI et
al.
Case (i).
Let n be even, i.e. let n = 2k. Then
It can be easily found that Djj atZ, = - (k2 + 4k - 1)(k + 1)

' ( k 2 + 4 k ) 3/2
g.(dN) = Sup { ~ }
a 1 +.--.~a
k+li=
(n 2 + 8n - 4)(n + 2)
(n 2 + 8n)3/2
1 and
(A4)
= Sup
2k
~ ",i=1
a,9
[1 + .----v~ z, ~___k ,=___2_~ + 1

=Sup
Dij
(A1)
k+ 1 4(n + 2) at 2 = (k 2 + 4k)3/2 = (n2+ 8n)3/2,
i@j.
(A5)
.J(l+
~ , z~)
Now the determinant
D2.2 D3.2
D2.3 D3.3
..... .....
D2.,
D3 m
A=
Dn, 2 Dn, 3 .....
D,,,
(n-l)x(n-l)
-(n2+8n-4) 4 n+2 ~,-t
4 - (n 2 + 8n - 4)
..... .....
4 4
= [(n 2 + 8n)3/2 J
4 It can be seen that where zi = 4 ..... - ( n 2 + 8n - 4)
ai/al,
i = 2, 3 . . . . .
1
2k. Let A=(

2k
f n+2 I ~-1 1)"'t( n2..... + 8n)3/2 /
1 + .-""'-7. ~ , zi f(Z) =
~/\
x {4(n - 2) - (n 2 + 8n - 4)}(n 2 + an) ~-2 #: 0 1 (A2)

.
k+li= 2
2f
n + 2
~,-~
1+
i=2
z
/
~._, = (-I) - I(.2 + gn)~/~I

x {4(n - 2) - (n 2 + 8n - 4)}(n 2 + 8n) "-2 < 0
n_3 (
where Z = ( z 2 , z 3 . . . . . Z2k). Now differentiating (A2) partially with respect to zj, j = 2, 3 . . . . . 2k A'-2=(-1)
1 1
-t---
n+2
"1n - I
I(n2+8n)3/2 I
2k
~
2*
/=2
/
----Zj
X (n 2 + 8n)"-3{4(n - 3) - (n 2 - 8n - 4)} > 0 (A3)
Dj -- Ozj -
/
\
2k
i=2
\ 3/2
/
(1 + X
Equating given by
8f/Sz/to
zero we get the only real solution of z~

n+2 ~2
2 -A2 = [ ( n 2 + 8n)3/2 J {(n 2 + 8 n - - 4 )
1 ~=k+1 Let
fori = 2,3 . . . . ,2k.
16}>0.
Thus
A n _ l , An_2, . . .
, A 2 are alternatively
positive
and
Let
negative. Hence function (A2) has a local maximum at Z. The maximum value =~/(1 2k-1\ 4 ( n - 1)
Di, j = OZiOZj.
675
Case (ii).
Let n be odd, i.e. n = 2k + 1. This proof is similar to that of case (i) that the function 1 2kl
If both inequalities (A-I1) and (A-I2) are true, then the only possibility is g , ( d c ) = V'n - 1. n. There-
'+
flZ)
=
SX z, i=2
(A7)
Proof of theorem 4. Let a~ >-aifor

fore
i = 2, 3 . . . . .
l+Ez
i=2
has a m a x i m u m at Z = (~2, 23 . . . . . ,?.~ = ~ - ~ V i

1
12k+i) where , 2 k + 1.
= 2,3,...
T h e m a x i m u m value =
1+ ~
- 1 (A8) Now
1 + ~'-~"~'7)
- 1.
~\i=1
a
/
<-a,X/n
(sincea,
>-aigi).
Proof of theorem 3.
g . ( d c ) = Sup { ~ }
Therefore 1 ~ ai < 1
1
~n'
=Sup
~.\i= 1 /
,/2
(sincedc ~dE)
Hence
g.(du) <- 1
Y~a~
iffil
X/n"
(A-I3)
= Sup
(A9)
,1(~ a}ln)
~ 'iffi 1
/
Again let us a s s u m e that the points are b o u n d e d in ndimensional space. A s in the case of t h e o r e m 3, consider the points X and Y such that Ix~ -,Yil = ai = m V i = 1, 2 . . . . . n. Now dE = mVn and dM = m. Therefore
IdE Now
aMI
1
-
1
~n'
dE
. a, 2 X a~ >>[i-:-~-) ( f r o m C a u c h y - S c h w e r t z i n e q u a l i t y ) .
i=l
/.._Z
So
g.(dM) =
Sup
>- 1 - ~ n '
(A-I4)
Therefore
n
Hence, from inequalities (A-I3) and (A-I4) we conclude that

1
1
n
~=,
g.(dM) = 1 - -~n" g Proofoflemma

1. Let al -> a2 >- 0. So dE(X, Y) = X/(a~ + Now (a~ +
Therefore g . ( d c ) -< V'n - 1. (A-I1)
dN(X, Y) = at + a2/2. a2/2) 2 = aT + a2/4 + ala2. Therefore

a 2) and
Again let us a s s u m e that the points are b o u n d e d in ndimensional space. Consider two points X and Y such that [x~ - y~[ = ai = mVi=l,2 . . . . . n. Now dE = m%/n and dc = nm. Therefore dc - dE = ~ / n -- 1. A,(dc) = dE g.(dc), which is the s u p r e m u m of A,(dc), cannot be smaller that ~/n - 1, i.e.
al+
-(a21+a2)=-~(4a,-3az).
-
Since al ~ a2, we have 4al
3a 2 ~ 0. Therefore (al +
a2/2) >-X/(a] + a2). H e n c e d~(X, Y) >- dE(X, Y). Proof of theorem 5a. Let
fore a, >- ai for i = 2, 3 . . . . .
n. There-
a I -{" r/-
a~
(All)
g~(dN)=Su p 1 g.(dc) = Sup { ~ } >- ~v/n - 1 .

(A-I2).
%/(~a21)
676
D. CHAUDHURI et al.
Now we know that

n
:} al
ff
a~
~-
a~
al +
n--
n. ~ ai > n-i= I
~(~ai)2+2(n_[_7])n-2 xal~a~(nn-2 2 , 2
a~+
X/(~aT)
~Xi=l
->
:
X/(~a2)
~\i=l :
Therefore
In reality, values of ai are usually bounded. In that case, for sufficiently large n, d N < dE because inequality (A-I8) involves n. Putting n = 2k in (A-I8) we have
2k 2 2k 2k
a,+
1-
. >
,
"
.
~I-
"
(~ail
\i=2 /
+2(k+l)a1~ai<>(k+l)
i=2
2 ~ a 2.
i=2
(A-I9)
,/(~a~i)
~\i=l
X/(~a2i)
~\i=l /
Let a~ < M for all i. Therefore, the left-hand side of (A-I9) will be less than (2k - 1)2M 2 + 2(k + 1)g(2k - 1)M = (4k + 1)(2k - 1)M s. (A-I5) Let m < az for all i. (m > 0 is true for many values of X and Y.) So the right-hand side of (A-I9) will be greater than (k + 1)2(2k - 1)m 2. Now for sufficiently large n (i.e. for large k) d N < dE =>(4k + 1)(2k - 1)M 2 < (k + 2)2(2k - 1)m 2
From ( A l l ) and (A-I5) we have

n
g,(ds) < 1
1 n-[~-~]X
/=1 (f~a~)'
\i=1 /
(A-I6)
Again ~ai
i=1
Ms
k2+2k+l
4k+ 1
"
:>
,/fi
~\i=l
,.2.)
/
<:=~- T <
k+l 4 + 4k+'-'-'-i"
Therefore
n
1-
1 n-2
i=1 ~(~
-<1
1 [~_2] '
Observe that if n/8 > MS/m 2, m ~ O, then dN < dE. Observe that m #: 0 for uncountably many points in S x S. In fact, for a given point X E S, m will be zero with Lebesque measure zero since the n-dimensional Lebesgue measure of an (n - 1)-dimensional set in zero. m> Hence dN > dE with measure zero. (A-I7)
Proof of theorem 6. Let a 1 > a i for i = 2, 3 . . . . . n. Now by lemma 2, ds < dE for sufficiently large n. Therefore
1
From (A-I6) and (A-I7) we have g,(dN) ~ 1

1
al +
_-"
^ ~ ai
lira
= 1 - lira
~xi= 1 /
Proof of theorem 5b. It has been assumed in Section 2 that

the points are bounded in n-dimensional space and the interior of the set is non-empty. Hence, there exists two p o i n t s X a n d Ysuch that [x~ - Y~I = ai m V i = 1, 2 . . . . . n. Therefore
=
Put n = 2k. If n ~ oo then k---} oo. So

2k
laE - a s l
(k + 1)al + ~ ai
,=s
lira
= 1 - lim (k + 1) a
('~'/oo form, by applying L'Hospital's rule) m +

n--
(n - 1)m
al
= 1 -
>
11 1
m~/n
~l X i = l
( n - 1)
=i-:~n-:~nn'{n_[~_2]
}"
Proof of lemma. Let al >-a, for i = 2, 3 . . . . .

dN X d E
n. Now
From the above limit we see that the limit exists, but it is not possible to find the exact value of this limit. We see that if a2 = a3, . = a2k = 0 then the limiting value will be 0. Otherwise if as, a3, . . . , az~ are non-zeros then the limiting value will be 1. Also if al = as . . . . . aEk then the
A modified metric to compute distance limiting value will be 1. Now
677
(by putting n = 2k)

2k
'uPf l=Su f!im )

Therefore 0-< lim gn(dN)--< 1. n. Now
<:>4(k + 1) 2 ~ a~ X 4(k + 1)2a {

i-I 2k 2k 2
+ 4(k + l)(k + 2)al ~ ai + (k + 2) 2 (~,, ai)

i~2 n \1=2 n /
Proof of theorem 7. Let al -> ai for i = 2, 3 . . . . .

by iemma 2 dM <--dN --<dE --<dc. For dN <- dE --<dc IdN - dEI X Idc - dE[ ~:~dE -- dN X dc - dE ~ 2dE X d c + dN :~2(k+1)
"~/ \ i = l
~4(n + 2) 2 ~ a ] @ 4(n + 2)(n + 4)al ~ ai

i=2 i=2
+ (n + 4) 2
a,
(A-I10)
In practice values of ai are usually bounded. In that case, for sufficiently large n, the less than relationship holds because inequality (A-I10) involves n. Hence for sufficiently large n fdN - d E l < l d c dEI.
a~ X 2 ( k + l ) a l + ( k + 2 ) ~ a ,
/ i=2
It is easy to see that IdN - - d E l < l d E - - dMI.
About the Author--D. CHAUDHURIwas born in Bolpur (Santiniketan), India. He received a Bachelor
of Mathematics (Hons) from Visva-Bharati University, Santiniketan, in 1984 and an M.Sc. (applied mathematics) from Jadavpur University, Calcutta, in 1987. Currently he is a regular research worker in the Department of Physical and Earth Science Division, Indian Statistical Institute, Calcutta. His fields of interest are pattern recognition, image processing and computer graphics.
About the h~uthor--C. A. MURTHYwas born in Ongole, India, on 12 June 1958. He received Bachelor
of Statistics (Hons) and M.Stat. degrees from Indian Statistical Institute (ISI), Calcutta, in 1979 and 1980, respectively. He worked as a research fellow in ISI up to 1987. He joined as a programmer in a project in which analysis of satellite imagery was the focal point. He received his Ph.D. degree from ISI, Calcutta, in 1989. The field of his doctoral dissertation was pattern recognition. Currently he is a programmer in the project on knowledge-based computing systems, which is jointly sponsored by UNDP and Department of Electronics, India. His fields of interest are pattern recognition, image processing, computer vision and fuzzy sets. He is a member of Indian Unit for Pattern Recognition and Artificial Intelligence and Indian Society for Fuzzy Mathematics and Information Processing.
About the Author--B. B. CHAUDHURIreceived the B.Sc. (Hons), B.Tech. and M.Tech. degrees from
Calcutta University, India, in 1969, 1972 and 1974, respectively, and the Ph.D. degree from Indian Institute of Technology, Kampur, in 1980. He joined the Indian Statistical Institute, Calcutta, in 1978 where he is currently a professor and Professor-in-charge of Physical and Earth Science Division. His initial research work was on dielectric and optical wave guides. Later on, he became more interested in pattern recognition, image processing, computer graphics and natural language processing. He has published 80 research papers in the international journals and has written a book entitled Two Tone Image Processing and Recognition. He was awarded the Sir J. C. Bose Memorial Award for best engineering science oriented paper published in JIETE in 1986 and the M. N. Saha Memorial Award for best application oriented paper published in 1989. He acts as a referee to many international journals. He was the winner of the Leverhulme Overseas Visiting Fellowship in 1981-82 to work at Queens University. He worked as a visiting faculty member at GSF, Munich, and a Guest Professor at the University of Hannover during 1986--88. He again visited several German, Italian and Swiss institutions during 1990-91. In 1986 he started a successful on-going Indo-German scientific collaboration in biomedical image processing and related topics. He is a senior member of IEEE and fellow/member of many academic professional bodies.

A Modified Metric To Compute Distance

Transféré par

Informations du document

Description originale:

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

A Modified Metric To Compute Distance

Transféré par

Droits d'auteur :

Formats disponibles

Pattern Recognition, Vol. 25, No. 7, pp.

667-677, 1992 Printed in Great Britain

A MODIFIED METRIC TO COMPUTE DISTANCE

d(X, Y) <- d(X, Z) + d(Z, Y ) V X, Y, Z E S.

i = ixa-. The proposed distance is defined as

ICi - hi[ ~ ~ Ici[ Jr"~ [bi[

Theorem 1. dN is a metric. Proof. Let X = (xl, x2, . . . , xn), Y = (Yl, Y2 . . . . .

Ix,- z,l = Ic, I.

dN(X,Z) + dN(Z, Y) -> dN(X,Y)

The following interesting results can be proved.

Now Ici - bi[ <-I%zl + Ib izr[, i.e. max

Theorem 2. If dN> dE then

la,A-< Icixzl + Ib,z,I.

Again Ici - b i I <- [ci I + [bi l, i.e.

A modified metric to compute distance

Fig. 1. g,(ds), g,(dc) and g,(dM).

metric from the Euclidean metric. They are compared in Fig. 1.

metric somewhat under-estimates the Euclidean metric.

Lemma 1. If n = 2 then dN(X, Y) >-dE(X, Y)

Theorem 5a. If dE > dN then

3.1. Computational aspect

Lemma 2. For sufficiently large n

A modified metric to compute distance

Fig. 3. The MST using the Euclidean distance.

Fig. 2. The MST using the City-block distance.

Fig. 5. The MST using the new distance.

Fig. 4. The MST using the Chessboard distance.

Acknowledgement--The authors wish to thank Prof. D.

-< IdE(X, Y ) - d u ( X , Y)I

Y)l -< Idc(X, Y) d--~, "~

(iv) Min {Sup ~ "ldc - dE I'[, Sup ~ IdM-7-dz [ ~ [ dE J I. dE JJ

> sup{%: !}.

Let n be even, i.e. let n = 2k. Then

It can be easily found that Djj atZ, = - (k2 + 4k - 1)(k + 1)

[1 + .----v~ z, ~___k ,=___2_~ + 1

k+ 1 4(n + 2) at 2 = (k 2 + 4k)3/2 = (n2+ 8n)3/2,

Now the determinant

-(n2+8n-4) 4 n+2 ~,-t

2k. Let A=(

f n+2 I ~-1 1)"'t( n2..... + 8n)3/2 /

x {4(n - 2) - (n 2 + 8n - 4)}(n 2 + an) ~-2 #: 0 1 (A2)

~._, = (-I) - I(.2 + gn)~/~I

X (n 2 + 8n)"-3{4(n - 3) - (n 2 - 8n - 4)} > 0 (A3)

zero we get the only real solution of z~

fori = 2,3 . . . . ,2k.

A modified metric to compute distance

Proof of theorem 4. Let a~ >-aifor

has a m a x i m u m at Z = (~2, 23 . . . . . ,?.~ = ~ - ~ V i

Hence, from inequalities (A-I3) and (A-I4) we conclude that

g.(dM) = 1 - -~n" g Proofoflemma

Therefore g . ( d c ) -< V'n - 1. (A-I1)

dN(X, Y) = at + a2/2. a2/2) 2 = aT + a2/4 + ala2. Therefore

Since al ~ a2, we have 4al

g~(dN)=Su p 1 g.(dc) = Sup { ~ } >- ~v/n - 1 .

Now we know that

From ( A l l ) and (A-I5) we have

From (A-I6) and (A-I7) we have g,(dN) ~ 1

Proof of theorem 5b. It has been assumed in Section 2 that

Put n = 2k. If n ~ oo then k---} oo. So

('~'/oo form, by applying L'Hospital's rule) m +

Proof of lemma. Let al >-a, for i = 2, 3 . . . . .

A modified metric to compute distance limiting value will be 1. Now

[1 + .----v~ z, ~_k ,=_2_~ + 1