Vous êtes sur la page 1sur 20

US006295504B1

(12) United States Patent (10) Patent N0.: US 6,295,504 B1


Ye et al. (45) Date of Patent: Sep. 25, 2001

(54) MULTI-RESOLUTION GRAPH-BASED Threshold Validity For Mutual Neighborhood Clustering,


CLUSTERING Stephen P. Smith, IEEE Transactions On Pattern Analysis
And Machine Intelligence, vol. 15, No. 1, Jan. 1993, pp.
(75) Inventors: Shin-Ju Ye, Spring, TX (US); Phillipe 8992.
J. Y. M. Rabiller, Lescar (FR) Fast Parzen Density Estimation Using ClusteringBased
Brand And Bound, ByeungWoo Jeon et al., IEEE Transac
(73) Assignees: Halliburton Energy Services, Inc., tions On Pattern Analysis And Machine Intelligence, vol. 16,
Houston, TX (US); Elf Exploration No. 9, Sep. 1994, pp. 950954.
Production (FR) Motion Estimation Via Clustering Matching, Dane P. Kottke
et al., IEEE Transactions On Pattern Analysis And Machine
(*) Notice: Subject to any disclaimer, the term of this Intelligence, vol. 16, No. 11, Nov. 1994, pp. 11281132.
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days. (List continued on next page.)
Primary ExaminerDonald E. McElheny, Jr.
(21) Appl. No.: 09/586,129 (74) Attorney, Agent, or FirmConley, Rose & Tayon, RC.
(22) Filed: Jun. 2, 2000 (57) ABSTRACT
Related U.S. Application Data An apparatus and method for obtaining facies of geological
(60) Provisional application No. 60/161,335, ?led on Oct. 25, formations for identifying mineral deposits is disclosed.
1999. Logging instruments are moved in a bore hole to produce log
measurements at successive levels of the bore hole. The set
(51) Int. Cl.7 ..................................................... .. G01V 3/38
of measurements at each such level of the bore hole interval
(52) U.S. Cl. .................................................. .. 702/7, 702/11 is associated With reference sample points Within a multi
(58) Field of Search .................................. .. 702/6, 78, 11, dimensional space. The multidimensional scatter of sample
702/12, 13, 367/72, 73; 324/338, 339 points thus obtained is analyzed to determine a set of
characteristic modes. The sample points associated With
(56) References Cited characteristic modes are grouped to identify clusters. A
U.S. PATENT DOCUMENTS facies is designated for each of the clusters and a graphic
representation of the succession of facies as a function of the
4,646,240 2/1987 Serra et al. ........................ .. 364/422 depth is thus obtained. To identify the clusters, a neigh
FOREIGN PATENT DOCUMENTS boring index of each log measurement point in the data set
is calculated. Next, small natural groups of points are
2215891 9/1989 (GB) . formed based on the use of the neighboring index to deter
OTHER PUBLICATIONS mine a K-Nearest-Neighbor (KNN) attraction for each point.
Independently of the natural group formation, an optimal
Clustering Without A Metric, Geoffrey MattheWs et al., number of clusters is calculated based on a Kernel Repre
IEEE Transactions On Pattern Analysis And Machine Intel sentative Index (KRI) and based on a user-speci?ed resolu
ligence, vol. 13., No. 2, Feb. 1991, pp. 175184. tion. Lastly, based on the data calculated from the prior
Robust Clustering With Applications In Computer Wsion, steps, ?nal clusters are formed by merging the smaller
JeanMichael Jolion et al., IEEE Transactions On Pattern clusters.
Analysis And Machine Intelligence, vol. 13, No. 8, Aug.
1991, pp. 791802. 47 Claims, 7 Drawing Sheets

510
Determine Neighbor
Index (Local Density of
Data Distribution)
575 520

Determine Kernel
Determine KNN
Representative Index
Attraction (Form natural
small groups) (Determine Optimal
Cluster Number)

530
Perform Merging to
Form Final Clusters
US 6,295,504 B1
Page 2

OTHER PUBLICATIONS The Contribution Of Logging Data To Sedimentology And


A Least Biased Fuzzy Clustering Method, Gerardo Beni, et Stratigraphy, Oberto Serra et al., SPE 9270, Society of
al., IEEE Transactions On Pattern Analysis And Machine Petroleum Engineers of AIME, Sep. 1980, pp. 119.
Intelligence, vol. 16, No. 9, Sep. 1994, pp. 954960.
Fast NearestNeighbor Search In Dissimilarity Spaces, BoreholeAcousticsAnd LoggingAnd Reservoir Delineation
Andras Farago, et al., IEEE Transactions On Pattern Analy Consortia, Annual Report 1999, Earth Resources Labora
sis And Machine Intelligence, vol. 15, No. 9, Sep. 1993, pp. tory, pp. 125.
957962.
Application Of Kruskal Multidimensional Scaling (MDS) To FaciologAutomatic Electrofacies Determination, M.
Rock Type Identi?cation From Well Logs, Vaclav Matsas et Wolff et al., SPWLA TWentyThird Annual Logging Syrn
al., The Log Analyst, Jan.Feb. 1995, pp. 2834. posiurn, Jul. 69, 1982, pp. 123.
Automatic Determination Of Lithology From Well Logs,
Pierre Del?ner et al., SPE Formation Evaluation, Sep. 1997, Facies Model, Second Edition, Roger G. Walker, Geo
pp. 303310. science Canada, Reprint Series 1, McMaster University,
Synergetic Log & Core Data Treatment: A Methodology To May, 1984, pp. 16 p.
Improve ReservoirDescription Through C lusterAnalysis, C.
DescalZi, et al., SPE #17637, Society of Petroleum Engi The SelfOrganizing Map, Teuvo Kohonen, Proceedings of
neers, Nov. 1988, pp. 113. the IEEE, vol. 78, No. 9, Sep. 1990, pp. 14641480.
U.S. Patent Sep. 25,2001 Sheet 2 of7 US 6,295,504 B1

Nphi
-0.1 0 0.1 0.2 0.3 0.4 0.5
1.7
Q
0
1.9

2.1 a
2.3
RhoB
2.5

2.7

2.9 Electrofacies1
I Electrofacies2

3.1 I
Fig. 3
' 5lO
Flg- 4 Determine Neighbor
Index (Local Density of
Data Distribution)
575 520

Determine Kernel
Determine KNN
Representative Index
Attraction (Form natural
small groups) (Determine Optimal
Cluster Number)

550
Perform Merging to
Form Final Clusters
U.S. Patent Sep. 25, 2001 Sheet 3 0f 7 US 6,295,504 B1

670/
@
Initialize NxN array A to
contain indexes of N-1 nearest
neighbors for all points.
i
Sort vector A[i] for each point Xi,
620/ i=1 to N, in the data set.
Results of sorted index order of
vector A, j, stored in NxN array B.
l
Calculate value of m of point Xi
and of its jth neighbor y,

Determine on(x)=exp(-mlot)
640/ where on is user de?ned
parameter
l
N-1
650/
Determine s(x)= Z 6n(x)

Have all
points been
processed
?
660
\
Determine
Smin= Min{s(xi)}
i=1,N
, l
Have all
Calculate NlF l(x): Determine
points been
processed I(X) : :(X) ' 5min =
max- 3 mm
Smax= Max{s(xi)}
|=1,N

680) 670)
Fig.
U.S. Patent Sep. 25,2001 Sheet 4 0f 7 US 6,295,504 B1

Fig. 6 @
For each nearest neighbor q of a point p, determine
attraction function Attrp(q)=l(q)Vp(q), where Vp(q)=1, \ 720
if q is in the KNN of p. Vp(q)=O othen/vise.
l
Determine Attraction of point p by
Attfp=(I7\/1?/X(Attfp(q))
p
\ 730

Have all
points in the
dataset been
processed
?

740 K750
P is free attractor,
Is p not attracted Yes and kernel of a
by any other point? mode, local
maximum of PDF

/ 770
P is a related
2:83 p fatttrfact aphy Yes attractor. P is
o wioFrom fem 7 e
e data set.
a ofslope point
a mode

P is attracted by another point, but


does not attract any other point. P \
. . 780
IS a boundary or valley point.

Have we
unsetlect a.n d categorized each Yes F
.68 egonze point in total num. row.
point from total of data points N as attractlon
Set of. po?ts of boundary or mode Sets
3'29 ' point?
U.S. Patent Sep. 25,2001 Sheet 5 0f 7 US 6,295,504 B1

Fig. 7 Cg
For each point x in the set of all points of size N, determine
y, ?rst neighbor of x, satisfying l(y)>l(x). l(x) and l(y) are \8 7 O
the Neighbor Indexes of point x and y respectively.
l
Determine Kernel Representative Index F(x):
F(X)=Ia(><)*Mb(X,y)*D(X,y) \820
M(x,y)=m in which y is the mth neighbor of x, and
D(x,y) is the distance between x and y.

0.8 -

0.7 -

0.6 -

KRI
5 Clusters
0.5

0.4

8 Clusters
0.3 -

12 Clusters
0.2 '

13 5 7 911131517192123252729
Kernels in decreasing order of their KRI
U.S. Patent Sep. 25,2001 Sheet 6 0f 7 US 6,295,504 B1

Have all
possible pairs
g) [7007
Select a pair of attraction sets S1 and S2 among
of attraction
sets been all attraction sets determined previously.
processed l
'7
Select a pair of points p1 and p2 \ 7 005
' belonging to S1 and 82 respectively.

707O
No Is p1 a
boundary point
in S1?
Yes
7075
No Is p2 a
boundary point
in S2?
Yes
7 020
ls p1 in the
KNN of p2? Yes

No
7 025
No ls p2 in the
KNN of p1?

Yes

A l 7030

Is the distance
Yes /\ No between p1 and p2,
Have all pairs No D(p2,p1), among all Yes
of points been pairs of points,
processed? satisfying all the
above conditions,
minimum?
Save the values
(p1,p2,L) where
- |-=Min(|(P1),l(P2))
Flg. 7032f intoa list.
U.S. Patent Sep. 25,2001 Sheet 7 0f 7 US 6,295,504 B1

Q)
Sort the list in decreasing \ 7035
order with respect to L.
i
In decreasing order of L, for all pair of
points (p1, p2) of the list determine:

7035
Does the set S1
corresponding to
point p1 contain a
previously selected
cluster kernel?

1040

Does the set S2


corresponding to
point p2 contain a
previously selected
cluster kernel?

No
7

Merge sets 81 and S2. "\ l 045

7050
Have all (p1,p2)
in the list been
processed?

Fig. 9B
US 6,295,504 B1
1 2
MULTI-RESOLUTION GRAPH-BASED injected near the drilling tool. It is not normally cost
CLUSTERING effective to identify facies using these methods. Information
on geological formations traversed by a bore hole is more
This application claims bene?t to Us. Provisional commonly gathered by a measurement sonde passing
60/161,335 ?led Oct. 25, 1999. through the bore hole. The gathered information as a func
tion of the sondes position along the bore hole is then stored
BACKGROUND OF THE INVENTION or logged.
Many doWnhole measurement techniques have been used
1. Field of the Invention in the past, including passive measurements such as mea
The present invention generally relates to the geological suring the natural emission of gamma rays; and active
study of earth formations for the location and exploitation of measurements such as emitting some form of energy into the
mineral deposits using electrofacies analysis. More formation and measuring the response. Common active
particularly, the present invention relates to a neW system measurements include using acoustic Waves, electromag
and method for identifying formations of mineral deposits netic Waves, electric currents, and nuclear particles. The
using a user-friendly and reliable clustering technique that 15
sonde measurements are designed to re?ect the distinguish
can extract natural clusters from sets of logged data points ing characteristics of the rock facies. Multiple logs and
for improved electrofacies analysis of the formation. sondes may be used to gather the measurements, Which are
then correlated and standardiZed so as to furnish measure
2. Description of the Related Art
ments at discrete levels separated by equal depth intervals.
Mineral and hydrocarbon prospecting is based upon the The measurement standardiZation alloWs the automation of
geological study and observation of formations of the earths data interpretation in order to obtain estimates of the poros
crust. Correlations have long been established betWeen ity of the rocks encountered, the pore volume occupied by
geological phenomena and the formation of mineral and hydrocarbons, and the ease of How of hydrocarbons out of
hydrocarbon deposits that are suf?ciently dense to make the reservoirs in the case of petroleum prospecting. The set
their exploitation economically pro?table. of measured formation characteristics values that distinguish
25
The study of rock and soil facies encountered While the strata in a given bore hole is herein termed the electro
prospecting for minerals takes on particular importance. As facies.
used herein, a facies is an assemblage of characteristics that Interpretation studies have demonstrated a strong corre
distinguish a rock or strati?ed body from others. A facies lation betWeen the electrofacies and lithofacies, thereby
results from the physical, chemical and biological conditions making it possible to identify With con?dence the compo
involved in the formation of a rock With respect to other sitional characteristics of the rocks traversed by bore holes
rocks or soil. This set of characteristics provides information based on the sonde measurements. It has been established
on the origin of the deposits, their distribution channels and that the sets of log measurements (i.e. sample points) Which
the environment Within Which they Were produced. For correspond to a given lithofacies form a cluster in data
example, sedimentary deposits can be classi?ed according to space. That is, When the measured characteristic values of
35
their location (continental, shoreline or marine), according a formation are graphed, the points generally fall into a
to their origin (?uviatile, lacustrine, eolian) and according to continuous region Which is distinguishable from the regions
the environment Within Which they occurred (estuaries, Where points for other formations Would fall.
deltas, marshes, etc.). This information in turn makes it Various systems and method that use the correlation or the
possible to detect, for example, Zones in Which the prob observed clusters to identify lithofacies from electrofacies
ability of hydrocarbon accumulation is high. have been created. These systems take the logged measure
The set of characteristics used to de?ne a facies depends ments and convert them to a graph that furnishes, as a
on the situation. For example, a lithofacies may be de?ned function of position along the bore hole, an image of a
by the rocks petrographic and petrophysical characteristics. succession of facies. The graph typically also provides some
These are the composition, texture and structure of the rock. 45 indication of the measured formation characteristic values
Examples of mineral composition are silicate, carbonate, alongside the image. An example of one such system and its
evaporite, and so on. A rocks texture is determined by its output is described in US. patent application No. 4646240,
grain siZe, sorting, morphology, degree of compaction, and Which is hereby incorporated herein by reference. HoWever,
degree of cementation. The rock structure includes the before these systems can do the conversion, they must be
thickness of beds, their alternation, presence of stones, tailored to the drilling region.
lenses, fractures, degree of parallelism of laminations, etc. The most accurate existing systems and methods require
All of these parameters are related to the macroscopic a substantial amount of user participation to set up, and
appearance of the rock. conversely, those existing systems Which are highly auto
For extraction of hydrocarbons from geologic formations, mated tend to perform poorly. One proven approach uses a
the particularly desirable characteristics of the lithofacies are 55 tWo-step methodology to correlate different measured char
the porosity of the reservoir rocks and their permeability, as acteristic values into generaliZed electrofacies charts for
Well as the fraction of the pore volume occupied by these analysis. In the ?rst step, the number of clusters is speci?ed
hydrocarbons. These aid in estimating the nature, quantity, to an automatic clustering algorithm such as maximum
and producibility of the hydrocarbons contained in such likelihood algorithm, hierarchical clustering method,
strata. dynamic clustering or neural netWork. The number of clus
There are various sources of information on formation ters speci?ed is large, creating clusters containing small
lithofacies. Information may be gathered from subsurface numbers of points. Apetrophysicist or geologist then manu
observations such as, for example, by the study of core ally assigns geological characteristics from the facies to each
samples taken from rock formations during the drilling of a cluster and simultaneously merges similar small clusters into
bore hole for an oil Well. Such information can also be 65 electrofacies.
provided by drill cuttings sent up to the surface from the Another approach for creation of electrofacies charts
bottom of a Well by means of a ?uid (generally drilling mud) requires that the number of clusters speci?ed to the auto
US 6,295,504 B1
3 4
matic clustering algorithm be relatively small. In this BRIEF DESCRIPTION OF THE DRAWINGS
approach, the geologist often has a problem assigning spe
ci?c geological facies to the clusters, Which tend to be much For a more detailed description of the preferred embodi
larger than the clusters in the previous approach. The geolo ments of the present invention, reference Will noW be made
gist may also be required to lump together geological to the accompanying draWings, Wherein:
facies at a coarser level of distinction than might be appro FIG. 1 illustrates logging equipment in operation in a bore
priate. A large number of clusters require much Work by the hole;
geologist to match clusters to geology; too feW clusters FIG. 2 illustrates a computer system used to process the
cause the geologist problems in making meaningful linkages logged data and determine mineral compositions of the earth
betWeen clusters and geology. 10 formation;
The electrofacies analysis systems described above suffer FIG. 3 illustrates the scattering of points representative of
from various limitations and draWbacks. The automatic the values of tWo characteristic parameters of the formations
clustering methods require the user to provide an initial measured Within a given depth interval;
number of clusters before processing. This is a limitation FIG. 4 is a How diagram shoWing the steps for Multi
because the results are very sensitive to this parameter. 15
Resolution Graph Based automatic clustering;
Furthermore, unless the number is large, the identi?ed
clusters may have shapes that are not geologically mean FIG. 5 is a How diagram shoWing the steps to determine
ingful. This prevents them from being directly used for Neighboring IndeX Function;
facies analysis. On the other hand, manual merging of a FIG. 6 is a How diagram shoWing the steps to determine
large number of small clusters based on similar geological K-Nearest-Neighbor Attraction;
characteristics by hand makes this process sloW and subjec FIG. 7 is a How diagram shoWing the steps for determin
tive. Furthermore, because electrofacies analysis occurs in ing the Kernel Representative IndeX;
N-dimensional space it is still dif?cult even for a trained FIG. 8 is a curve of the Kernel Representative IndeX in
individual With good visualiZation tools to identify clusters decreasing order Which may be used to determine the
manually. Thus, it is desirable to develop a system and 25 optimal number of clusters at different resolutions; and
method that, in a relatively constant, reliable, and systematic FIGS. 9A and 9B shoW the steps for performing merging
manner, permits automatic clustering of logged data to to form ?nal clusters.
eXtract information about the geological facies of the data.
While the invention is susceptible to various modi?ca
SUMMARY OF THE INVENTION tions and alternative forms, speci?c embodiments thereof
Accordingly, there is disclosed herein a method for iden are shoWn by Way of eXample in the draWings and Will
ti?ng formations of mineral deposits. In one embodiment of herein be described in detail. It should be understood,
this method, logs are made over multiple levels Within an hoWever, that the draWings and detailed description thereto
interval along the bore hole in order to obtain a group of are not intended to limit the invention to the particular form
several measurements for each of these levels. With each 35
disclosed, but on the contrary, the intention is to cover all
such level of the bore hole interval is associated a sample modi?cations, equivalents and alternatives falling Within the
point Within a multidimensional space de?ned by the dif spirit and scope of the present invention as de?ned by the
ferent logs. The sample point coordinates are a function of appended claims.
the logging values measured at this level. The sample points
DETAILED DESCRIPTION OF PREFERRED
thus obtained Will form a scatter diagram Within this mul
EMBODIMENT
tidimensional space.
The sample points of this scatter diagram are used to FIG. 1 shoWs logging equipment in a bore hole 10 going
determine a set of characteristic modes, each corresponding through earth formations 12. The equipment includes a
to a Zone of maXimum density in the distribution of these sonde 16 is suspended in the bore hole 10 at the end of a
samples; each mode is regarded as a characteristic of a 45 cable 18. Cable 18 connects sonde 16 both mechanically and
respective cluster and the surrounding samples of this cluster electrically, by means of a pulley 19 on the surface, to a
are related to it. Afacies is designated for each of the modes control installation 20 equipped With a Winch 21 around
thus characteriZed and a graphic representation is produced Which the cable 18 is Wound. The control installation
as a function of the depth of the succession of facies thus includes recording and processing equipment knoWn in the
obtained. The characteristic modes of each cluster are made art that make it possible to produce graphic representations
up of sample points coming from the measurements them called logs of the measurements obtained by the sonde 16
selves. according to the depth of the sonde in the bore hole or Well
To identify the clusters, a neighboring indeX of each log 10.
measurement point in the data set is calculated. NeXt, small The bore hole 10 passes through a series of earth forma
natural groups of points (called attraction sets) are formed 55 tions (not speci?cally shoWn) that is typically composed of
based on the use of the neighboring indeX to determine the a series of Zones or beds. The Zones are identi?ed by the
K-Nearest-Neighbor (KNN) attraction for each point. Inde rock facies they contain, e.g. clay, limestone, etc. From the
pendently of the natural group formation, the optimal num geological vieWpoint, each of these successive Zones is
ber of clusters is calculated based on the Kernel Represen characteriZed by a relative homogeneity that is revealed by
tative IndeX (KRI) and a user-speci?ed resolution. Lastly, a set of characteristic data values (facies). These values vary
based on the data calculated from the prior steps, ?nal from one Zone to another, but have a relatively limited range
clusters are formed by merging the attraction sets. of variation Within a given Zone. These data, Which depend
Experimentation con?rms that the above method alloWs in particular on the mineralogical composition, the teXture
an accurate determination of the geological facies derived and the structure of the rocks making up these Zones,
from the logging measurements obtained Within an interval 65 identify respective facies.
of geological formations traversed by a sonde traveling in a It is possible to establish a correspondence betWeen, on
bore hole. the one hand, different facies characteriZed by mineralogical
US 6,295,504 B1
5 6
factors, texture and structure and, on the other hand, elec tools, among others. FIG. 3 demonstrates in tWo dimensions
trofacies Which can be obtained directly from a suitable the results of the automatic clustering technique described
quantitative analysis of a set of logs measured by the sonde beloW. Although only 2 dimensions are shoWn, the technique
as it traverses the bore hole. The possibility of establishing is generally applied to N-dimensions for N log measurement
such a correspondence betWeen electrofacies and facies is properties (4-dimensions Were used to obtain the clusters
capable of providing a valuable aid in the geological knoWl shoWn in FIG. 3).
edge of a Zone of the earths crust Within a given region,
Once a set of measurement points has been obtained, it is
such knoWledge being useful in completing the information
usually available to geologists and, in certain cases, helping desired to partition the points. Initially, this partitioning is
them in the interpretation of the facies encountered to obtain achieved by gathering the points into clusters using a cluster
information on the history of the formations and for deter
10 identi?cation algorithm. Many such algorithms exist, and
mining the concentrations of mineral deposits. they almost universally require that the data be normaliZed.
An approximate image of the facies may be produced by There are several Ways to normaliZe the data. One clas
?rst obtaining a number of logs over a bore hole interval sical method, frequently used, is to limit the data in an unit
H1H2. The measured values are discretiZed and correlated hypercube [0,1]i, d being the number of features, Which
15 corresponds to the number of dimensions. In each
in depth so as to have, for each level of the interval
considered, a set of distinct log values. In one embodiment,
dimension, that dimensions minimal value is subtracted
the measurements are discretiZed into levels at 15 cm from the data, and the difference is divided by the total range
intervals. The set of log values thus obtained is analyZed to of the data in that dimension. In another method, the average
determine groups of consecutive levels that have log values value of each feature is subtracted from the data, and the
Within a given range. The upper and loWer values of the difference divided by the standard deviation. The normal
range are based on the potential variations that may be iZation changes the distance betWeen data points and affect
caused by, eg bore hole conditions such as roughness or the natural separation of data points, but it is necessary to
caving of the bore hole Walls. Those consecutive levels prevent an improper choice of scale in one dimension from
having log values in the given range may be considered to dominating the measurements in other dimensions.
25
share a true physical characteristic value that is substan The clustering algorithms can be divided into parametric
tially constant over those levels. In this manner, those Zones algorithms, and non-parametric algorithms. Parametric algo
having relatively constant values may be portrayed as facies rithms are generally regarded as being less desirable than
having the indicated measurement values. non-parametric algorithms because parametric algorithms
To generate an image of the facies using the logged data are based on some model of the data, Whereas non
points, processing of the logged data measured by the sonde parametric algorithms make no assumptions about the data
as it traverses the formation may be handled at the Well site pattern. One consequent advantage of non-parametric algo
by a computer system such as that shoWn in FIG. 2. The rithms is that they are capable of recogniZing clusters of
computer system consists of a keyboard 308 and a monitor varied shapes.
306 to permit user interaction With the computer toWer 302 35
One example of a non-parametric approach is to divide
containing the CPU and peripheral hardWare. Logs gathered the observations space (eg the graph of FIG. 3) into regular
at the Well site may be graphically displayed on the monitor hypercubes of a ?xed siZe and estimating the probability
306 and clustering (described beloW) may be performed to density function (PDF) of the data based on the number of
determine the petrophysical characteristics of the formation. measurement points in each hypercube. One draWback of
Alternatively in a second embodiment, because of cost, this approach is that the number of hypercubes increases
space, poWer, and transportation restrictions, one may prefer exponentially as the number of data dimensions.
storage of the logged data gathered by the sonde in a storage Furthermore, this approach encounters critical dif?culties
medium such as a 3.5 diskette, tape or recordable compact When the data includes clusters that are closely spaced
disc 310. Processing of the data for determination of for and/or clusters of very different densities or siZes.
mation characteristics may then occur at the of?ce or labo 45 Another example of a non-parametric approach is the
ratory using more poWerful CPUs at a substantially loWer K-Nearest-Neighbor (KNN) approach. Rather than estimat
cost than at the Well site. ing the PDF by determining hoW many points are in a given
As shoWn in FIG. 3, if one analyZes a scattering of points data volume (eg a hypercube), the PDF is estimated by
representative of the logs carried out on a succession of measuring the data volume occupied by a given number of
levels in a bore hole interval, it is noted that the distribution points. The K-Nearest-Neighbor approach estimates the
density of these points in the scatter varies. This ?gure is the PDF around a point by determining the radius from the point
result of Multi-Resolution Graph-based Clustering (MRGC) to its Kth nearest neighbor.
using four logs (Nphi, RhoB, GR and DT), although only the Another example of a non-parametric approach is
tWo log measurements RhoB and Nphi are shoWn in the tWo described by C. T. Zahn in Graph theoretical methods for
dimensional graph. RhoB corresponds to a density measure 55 detecting and describing Gestalt Clusters, IEEE Trans. On
ment log and Nphi is a porosity measurement log measured Computers, v. C-20, n. 1, pp. 6886, 1971, incorporated
by a Compensated Neutron Log (CNL) tool. Examples of herein by reference. This approach relies on graph theory. A
other log measurement properties include natural gamma connected graph is constructed by linking the data points xi
radiation measurements (GR), temperature measurement by arcs according to their proximity relationship.
(HRT), inverse of square root of resistivity measurement In a graph representation of a data set, each observation
near the Wall of the Well in the invaded Zone (HRXO), xi is represented by a node, and an arc is established betWeen
acoustic Wave transit time measurement (DT), measurement tWo distinct nodes if they are linked by a relationship de?ned
of resistivity of formation far from Well or bore hole (RT), in the data set S. The graph is denoted by G=(S, A), Where
and measurement of resistivity near bore hole Wall (RXO). A is the set of arcs. Various graph structures have been used
These measurements may be made by logging tools such as 65 for clustering. Four particular structures are the Minimum
resistivity tools, induction tools, nuclear magnetic resonance Spanning Tree (MST), Relative Neighborhood Graph
tools, thermal neutron decay tools, and gamma radiation (RNG), Gabriel Graph (GG), and Delaunay Triangulation
US 6,295,504 B1
7 8
Graph (DTG). The MST is the graph that connects every then it is b thereafter. In practice, b is set equal to b=K+1,
node together With the shortest overall arc length. The RNG, thereby providing all points outside the neighborhood of
GG, and DTG are graphs With nodes coupled based on interest an equal ranking.
increasingly relaxed distance requirements. For a given set The limited rank On(x) is de?ned above for n=1, 2, . . . ,
of measurement points, these graphs are related in the K, ie for xs K nearest neighbors. The sum of the limited
folloWing manner: ranks for each point x is expressed as:

MsTgRNGgGGgDTG (Eqnl) K (Eqn 3)


s(x) : 2 03/10:).
The inclusion of Gg G means that, for the same node set S, n:l
10
the set of arcs of G is contained in the set of arcs of G. The
connectivity of MST implies the connectivity of the other 3 The smallest sum of the limited ranks is expressed:
structures, so that these graph-based methods begin With a
single cluster that is to be divided.
Typically, heuristic rules are then applied to remove the Sim-n = [Ii/E35 (1a)}, (Eqn 4)
15
inconsistent arcs that bridge inherent separations among
potential clusters. The elimination of these arcs breaks the
Where N is the number of measurement points in the data set
graph into several connected sub-graphs, called connected
components. Each connected component gathers a group of S. The largest sum of the limited ranks is similarly
points that are recogniZed as a cluster. The dif?culty With expressed:
this approach lies in the determination of the heuristic rules.
Generally, these rules are based on the comparison of max
(Eqn 5)
valuation of arcs With a homogeneity criterion for detecting
arcs With uncommon valuation (the inconsistent arcs).
HoWever, no such rules have been successfully established Using the largest and smallest sums of the limited ranks, the
for more than three dimensions. Furthermore, the resulting limited rank sums can be normaliZed to a range of 0 to 1:
clusters are often unstable due to the irregularity of data
/
distribution (i.e. the identi?ed clusters are highly sensitive to 5min
/ t
small disturbances in the data). Smax 5min

A fourth non-parametric approach is described by the 3O


authors of the folloWing references:
This is the Boundary Index function I(x). Because a point
C. Gan et al Classi?cation non supervise par detection near a mode of the PDF is more likely to be the nearest
des zones frontires application en reconnaissances des
neighbor of nearby points than a point near a valley of the
formes et segmentation, 13ime Colloque GRETSI, Juan PDF, a boundary index value near Zero indicates that mea
les-pins, Tome 2, pp. 11051108, 1991, and C. Gan, Une35 surement point x is neara mode of the PDF, Whereas a value
approche de classi?cation non supervise base sur la near one indicates that measurement point x is neara valley
notion des k plus proches voisins, French Ph.D. Thesis, of the PDF.
University of Technology of Compigne, 1994. In this After determining the boundary indices, Gan applies a
approach, the KNN approach is combined With the graph relaxation process to the boundary indices to remove the
theory approach to devise a Boundary Index that alloWs local irregularities. Gan then separates the cluster kernel
detection of data clusters from data sets of any dimension
points from the boundary points by a simple thresholding of
and of very complex shapes and con?gurations. Like KNN, the boundary index values. In using this approach, the
Gan requires a single parameter: the number of neighbors K. optimal value of K is determined by systematically repeating
That is, given a value for K, the number of clusters is the process for various values of K and observing the
automatically determined. Unlike KNN, Gans Boundary resulting numbers of clusters. A stable Zone may appear in
Index is sensitive to the change of the local PDF rather
Which a range of consecutive K values results in a consistent
than to the PDF itself. The KNN estimates the local PDF, number of clusters. Any of these K values may be consid
Whereas the Boundary Index (BI) indicates Whether a point ered optimal.
is relatively close to a mode of the PDF (i.e. a local
Advantages of Gans approach include the identi?ability
maximum density Zone, or the center of a cluster) or
of clusters having varied shapes, densities and volumes,
relatively close to a valley of the PDF (i.e. a local minimum even When they are small. Disadvantages include cluster
density Zone, or the border of a cluster). Gans Boundary sensitivity to K. As expected, increasing K generally
Index is noW developed here in detail in the context of
decreases the number of clusters due to a stronger smoothing
measurement point clustering. effect. HoWever, When the clusters are not Well-separated,
Let measurement point x be an element of the set of
55 the stable Zone disappears and it becomes dif?cult to
measurement points S={x1, x2, . . . XN} and let measurement
identify a suitable value for K. Further, the organiZation of
point y be measurement point xs nth Nearest Neighbor the clusters is very sensitive to small changes in K. Finally,
(NN) in the set of measurement points S, nK. The limited for very unbalanced cluster siZes, this approach fails to
rank of measurement point x With respect to its nth NN, y,
identify obvious clusters. For example, experiments indicate
is de?ned to be:
that for a log data set containing 2500 points With more than
a third of these data points concentrated in a very small
o';l(x) : m if x is the m' NN of y, m s K (Eqn 2)
space forming a very compact cluster, and having the
remaining points dispersed in a very large Zone forming
many clusters of great volumes, Gans approach does not
65 identify the obvious compact cluster as one cluster. Rather,
Where b is set so that K+bN-1. In other Words, the Gans approach generates several clusters in the small
limited rank On(x) is the rank of x relative to y up to K, and compact Zone and a feW clusters in the large Zone. The
US 6,295,504 B1
9 10
disadvantages and drawbacks of Gans approach may be The smallest and largest sums are expressed
overcome by the method of Multi-Resolution Graph-based
Clustering (MRGC) shown in FIG. 4. 5min = gym. and (Eqn 9)
Prior to clustering, the MRGC method requires the log
measurements to be normaliZed. As With all clustering Smax = gig/{NWO} (Eqn 10)
methods using the concept of distance, the MRGC is
sensitive to scale changes. It is therefore desired to ?rst
normaliZe the data so as to balance the Weights of each
The Neighboring Index function I is de?ned to be:
measured feature (a feature represents one dimension of data
point). Hereafter, it is assumed that the data has been
M) = 5(X) _Smin (Eqn 11)
normaliZed in some fashion. Smax 5min
The ?rst step in the MRGC method, block 510, deter
mines a Neighboring Index value for each measurement
point. The neighboring index values are then used in block 15
The value of I varies betWeen 0 and 1. Unlike the boundary
515 to automatically form small basic data groups by use of index function, the higher the value of the neighboring index
a multidimensional KNN point-to-point attraction algo function I, the closer the point to a mode of the PDF. The
rithm. Independently of block 515, the neighboring index neighboring index function replaces K With a smoothing
values are used in block 520 to determine a Kernel Rep parameter a, but advantageously, the neighboring index
resentative Index (KRI) for each measurement point. The function is less sensitive to changes in a.
points are then ordered according to their KRI, and one or
more optimal numbers of clusters (corresponding to differ One method for calculating the Neighboring Index func
ent resolutions) is suggested to the user. The user selects a tion is noW described With reference to FIG. 5. First, each
resolution, and in block 530, the ?nal clusters are obtained 25 measurement point is assigned an index i from 1 to N. In step
by use of a multidimensional merging process that joins the 610, a nearest neighbor array A is determined in Which
basic data groups in the order of their linking strength at is the index of the jth nearest neighbor of measure
their joint boundary, until the desired number of clusters is ment point xi. For example With 6 measurement points, A
achieved. could be:
Once the clusters have been obtained, they can be pro
vided to a geologist Who identi?es the facies that they j=1 to 6 (Eqn 12)
represent. Discussed further beloW are methods for using the 526431
clusters to provide identi?cation of neW measurement points 463152
taken elseWhere. First, hoWever, each of the blocks in FIG. 35 > II
II-
II
615243
4 is described in greater detail. 132564
In block 510, the MRGC method calculates a modi?ed 624315
boundary index using an exponential function and an unlim 231456
ited WindoW siZe, Which is alWays equal to N-1 (N is the
total number of data points). The neW index is called
Neighboring Index (NI), and it is based on the Weighted
rank of measurement point x relative to all the other mea In this example, x2s nearest neighbor is x4, and x4s nearest
surement points y: neighbor is x1. For convenience, xi is considered to be the
Nth nearest neighbor of itself. The determination of this
45
array requires on the order of N2/2 operations due to the
need to calculate the distance from each point to every other
Where x is the m"1 NN of y, With y being the nth NN to x. point. Once it is determined, a sorting algorithm can be used
The K parameter in the boundary index function is effec to identify the rank array C in Which is the rank of
tively replaced by a constant a that is greater than Zero. It is measurement point xi relative to its jth nearest neighbor.
noted that a is insensitive to the siZe of the data set and may
be set once for all log data sets. In practice, a has been In step 620, a companion array B initialiZed as folloWs:
successfully set to 10, but a range of successful values is
expected. The Weighted rank On(x) is de?ned relative to all j: 1 to 6 (Eqn 13)
the other points, so n ranges from 1 to N1. 55 123456
Unlike the limited rank function on Which is an increasing 123456
function, the Weighted rank function on is a strictly decreas 123456
ing function varying from 1 to 0, i.e., (1, 0). With respect to 123456
each of its neighbor, x has a rank On(x), n=1, 2, . . . , N1. 123456
The sum of the Weighted ranks for a given measurement 123456
point x is:

is used to track the initial position of indices in nearest


65 neighbor array A. The nearest neighbor array A is sorted a
roW at a time With B as a companion array. As values are
rearranged in A, the companion array B is rearranged in the
US 6,295,504 B1
11 12
same Way. After the contents of the roWs in A are sorted in otherWise. Each point x is directed to the nearest neighbor
the above example, the companion array B is: y that maximiZes the attraction function With a value greater
than Zero. If none of the nearest neighbor points of x has an
j: 1 to 6 (Eqn 14) attraction value greater than Zero, then x is not adhered to
any other points.
625413 In FIG. 6, step 720 determines Eqn17 for all the neigh
463152 boring points y of a measurement point x. In step 730, an
246531 adherence point (if one exists) is determined for measure
132645 ment point x by identifying the point y in the K2 nearest
524361 neighbor points of point x that maximiZes Attrx(y):
312456
Am, : Max(AllrX(y)) (Eqn 19)
yEVX

Since B[1][3]=5, this indicates that x3 is the 51 nearest


neighbor of x1. It is noted that it is desired to preserve the 15 and verifying that the maximal value is greater than Zero. To
original nearest neighbor array A, so the sorting operation is reduce the number of computations, it may be desired to
performed on a temporary copy of A and the original is left eliminate the constant neighboring index value I(x) from
undisturbed for future use. Eqn17 and modify Eqn19 to be:
In step 630, the array C is ?lled so that is the rank
of measurement point xi relative to its jth nearest neighbor.
Since
and B[k][i]
the index
givesofthetherank
jth of
nearest
i relative
neighbor
to Xk,ofthe
xi folloWing
is
relationship can be used to ?nd C: In steps 740 and 760, each point x is classi?ed into one of
the folloWing categories:
C[i]U]=B[A[i]U]][i] (Eqn15) 25 1. x is not directed to any other point (Attr 20). In this
For the above example, the rank matrix C is: case (block 750) x is free attractor, meaning that x is
the kernel point of a local maximum of the PDF.
2. x is directed to another point and at the same time one
j: 1 to 6 (Eqn 16)
or more other points are directed to x. In this case, (block
543126 770) x is a related attractor, meaning that x is on the slope
314226 surrounding the kernel point of a local maximum of the PDF.
254326 3. x is directed to another point but no other points are
451346 directed to x. In this case, (block 780) x is pending and
554316 related, meaning that x is a boundary point in a local
213516 35 minimum of the PDF.
Once all the points have been classi?ed as determined in step
790, the points are formed into attraction sets. All free and
Having determined the rank matrix C, it can be used in related attractors are considered as modes, While the pending
step 640 to easily calculate the Weighted ranks On(x) of and related points are considered as valleys (boundaries). An
Eqn7, Which are in turn used in step 650 to calculate the attraction set is de?ned to be all points that directly or
summations of Weighted ranks s(x) of Eqn8. The minimum indirectly adhere to a common mode. Attraction sets may be
and maximum summations are determined in steps 660 and considered as basic (elementary) clusters that are small
670, and then in step 680 the Neighboring Index values of natural data groups of the analyZed data set. Attraction sets
Eqn11 are calculated for each measurement point. include points in the valley Which do not attract any other
Returning momentarily to FIG. 4, once the neighboring 45 points. These are considered the boundary points of the set.
index values are calculated, they are then used in block 515 In the KNN attraction method, the parameter K2 is used
to automatically form small basic data groups by use of a in the adherence function. The variable K2 can be considered
multidimensional KNN point-to-point attraction algorithm. a smoothing parameter. The higher its value, the less basic
These attraction sets Will be used in the mode merging step the attraction set structure are. But K2 should not be too
530 for forming ?nal clusters. high, as it Will reduce local point-to-point attraction and
The basic idea of the multidimensional KNN point-to merge some structures separated by narroW and deep val
point attraction method is to attempt to associate every point leys. In this step, a small K2 is usually preferred to construct
x in the set of measurement points With an adherence point high-resolution structures. HoWever, a very small K2 should
y that maximiZes the attraction function, Attrx(y). The choice also be avoided because it Will create small, disconnected
of the point y is based on the concept of path of the highest 55 islands, that is, attraction sets Where points are attracted
gradient. A general attraction function is: among themselves Without creating any boundary points
(block 780). If no boundary points are recogniZed in an
attraction set, this attraction set Would never be merged With
other sets in the ?nal merging step (block 530). By
The Neighboring Index values of points x and y are repre experience, K2=5 generates consistent results even for high
sented by I(x) and I(y) respectively. The adherence function dimensional data sets. The preferred values for K2 are not be
Vx(y) can be any function. A useful form of the adherence less than 4 or much higher than 12, even for very large sets
function could be the exponential function: of data.
VX(y)=exp(m/[5), Where y is the mth NN of x and [5>O (Eqn18) As a comparison, in the post-processing of Gans clus
65 tering method, the clusters are recogniZed by use of a
In the preferred embodiment, the MRGC uses Vx(y)=1 if y threshold on the boundary index function, Where points With
is one of the K2 nearest neighbor points of point x, and 0 a boundary index loWer than the threshold are considered as
US 6,295,504 B1
13 14
mode points; otherwise they are considered as boundary clusters of equivalent volumes. The combination of these
points. After removing the boundary points, Gan detects the tWo factors produces a good balance betWeen the siZe and
connected components formed by mode points as clusters, the volume of a cluster, and generates consistent results (the
and then the boundary points are assigned to the cluster of change of Weights, b and c, Will change this balance). The
their nearest neighbor. Compared With Gans method, eXpe third draWback of Gans method mentioned above, that it
rience shoWs that the MRGC method using KNN point-to cannot generate consistent clusters if the data sets present
point attraction results in more consistent and stable clusters clusters of very unbalanced siZes, is thus solved.
relative to variations in the smoothing parameters. In other Once the KRI have been calculated, they may be sorted
Words, unlike the method of Gan, the MRGC method and displayed as shoWn in FIG. 8. With the help of the
generates better cluster borders and the results are less decreasingly ordered KRI curve, one can easily recogniZe
sensitive to the choice of smoothing parameter. the importance of a mode in the overall data set. There are
Referring momentarily to FIG. 4, as the K-Nearest several important drops (breaks) Which corresponds to the
Neighbor attraction for each point is being determined and changes of cluster kernels from one stable plateau to another.
natural small groups are formed in block 515, a kernel The drop points at 2, 5, 8 and 12 clusters each corresponds
representative indeX (KRI) for each point in the data set may 15
to the optimal number of clusters at different resolutions.
be determined in block 520. The KRI permits determination The drop points of the curve can be automatically
of the optimal number of clusters for the analyZed data set. detected by the peaks of the gradient (the ?rst derivative) of
For Well-separated cluster data sets that contain a signi? the decreasingly ordered KRI curve. These peaks as Well as
cant probability density difference betWeen modes and their values are provided to the user as possibilities of
valleys, the number of clusters can be easily identi?ed. But optimal cluster numbers and associated quality indeXes at
in actual application data sets, the clusters are often very different resolutions.
close together and the number of clusters is ambiguous. The The points With the highest values of KRI selected in this
optimal number of clusters becomes a function of desired Way are considered as cluster kernels and are used for
resolution and it depends on the users requirements, mode merging and forning the ?nal clusters as shoWn in
namely, at What resolution they Would like to analyZe the 25
block 530 of FIG. 4.
data. The MRGC automatic clustering method permits the Before presenting multidimensional merging, a 2-D gray
user to select the optimal number of clusters from one or level image eXample serves as a simple visualiZation aid. If
more different possibilities. Each cluster number is associ We consider the images gray-level as the third dimension,
ated With a quality indeX and is suggested in the order of its an image can be considered as a relief having mountains
probability. (light values) and seas (dark values). The merging method is
To detect the best number of clusters, the MRGC algo conceptually similar to ?ooding the seas little by little, With
rithm ?rst determines the representativity of each point in one sea merging With another in the order of their loWest
the data set. The representativity value aids in determining border levels. To merge mountains rather than seas, inverse
if a mode is a real mode of a cluster or just a local landscapes should be considered, With the merging occur
irregularity. Each point is characteriZed by hoW closely it 35
ring in order of the highest border levels. This process alloWs
us to remove the shalloWest valley and merge the tWo most
represents a cluster kernel, and the best kernel representa
tives are then selected to form clusters by merging basic probable neighboring modes into one. But the realiZation of
structures detected in block 515 (the merging process is such a process for multidimensional data is not simple,
presented further beloW). While ordering and analyZing the because the neighbor relationship among modes is not easy
function of kernel representativity, We can recogniZe some to evaluate. With the help of the K-Nearest-Neighbor
cluster kernels are much better represented than others, and concept, the attraction sets and boundary points recogniZed
the gradient change of the Kernel Representative Indexes previously in block 515, the MRGC method implements the
(KRI) can highlight these kernels. folloWing algorithm for multidimensional dot pattern mode
The neighboring indeX function by itself is inadequate as merging as shoWn in FIGS. 9A and 9B. Attraction sets
an indicator for the number of clusters present because it is 45
referred to beloW are the sets of points corresponding to the
a primarily a local indicator With little in?uence from the elementary groups found in block 515.
points outside the local region. To remedy this local effect, Step 1: For every pair of attraction sets S1 and S2 (block
the MRGC algorithm adds tWo other factors, the number of 1001), ?nd a pair of points, P1 and p2 that ?t the folloWing
neighbors and the distance at Which the neighboring indeX of four conditions:
a point loses its poWer. This is the distance at Which 1. p1S1 and PZESZ, block 1005.
another point having a higher value can be found. 2. p1 and P2 are boundary points in S1 and S2, block 1010,
Let I(X) be the neighboring indeX value of point X, and y 1015.
be the ?rst neighbor of X having and indeX value I(y)>I(X) 3. p1 is in the K-nearest-neighbors of P2, block 1020, or
as shoWn in block 810 of FIG. 7. As calculated in block 820, P2 is in the KNN of p1, block 1025. This K could take
the Kernel Representative Function can be Written as: 55 the value of 2><K2 for eXample, but should not be too
high to avoid merging non-neighbor attraction sets.
4. The distance betWeen P2 and p1, D(P2, p1), block 1030,
Where a, b, and c are the exponents used to Weight each should be the minimum among all pairs of points
corresponding function I(X), M(X,y), and D(X,y). The neigh satisfying the three previous conditions.
bor function M(X, y)=m, if y is the m1 neighbor of X, and The points, p1 and P2, may not eXist for all pairs of attraction
D(X,y) is the distance betWeen X and y. Based on testing sets. These points are considered as the shortest passage
eXperience, a=b=c=1 give good results for the tested data. In from one mode to another (S1 and S2). At the end of step 1,
this function, the factor I(X) alloWs us to recogniZe the peak We have a list of all the shortest passages that eXist betWeen
(kernel) of a mode, and M(X,y) and D(X,y), the eXtension of pairs of attraction sets.
the importance of this mode on the Whole data set. 65 Step 2: For each passage in the list of shortest passages,
The number of neighbors, M(X,y), tends to produce result determine a level L, and sort the list of shortest passages in
ing clusters of equivalent siZes and the distance, D(X,y), decreasing order (block 1033). For a given passage (P2, P1),
US 6,295,504 B1
15 16
the level is de?ned by the minimum neighboring index Large values close to one indicate high con?dence
value: identi?cation, Whereas small values close to Zero indi
L=Min (I(P1), I(p2)) (Eqn22) cate more of an extrapolation in making the identi?
cation.
Step 3: For each passage in the sorted list, blocks 1035 2. A Membership Index (MI) to indicate if x is inside or
and 1040 test Whether S1 and S2 contain a cluster kernel
outside of the cluster de?ned on the reference data set.
selected in block 520. If at least one of them does not, merge
The MI can be calculated by:
the tWo modes (block 1045). This is repeated in order of
decreasing passage levels (block 1034) until all passages
have been considered (block 1050). In other Words, the
modes having the shalloWest joint valley are merged ?rst. 10 Where k is the kernel point of the cluster c, and D is the
The clusters made in this Way are deterministic. Clusters distance function. MI is close to 1 When x is inside and close
of higher resolutions using higher numbers of cluster kernels to the cluster c. The higher the MI the farther x is outside of
are alWays subclusters of the loW-resolution clusters. This the cluster c. The MI can be used for detecting abnormal
corresponds to the hierarchical Way that the geologist orga points and/or neW facies of the application data set.
niZes the geological facies. The MRGC technique Will 15 3. An Ambiguity Index indicates if x is relatively
automatically ?nd the most easily broken clusters and pro ambiguous betWeen tWo clusters. It can be calculated
vide a solution according to the available data structure. This by:
property is important for subdividing facies based on the
already recogniZed electrofacies con?guration.
Various extensions and different application modalities
Where Z is the nearest neighbor of x in the reference data set
exist for the MRGC automatic clustering technique
after removing the cluster c.
described above. The Neighboring Index generated from the
The values of AI should be betWeen 0 and 1. The higher
MRGC method may be very helpful for geological inter
the Al, the more ambiguous xs position is betWeen tWo
pretation of log measurement data. For example, threshold
clusters.
ing the entire data set based on the NI Will identify the denser 25
Using any or all of the above quality indices Will enable
Zones of the log data set. Based on various experimental
automatic feedback to assess the applicability of the model
trials, it is knoWn that these Zones correspond to the thick
to the neW Well for each point.
beds less affected by shoulder effects, Which have good
In discussing the quality indices, it Was assumed that the
homogeneity and greater lateral extension. Hence, these data logs gathered at the neW Well site Were the same types
Zones are the most important for ?uid How and, thus, greater
as those used to create the model. This assumption may be
attention must be paid to these thick beds. They Will be
incorrect. There are many occasions Where feWer or different
useful for calibrating the electrofacies and designing/
logs are gathered, eg due to failure of one or more
optimiZing the sampling program of the core.
instruments doWnhole. Consequently, the electrofacies
Another extension of the MRGC algorithm also involves
model may need to be propagated to other Wells using a
the geological interpretation and calibration of the electro 35
reduced application data set.
facies. After the NI is normaliZed Within each cluster in the
For an electrofacies model made by N logs (and thus of
same manner as the data normaliZation described above,
then the threshold on the normaliZed NI alloWs the MRGC
N data dimensions) and an application data set With N-R
logs available, Where R is the number of logs Which are not
algorithm to recogniZe the mode Zone of each cluster. The
available in the application data set, the nearest neighbor
visualiZation of these mode Zones facilitates the geological
propagation method preferably assigns for each application
interpretation and calibration of the electrofacies, by alloW data point the electrofacies of its nearest neighbor in the
ing the geologist to concentrate on the most representative
reference data set While ignoring the R unavailable logs of
points of the facies.
the reference data set. This method respects the original
After the electrofacies model is made, it is possible to
cluster shapes and can be applied to the results of any
interpret another Well in terms of the existing electrofacies 45
clustering method. For the case of MRGC, the quality
model. In the preferred embodiment, the Nearest Neighbor
Propagation algorithm assigns to each neW measurement indexes MI (Membership Index) and Al (Ambiguity Index)
described above can also be calculated.
point the electrofacies of its nearest neighbor in the refer
Testing the process described above on a four
ence data set used to create the model. Unlike the dynamic
dimensional data set, the folloWing error percentages rela
clustering or Self-OrganiZing-Map methods Which generate
tive to a complete data set Were obtained When one dimen
a mosaic separation of the log space around the cluster
sion Was omitted.
kernels Without any consideration of the cluster shapes, the
nearest neighbor propagation algorithm retains the original
cluster shapes.
Because the neW measurement data might not ?t the 55 Removed Log RhoB Nphi GR DT
model data, quality control While gathering log data mea
Error % 26.5 18.5 34.8 16.1
surements is very important for correct model propagation.
Three quality indices may be determined for each neW
measurement point (also termed application point). The This example illustrates that Nearest Neighbor Propagation
quality indices are based on the concept of MRGC in order alloWs propagation to reduced data sets. Furthermore, for
to evaluate the result of model propagation. First, for each complete data sets containing all logs it can be used to
application point x, its nearest neighbor y belonging to the recogniZe the separability of each log for electrofacies by
cluster c in the reference model data set is determined. The removing one log at a time. For the above example, the order
propagation algorithm then calculates the folloWing infor of importance of these four logs is GR then RhoB, Nphi and
mation for each x: 65 DT. The larger the errors made after removing a particular
1. The NI of y and the cluster-normaliZed NI of y to help log, the greater the amount of information provided by the
locate the point x With respect to the cluster kernel. log.
US 6,295,504 B1
17 18
Propagation of electrofacies model for complete or forming attraction sets from the points having a shared
reduced data sets can also occur using a Most Attracting kernel point.
Nearest Neighbor (MANN) method, that is, the neighbor 7. The method of claim 6, Wherein the attraction value
having the highest value of NI in the K nearest neighbors of calculations include ?nding the products I(y) Vx(y) Where
the reference data set. Nearest Neighbor propagation is a Vx(y) is an adherence function.
special case of MANN propagation With K=1. This method 8. The method of claim 7 Wherein the adherence function
attracts application points to higher probability density Vx(y)=1 if y is one of the K-nearest-neighbors of X, other
Zones (the higher the NI, the more dense the Zone). In theory, Wise Vx(y)=0.
for the reduced data set the propagation error should be 9. The method of claim 5, further comprising:
reduced in Zones of overlaid clusters.
10 determining one or more optimal cluster numbers by
Numerous variations and modi?cations Will become
apparent to those skilled in the art once the above disclosure
calculating a Kernel Representative IndeX for each
is fully appreciated. For example, clustering techniques may measurement point in the data set.
be applied in ?elds such as economic analysis, image 10. The method of claim 9 Wherein the Kernel Represen
analysis, quality control for manufacturing and statistical tative IndeX is determined by determining for each
analyses of other kinds. It is intended that the folloWing 15 point X in the data set, the nearest neighbor y of X satisfying
claims be interpreted to embrace all such variations and I(y)>I(X); and
modi?cations. calculating F(X)=I(X)*Mb(X,y)*DC(X,y) in Which M(X,y)
What is claimed is: is a rank function that equals m When y is the mth
1. A method of de?ning the electrofacies of a geological neighbor of X, D(X,y) is the distance betWeen X and y,
formation traversed by a bore hole comprising: and a, b, c are predetermined constants.
moving a sonde through a plurality of positions in said 11. The method of claim 9 Wherein the optimal cluster
bore hole and recording a data set having a number d number corresponds to a sharp drop in the Kernel Repre
of measurements taken by the sonde at each of the sentative IndeX
plurality of positions, Wherein the d measurements at 12. The method of claim 9, further comprising:
each position are associable With a point in 25 performing mode merging of the attraction sets to form
d-dimensional space; and clusters of measurement points, each cluster de?ning
calculating a neighboring indeX I of each measurement an electrofacies.
point. 13. The method of claim 12, Wherein the clusters of
2. The method of claim 1, Wherein said calculating a measurement points are formed by:
neighboring indeX includes:
determining for each given point X said given points for each a pair of attraction sets S1 and S2 from all
nearest neighbor ranks m relative to all other measure attraction sets, identifying a pair of points p1 and P2
ment points in the data set; belonging to S1 and S2, respectively, that satisfy the
calculating the Weighted ranks, On(X), of said given point conditions:
using the nearest neighbor ranks m; and (a) p1 and P2 are boundary points;
35
determining for each given point X the summation s(X) of (b) p1 is in K-nearest-neighbors of P2 or P2 is in
all the Weighted ranks On(X) for said given point. K-nearest-neighbors of p1; and
3. The method of claim 2, Wherein said calculating a (c) the distance D(p2,p1) betWeen p1 and P2 is minimum
neighboring indeX further includes: among all pairs of points satisfying conditions (a)
and
determining from the summations s(X) for each measure
14. The method of claim 13, further comprising:
ment point X a minimum summation value Smin;
determining from the summations s(X) for each measure
calculating L=Min(I(p1), I(p2)) for each pair of points
ment point X a maXimum summation value Smax; and
satisfying conditions (a), (b) and (c); and
storing the values (p 1, p2) Wherein into a list in decreasing
calculating for each point X the neighboring indeX I(X) order With respect to L.
corresponding to 45
15. The method of claim 14, further comprising:
traversing the list in decreasing order While merging sets
Smax _ 5min
corresponding to points p1 and P2 if the sets do not both
contain previously selected cluster kernels.
16. The method of claim 12, further comprising:
4. The method of claim 2 Wherein the Weighted rank of the correlating the electrofacies With the geological forma
point is On(X)=6Xp(II1/a). tions traversed by the bore hole.
5. The method of claim 1, further comprising: 17. The method of claim 16, further comprising:
determining attraction sets that are disjoint sets of mea prior to said calculating the neighboring indeX, selecting
surement points. 55 the recorded measurements points that are stable over
6. The method of claim 5, Wherein the attraction sets are consecutive levels; and
determined by: after said correlating, comparing a recorded measurement
calculating for each measurement point X in the data set point not selected in said selecting step to the clusters
a set of attraction values Attrx(y) Where y ranges over of selected measurement points to predict the facies of
the set of the nearest neighbors of X; the geological formation at the bore hole positions
calculating the maXimum Attrx(y) for each point X; corresponding to the unselected recorded measurement
for those points X having a maXimum Attrx(y) greater than points.
Zero, determining a directed connection from point X to 18. The method of claim 17, further comprising:
the point y that maXimiZes Attrx(y); producing a graph of the electrofacies of the geological
using the directed connections to categoriZe each mea formation as a function of the depth of the bore hole.
surement point X in the set as a kernel point of a mode, 19. An apparatus for performing automatic clustering,
slope point of a mode, or boundary point of a mode; and comprising:
US 6,295,504 B1
19 20
a memory unit con?gured to store log measurement points neighbor of X, D(X,y) is the distance betWeen X and y,
in d-dimensional space; and and a, b, c are predetermined constants.
a processing unit con?gured to retrieve the log measure 26. The apparatus of claim 19 Wherein the optimal cluster
ment points from the memory unit, and con?gured to number corresponds to a sharp drop in the Kernel Repre
calculates a neighboring indeX I of each log measure sentative IndeX
ment point. 27. The apparatus of claim 19, Wherein said processing
20. The apparatus of claim 19, Wherein the neighboring unit is further con?gured to perform mode merging of the
indeX corresponding to each log measurement point is attraction sets to form clusters of log measurement points,
calculated by: each cluster characteriZing an electrofacies.
determining for each given point X said given points 28. The apparatus of claim 27, Wherein the clusters of log
nearest neighbor ranks m relative to said given points measurement points are formed by:
N-nearest-neighbors; for each a pair of attraction sets S1 and S2 from all
calculating the Weighted ranks, On(X), of said given point attraction sets, identifying a pair of points p1 and P2
using the nearest neighbor ranks m; and belonging to S1 and S2, respectively, that satisfy the
15 conditions:
determining for said given point the summation s(X) of the
Weighted ranks of said given point. (a) p1 and P2 are boundary points;
21. The apparatus of method of claim 20, Wherein the (b) p1 is in K-nearest-neighbors of P2 or P2 is in
neighboring indeX corresponding to each log measurement K-nearest-neighbors of p1; and
point is calculated by: (c) the distance D(p2,p1) betWeen p1 and P2 is minimum
among all pairs of points satisfying conditions (a)
determining for each given point X said given points
nearest neighbor ranks m relative to said given points 29. The
andapparatus of claim 28, Wherein the clusters of log
N-nearest-neighbors; measurement points are further formed by:
calculating the Weighted ranks, On(X), of said given point calculating L=Min(I(p1), I(p2)) for each pair of points
using the nearest neighbor ranks m; and 25
satisfying conditions (a), (b) and (c); and
determining for said given point the summation s(X) of the
Weighted ranks of said given point; storing the values (p1,p2) Wherein into a list in decreasing
order With respect to L.
determining the minimum value Smax of s over each
30. The apparatus of claim 29, Wherein the clusters of log
log measurement point in the set; measurement points are further formed by:
determining the maXimum value Smwc of s(X) over each traversing the list in decreasing order While merging sets
log measurement point in the set; and corresponding to points p1 and P2 if the sets do not both
calculating the neighboring indeX I(X) for each point in contain a previously selected kernel points.
the set so that 31. The apparatus of claim 27, Wherein the processing unit
35 is further con?gured to correlate the electrofacies With the
5(X) 5min facies traversed by the bore hole.
max 5min
32. Amethod of de?ning the electrofacies of a geological
formation traversed by a bore hole comprising:
22. The apparatus of claim 19 Wherein said processing moving a sonde through a plurality of positions in said
unit is further con?gured to determine attraction sets, the bore hole and recording a data set having a number d
attraction sets containing log measurement points. of log measurements taken by the sonde at each of the
23. The apparatus of claim 22, Wherein the attraction sets predetermined levels, Wherein the d log measurements
are determined by: at each position are associable With a point in
calculating for each log measurement point p in the data d-dimensional space;
set a set of attraction values Attrx(y)Where y ranges 45 calculating a neighboring indeX I of each log measure
over the set of the nearest neighbors of X; ment points; and
calculating the maXimum Attrx(y) for each point X; determining an optimal cluster number by calculating a
for those points X having a maXimum Attrx(y) greater than Kernel Representative IndeX for each log mea
Zero, determining a directed connection from point X to surement point.
the point y that maXimiZes Attrx(y); 33. The method of claim 32, Wherein the Kernel Repre
using the directed connections to identify those Which log sentative IndeX is determined by:
measurement points serve as a kernel points; and determining for each point X in the data set, the nearest
for each kernel point, forming an attraction set that neighbor y of X that satis?es I(y)>I(X); and
includes the given kernel point and all those points 55 calculating F(X)=I(X)*Mb(X, y)*DC(X, y) in Which M(X,
having directed connections that lead to the given y) is a rank function that equals m When y is the mth
kernel point. neighbor of X, D(X,y) is the distance betWeen X and y,
24. The apparatus of claim 19, Wherein said processing and a, b, c are predetermined constants.
unit is further con?gured to determine an optimal cluster 34. The method of claim 32 Wherein the optimal cluster
number
for each by
logcalculating
measurementa Kernel
point. Representative IndeX number corresponds to a sharp drop in Kernel Representa
tive IndeX
25. The apparatus of claim 24, Wherein the Kernel Rep 35. An apparatus for performing automatic clustering,
resentative IndeX is determined by: comprising:
calculating for each point X in the data set the nearest a memory unit con?gured to store measurement points in
neighbor y of X satisfying I(y)>I(X); and d dimensional space;
calculating F(X)=I(X)*Mb(X, y)*DC(X, y) in Which M(X, a processing unit con?gured to retrieve the measurement
y) is a rank function that equals m When y is the mth points from the memory unit; and
US 6,295,504 B1
21 22
wherein said processing unit calculates a neighboring 41. The apparatus of claim 40, Wherein the Kernel Rep
index I of each measurement point. resentative IndeX is determined by:
36. The apparatus of claim 35, Wherein the neighboring
index corresponding to each measurement point is calcu calculating for each point X in the data set, the nearest
lated by: neighbor y of X satisfying I(y)>I(X); and
determining for each given point X said given points calculating F(X)=I(X)*Mb(X,y)*DC(X,y) in Which M(X,y)
nearest neighbor ranks m relative to all other points; is a rank function that equals m When y is the mth
calculating the Weighted ranks, On(X), of said given point neighbor of X, D(X,y) is the distance betWeen X and y,
using the nearest neighbor ranks m; and and a, b, c are predetermined constants.
determining for said given point the summation s(X) of the 10 42. The apparatus of claim 40 Wherein the optimal cluster
Weighted ranks for said given point. numbers correspond to sharp drops in the Kernel Represen
37. The apparatus of method of claim 36, Wherein the
neighboring indeX corresponding to each measurement point tative IndeX
is further calculated by: 43. The apparatus of claim 40, Wherein the processing unit
determining the minimum value Smin of s(X) over each 15
is further con?gured to perform mode merging of the
measurement point in the set; attraction sets to form clusters of measurement points, each
determining the maXimum value Smwc of s(X) over each cluster characteriZing a prototype.
measurement point in the set; and 44. The apparatus of claim 43, Wherein the clusters of
calculating the neighboring indeX I(X) for each point in measurement points are formed by:
the set by for each a pair of attraction sets S1 and S2 from all
attraction sets, identifying a pair of points p1 and P2
5(X) 5min
belonging to S1 and S2, respectively, that satisfy the
max 5min
conditions:
25 (a) p1 and P2 are boundary points;
38. The apparatus of claim 35, Wherein said processing (b) p1 is in K-nearest-neighbors of P2 or P2 is in
unit is further con?gured to determine attraction sets that are K-nearest-neighbors of p1; and
disjoint sets of measurement points. (c) the distance D(p2,p1) betWeen p1 and P2 is minimum
39. The apparatus of claim 38, Wherein the attraction sets
are determined by:
among all pairs of points satisfying conditions (a)
calculating for each measurement point X in the data set
a set of attraction values Attrx(y) Where y ranges over
45. The
and apparatus of claim 44, further comprising:
the set of the nearest neighbors of X; calculating L=Min(I(p1), I(p2) for each pair of points
calculating the maXimum Attrx(y) for each point X; satisfying conditions (a), (b) and (c); and
for those points X having a maXimum Attrx(y) greater than 35 storing the values (p1,p2) Wherein into a list in decreasing
Zero, directing point X to the point y that maXimiZes order With respect to L.
AWAY); 46. The apparatus of claim 45, further comprising:
using the directions to identify points that serve as a
traversing the list in decreasing order While merging sets
kernel point of a mode; and
corresponding to points p1 and P2 if the sets do not both
for each kernel point, forming an attraction set that
includes the given kernel point and all those points contain a previously selected kernel points.
directed to the kernel point. 47. The apparatus of claim 43, Wherein the processing unit
40. The apparatus of claim 38, Wherein the processing unit is further con?gured to correlate the prototype With neW
is further con?gured to determine one or more optimal measurement points.
cluster numbers by calculating a Kernel Representative 45
IndeX for each measurement point in the data set.