2 vues

Transféré par Elok Galih Karuniawati

This article about multi-resolution graph-based clustering analysis patent that I just reupload the document

- TowardsaBusinessModelTaxonomyofStartupsintheFinanceSec
- A Fast and Flexible Clustering Algorithm Using Binary Discretization
- A Comparison Study Between Various Fuzzy Clustering Algorithms
- An Introduction to Cluster Analysis for Data Mining
- Depositional and Provenance Record of the Paleogene Transition From Foreland
- kmeans
- Cluster
- Fuzzy Modeling
- Ppt for Seminar 1
- [IJCST-V4I4P29]:M.Arthi,M.Jayashree
- ikmeans_sdm_workshop03
- Proposal of KMSTME Data Mining Clustering Method for Prolonging Life of Wireless Sensor Networks
- IJETR031830
- Paper Pyroclastic Flow
- ITJDM02
- asroa
- Significance of Classification Techniques in Prediction of Learning Disabilities
- Finding Prominent Features In
- 15-53 Data Warehousing and Data Mining
- Data Warehouse Unit 5

Vous êtes sur la page 1sur 20

Ye et al. (45) Date of Patent: Sep. 25, 2001

CLUSTERING Stephen P. Smith, IEEE Transactions On Pattern Analysis

And Machine Intelligence, vol. 15, No. 1, Jan. 1993, pp.

(75) Inventors: Shin-Ju Ye, Spring, TX (US); Phillipe 8992.

J. Y. M. Rabiller, Lescar (FR) Fast Parzen Density Estimation Using ClusteringBased

Brand And Bound, ByeungWoo Jeon et al., IEEE Transac

(73) Assignees: Halliburton Energy Services, Inc., tions On Pattern Analysis And Machine Intelligence, vol. 16,

Houston, TX (US); Elf Exploration No. 9, Sep. 1994, pp. 950954.

Production (FR) Motion Estimation Via Clustering Matching, Dane P. Kottke

et al., IEEE Transactions On Pattern Analysis And Machine

(*) Notice: Subject to any disclaimer, the term of this Intelligence, vol. 16, No. 11, Nov. 1994, pp. 11281132.

patent is extended or adjusted under 35

U.S.C. 154(b) by 0 days. (List continued on next page.)

Primary ExaminerDonald E. McElheny, Jr.

(21) Appl. No.: 09/586,129 (74) Attorney, Agent, or FirmConley, Rose & Tayon, RC.

(22) Filed: Jun. 2, 2000 (57) ABSTRACT

Related U.S. Application Data An apparatus and method for obtaining facies of geological

(60) Provisional application No. 60/161,335, ?led on Oct. 25, formations for identifying mineral deposits is disclosed.

1999. Logging instruments are moved in a bore hole to produce log

measurements at successive levels of the bore hole. The set

(51) Int. Cl.7 ..................................................... .. G01V 3/38

of measurements at each such level of the bore hole interval

(52) U.S. Cl. .................................................. .. 702/7, 702/11 is associated With reference sample points Within a multi

(58) Field of Search .................................. .. 702/6, 78, 11, dimensional space. The multidimensional scatter of sample

702/12, 13, 367/72, 73; 324/338, 339 points thus obtained is analyzed to determine a set of

characteristic modes. The sample points associated With

(56) References Cited characteristic modes are grouped to identify clusters. A

U.S. PATENT DOCUMENTS facies is designated for each of the clusters and a graphic

representation of the succession of facies as a function of the

4,646,240 2/1987 Serra et al. ........................ .. 364/422 depth is thus obtained. To identify the clusters, a neigh

FOREIGN PATENT DOCUMENTS boring index of each log measurement point in the data set

is calculated. Next, small natural groups of points are

2215891 9/1989 (GB) . formed based on the use of the neighboring index to deter

OTHER PUBLICATIONS mine a K-Nearest-Neighbor (KNN) attraction for each point.

Independently of the natural group formation, an optimal

Clustering Without A Metric, Geoffrey MattheWs et al., number of clusters is calculated based on a Kernel Repre

IEEE Transactions On Pattern Analysis And Machine Intel sentative Index (KRI) and based on a user-speci?ed resolu

ligence, vol. 13., No. 2, Feb. 1991, pp. 175184. tion. Lastly, based on the data calculated from the prior

Robust Clustering With Applications In Computer Wsion, steps, ?nal clusters are formed by merging the smaller

JeanMichael Jolion et al., IEEE Transactions On Pattern clusters.

Analysis And Machine Intelligence, vol. 13, No. 8, Aug.

1991, pp. 791802. 47 Claims, 7 Drawing Sheets

510

Determine Neighbor

Index (Local Density of

Data Distribution)

575 520

Determine Kernel

Determine KNN

Representative Index

Attraction (Form natural

small groups) (Determine Optimal

Cluster Number)

530

Perform Merging to

Form Final Clusters

US 6,295,504 B1

Page 2

A Least Biased Fuzzy Clustering Method, Gerardo Beni, et Stratigraphy, Oberto Serra et al., SPE 9270, Society of

al., IEEE Transactions On Pattern Analysis And Machine Petroleum Engineers of AIME, Sep. 1980, pp. 119.

Intelligence, vol. 16, No. 9, Sep. 1994, pp. 954960.

Fast NearestNeighbor Search In Dissimilarity Spaces, BoreholeAcousticsAnd LoggingAnd Reservoir Delineation

Andras Farago, et al., IEEE Transactions On Pattern Analy Consortia, Annual Report 1999, Earth Resources Labora

sis And Machine Intelligence, vol. 15, No. 9, Sep. 1993, pp. tory, pp. 125.

957962.

Application Of Kruskal Multidimensional Scaling (MDS) To FaciologAutomatic Electrofacies Determination, M.

Rock Type Identi?cation From Well Logs, Vaclav Matsas et Wolff et al., SPWLA TWentyThird Annual Logging Syrn

al., The Log Analyst, Jan.Feb. 1995, pp. 2834. posiurn, Jul. 69, 1982, pp. 123.

Automatic Determination Of Lithology From Well Logs,

Pierre Del?ner et al., SPE Formation Evaluation, Sep. 1997, Facies Model, Second Edition, Roger G. Walker, Geo

pp. 303310. science Canada, Reprint Series 1, McMaster University,

Synergetic Log & Core Data Treatment: A Methodology To May, 1984, pp. 16 p.

Improve ReservoirDescription Through C lusterAnalysis, C.

DescalZi, et al., SPE #17637, Society of Petroleum Engi The SelfOrganizing Map, Teuvo Kohonen, Proceedings of

neers, Nov. 1988, pp. 113. the IEEE, vol. 78, No. 9, Sep. 1990, pp. 14641480.

U.S. Patent Sep. 25,2001 Sheet 2 of7 US 6,295,504 B1

Nphi

-0.1 0 0.1 0.2 0.3 0.4 0.5

1.7

Q

0

1.9

2.1 a

2.3

RhoB

2.5

2.7

2.9 Electrofacies1

I Electrofacies2

3.1 I

Fig. 3

' 5lO

Flg- 4 Determine Neighbor

Index (Local Density of

Data Distribution)

575 520

Determine Kernel

Determine KNN

Representative Index

Attraction (Form natural

small groups) (Determine Optimal

Cluster Number)

550

Perform Merging to

Form Final Clusters

U.S. Patent Sep. 25, 2001 Sheet 3 0f 7 US 6,295,504 B1

670/

@

Initialize NxN array A to

contain indexes of N-1 nearest

neighbors for all points.

i

Sort vector A[i] for each point Xi,

620/ i=1 to N, in the data set.

Results of sorted index order of

vector A, j, stored in NxN array B.

l

Calculate value of m of point Xi

and of its jth neighbor y,

Determine on(x)=exp(-mlot)

640/ where on is user de?ned

parameter

l

N-1

650/

Determine s(x)= Z 6n(x)

Have all

points been

processed

?

660

\

Determine

Smin= Min{s(xi)}

i=1,N

, l

Have all

Calculate NlF l(x): Determine

points been

processed I(X) : :(X) ' 5min =

max- 3 mm

Smax= Max{s(xi)}

|=1,N

680) 670)

Fig.

U.S. Patent Sep. 25,2001 Sheet 4 0f 7 US 6,295,504 B1

Fig. 6 @

For each nearest neighbor q of a point p, determine

attraction function Attrp(q)=l(q)Vp(q), where Vp(q)=1, \ 720

if q is in the KNN of p. Vp(q)=O othen/vise.

l

Determine Attraction of point p by

Attfp=(I7\/1?/X(Attfp(q))

p

\ 730

Have all

points in the

dataset been

processed

?

740 K750

P is free attractor,

Is p not attracted Yes and kernel of a

by any other point? mode, local

maximum of PDF

/ 770

P is a related

2:83 p fatttrfact aphy Yes attractor. P is

o wioFrom fem 7 e

e data set.

a ofslope point

a mode

does not attract any other point. P \

. . 780

IS a boundary or valley point.

Have we

unsetlect a.n d categorized each Yes F

.68 egonze point in total num. row.

point from total of data points N as attractlon

Set of. po?ts of boundary or mode Sets

3'29 ' point?

U.S. Patent Sep. 25,2001 Sheet 5 0f 7 US 6,295,504 B1

Fig. 7 Cg

For each point x in the set of all points of size N, determine

y, ?rst neighbor of x, satisfying l(y)>l(x). l(x) and l(y) are \8 7 O

the Neighbor Indexes of point x and y respectively.

l

Determine Kernel Representative Index F(x):

F(X)=Ia(><)*Mb(X,y)*D(X,y) \820

M(x,y)=m in which y is the mth neighbor of x, and

D(x,y) is the distance between x and y.

0.8 -

0.7 -

0.6 -

KRI

5 Clusters

0.5

0.4

8 Clusters

0.3 -

12 Clusters

0.2 '

13 5 7 911131517192123252729

Kernels in decreasing order of their KRI

U.S. Patent Sep. 25,2001 Sheet 6 0f 7 US 6,295,504 B1

Have all

possible pairs

g) [7007

Select a pair of attraction sets S1 and S2 among

of attraction

sets been all attraction sets determined previously.

processed l

'7

Select a pair of points p1 and p2 \ 7 005

' belonging to S1 and 82 respectively.

707O

No Is p1 a

boundary point

in S1?

Yes

7075

No Is p2 a

boundary point

in S2?

Yes

7 020

ls p1 in the

KNN of p2? Yes

No

7 025

No ls p2 in the

KNN of p1?

Yes

A l 7030

Is the distance

Yes /\ No between p1 and p2,

Have all pairs No D(p2,p1), among all Yes

of points been pairs of points,

processed? satisfying all the

above conditions,

minimum?

Save the values

(p1,p2,L) where

- |-=Min(|(P1),l(P2))

Flg. 7032f intoa list.

U.S. Patent Sep. 25,2001 Sheet 7 0f 7 US 6,295,504 B1

Q)

Sort the list in decreasing \ 7035

order with respect to L.

i

In decreasing order of L, for all pair of

points (p1, p2) of the list determine:

7035

Does the set S1

corresponding to

point p1 contain a

previously selected

cluster kernel?

1040

corresponding to

point p2 contain a

previously selected

cluster kernel?

No

7

7050

Have all (p1,p2)

in the list been

processed?

Fig. 9B

US 6,295,504 B1

1 2

MULTI-RESOLUTION GRAPH-BASED injected near the drilling tool. It is not normally cost

CLUSTERING effective to identify facies using these methods. Information

on geological formations traversed by a bore hole is more

This application claims bene?t to Us. Provisional commonly gathered by a measurement sonde passing

60/161,335 ?led Oct. 25, 1999. through the bore hole. The gathered information as a func

tion of the sondes position along the bore hole is then stored

BACKGROUND OF THE INVENTION or logged.

Many doWnhole measurement techniques have been used

1. Field of the Invention in the past, including passive measurements such as mea

The present invention generally relates to the geological suring the natural emission of gamma rays; and active

study of earth formations for the location and exploitation of measurements such as emitting some form of energy into the

mineral deposits using electrofacies analysis. More formation and measuring the response. Common active

particularly, the present invention relates to a neW system measurements include using acoustic Waves, electromag

and method for identifying formations of mineral deposits netic Waves, electric currents, and nuclear particles. The

using a user-friendly and reliable clustering technique that 15

sonde measurements are designed to re?ect the distinguish

can extract natural clusters from sets of logged data points ing characteristics of the rock facies. Multiple logs and

for improved electrofacies analysis of the formation. sondes may be used to gather the measurements, Which are

then correlated and standardiZed so as to furnish measure

2. Description of the Related Art

ments at discrete levels separated by equal depth intervals.

Mineral and hydrocarbon prospecting is based upon the The measurement standardiZation alloWs the automation of

geological study and observation of formations of the earths data interpretation in order to obtain estimates of the poros

crust. Correlations have long been established betWeen ity of the rocks encountered, the pore volume occupied by

geological phenomena and the formation of mineral and hydrocarbons, and the ease of How of hydrocarbons out of

hydrocarbon deposits that are suf?ciently dense to make the reservoirs in the case of petroleum prospecting. The set

their exploitation economically pro?table. of measured formation characteristics values that distinguish

25

The study of rock and soil facies encountered While the strata in a given bore hole is herein termed the electro

prospecting for minerals takes on particular importance. As facies.

used herein, a facies is an assemblage of characteristics that Interpretation studies have demonstrated a strong corre

distinguish a rock or strati?ed body from others. A facies lation betWeen the electrofacies and lithofacies, thereby

results from the physical, chemical and biological conditions making it possible to identify With con?dence the compo

involved in the formation of a rock With respect to other sitional characteristics of the rocks traversed by bore holes

rocks or soil. This set of characteristics provides information based on the sonde measurements. It has been established

on the origin of the deposits, their distribution channels and that the sets of log measurements (i.e. sample points) Which

the environment Within Which they Were produced. For correspond to a given lithofacies form a cluster in data

example, sedimentary deposits can be classi?ed according to space. That is, When the measured characteristic values of

35

their location (continental, shoreline or marine), according a formation are graphed, the points generally fall into a

to their origin (?uviatile, lacustrine, eolian) and according to continuous region Which is distinguishable from the regions

the environment Within Which they occurred (estuaries, Where points for other formations Would fall.

deltas, marshes, etc.). This information in turn makes it Various systems and method that use the correlation or the

possible to detect, for example, Zones in Which the prob observed clusters to identify lithofacies from electrofacies

ability of hydrocarbon accumulation is high. have been created. These systems take the logged measure

The set of characteristics used to de?ne a facies depends ments and convert them to a graph that furnishes, as a

on the situation. For example, a lithofacies may be de?ned function of position along the bore hole, an image of a

by the rocks petrographic and petrophysical characteristics. succession of facies. The graph typically also provides some

These are the composition, texture and structure of the rock. 45 indication of the measured formation characteristic values

Examples of mineral composition are silicate, carbonate, alongside the image. An example of one such system and its

evaporite, and so on. A rocks texture is determined by its output is described in US. patent application No. 4646240,

grain siZe, sorting, morphology, degree of compaction, and Which is hereby incorporated herein by reference. HoWever,

degree of cementation. The rock structure includes the before these systems can do the conversion, they must be

thickness of beds, their alternation, presence of stones, tailored to the drilling region.

lenses, fractures, degree of parallelism of laminations, etc. The most accurate existing systems and methods require

All of these parameters are related to the macroscopic a substantial amount of user participation to set up, and

appearance of the rock. conversely, those existing systems Which are highly auto

For extraction of hydrocarbons from geologic formations, mated tend to perform poorly. One proven approach uses a

the particularly desirable characteristics of the lithofacies are 55 tWo-step methodology to correlate different measured char

the porosity of the reservoir rocks and their permeability, as acteristic values into generaliZed electrofacies charts for

Well as the fraction of the pore volume occupied by these analysis. In the ?rst step, the number of clusters is speci?ed

hydrocarbons. These aid in estimating the nature, quantity, to an automatic clustering algorithm such as maximum

and producibility of the hydrocarbons contained in such likelihood algorithm, hierarchical clustering method,

strata. dynamic clustering or neural netWork. The number of clus

There are various sources of information on formation ters speci?ed is large, creating clusters containing small

lithofacies. Information may be gathered from subsurface numbers of points. Apetrophysicist or geologist then manu

observations such as, for example, by the study of core ally assigns geological characteristics from the facies to each

samples taken from rock formations during the drilling of a cluster and simultaneously merges similar small clusters into

bore hole for an oil Well. Such information can also be 65 electrofacies.

provided by drill cuttings sent up to the surface from the Another approach for creation of electrofacies charts

bottom of a Well by means of a ?uid (generally drilling mud) requires that the number of clusters speci?ed to the auto

US 6,295,504 B1

3 4

matic clustering algorithm be relatively small. In this BRIEF DESCRIPTION OF THE DRAWINGS

approach, the geologist often has a problem assigning spe

ci?c geological facies to the clusters, Which tend to be much For a more detailed description of the preferred embodi

larger than the clusters in the previous approach. The geolo ments of the present invention, reference Will noW be made

gist may also be required to lump together geological to the accompanying draWings, Wherein:

facies at a coarser level of distinction than might be appro FIG. 1 illustrates logging equipment in operation in a bore

priate. A large number of clusters require much Work by the hole;

geologist to match clusters to geology; too feW clusters FIG. 2 illustrates a computer system used to process the

cause the geologist problems in making meaningful linkages logged data and determine mineral compositions of the earth

betWeen clusters and geology. 10 formation;

The electrofacies analysis systems described above suffer FIG. 3 illustrates the scattering of points representative of

from various limitations and draWbacks. The automatic the values of tWo characteristic parameters of the formations

clustering methods require the user to provide an initial measured Within a given depth interval;

number of clusters before processing. This is a limitation FIG. 4 is a How diagram shoWing the steps for Multi

because the results are very sensitive to this parameter. 15

Resolution Graph Based automatic clustering;

Furthermore, unless the number is large, the identi?ed

clusters may have shapes that are not geologically mean FIG. 5 is a How diagram shoWing the steps to determine

ingful. This prevents them from being directly used for Neighboring IndeX Function;

facies analysis. On the other hand, manual merging of a FIG. 6 is a How diagram shoWing the steps to determine

large number of small clusters based on similar geological K-Nearest-Neighbor Attraction;

characteristics by hand makes this process sloW and subjec FIG. 7 is a How diagram shoWing the steps for determin

tive. Furthermore, because electrofacies analysis occurs in ing the Kernel Representative IndeX;

N-dimensional space it is still dif?cult even for a trained FIG. 8 is a curve of the Kernel Representative IndeX in

individual With good visualiZation tools to identify clusters decreasing order Which may be used to determine the

manually. Thus, it is desirable to develop a system and 25 optimal number of clusters at different resolutions; and

method that, in a relatively constant, reliable, and systematic FIGS. 9A and 9B shoW the steps for performing merging

manner, permits automatic clustering of logged data to to form ?nal clusters.

eXtract information about the geological facies of the data.

While the invention is susceptible to various modi?ca

SUMMARY OF THE INVENTION tions and alternative forms, speci?c embodiments thereof

Accordingly, there is disclosed herein a method for iden are shoWn by Way of eXample in the draWings and Will

ti?ng formations of mineral deposits. In one embodiment of herein be described in detail. It should be understood,

this method, logs are made over multiple levels Within an hoWever, that the draWings and detailed description thereto

interval along the bore hole in order to obtain a group of are not intended to limit the invention to the particular form

several measurements for each of these levels. With each 35

disclosed, but on the contrary, the intention is to cover all

such level of the bore hole interval is associated a sample modi?cations, equivalents and alternatives falling Within the

point Within a multidimensional space de?ned by the dif spirit and scope of the present invention as de?ned by the

ferent logs. The sample point coordinates are a function of appended claims.

the logging values measured at this level. The sample points

DETAILED DESCRIPTION OF PREFERRED

thus obtained Will form a scatter diagram Within this mul

EMBODIMENT

tidimensional space.

The sample points of this scatter diagram are used to FIG. 1 shoWs logging equipment in a bore hole 10 going

determine a set of characteristic modes, each corresponding through earth formations 12. The equipment includes a

to a Zone of maXimum density in the distribution of these sonde 16 is suspended in the bore hole 10 at the end of a

samples; each mode is regarded as a characteristic of a 45 cable 18. Cable 18 connects sonde 16 both mechanically and

respective cluster and the surrounding samples of this cluster electrically, by means of a pulley 19 on the surface, to a

are related to it. Afacies is designated for each of the modes control installation 20 equipped With a Winch 21 around

thus characteriZed and a graphic representation is produced Which the cable 18 is Wound. The control installation

as a function of the depth of the succession of facies thus includes recording and processing equipment knoWn in the

obtained. The characteristic modes of each cluster are made art that make it possible to produce graphic representations

up of sample points coming from the measurements them called logs of the measurements obtained by the sonde 16

selves. according to the depth of the sonde in the bore hole or Well

To identify the clusters, a neighboring indeX of each log 10.

measurement point in the data set is calculated. NeXt, small The bore hole 10 passes through a series of earth forma

natural groups of points (called attraction sets) are formed 55 tions (not speci?cally shoWn) that is typically composed of

based on the use of the neighboring indeX to determine the a series of Zones or beds. The Zones are identi?ed by the

K-Nearest-Neighbor (KNN) attraction for each point. Inde rock facies they contain, e.g. clay, limestone, etc. From the

pendently of the natural group formation, the optimal num geological vieWpoint, each of these successive Zones is

ber of clusters is calculated based on the Kernel Represen characteriZed by a relative homogeneity that is revealed by

tative IndeX (KRI) and a user-speci?ed resolution. Lastly, a set of characteristic data values (facies). These values vary

based on the data calculated from the prior steps, ?nal from one Zone to another, but have a relatively limited range

clusters are formed by merging the attraction sets. of variation Within a given Zone. These data, Which depend

Experimentation con?rms that the above method alloWs in particular on the mineralogical composition, the teXture

an accurate determination of the geological facies derived and the structure of the rocks making up these Zones,

from the logging measurements obtained Within an interval 65 identify respective facies.

of geological formations traversed by a sonde traveling in a It is possible to establish a correspondence betWeen, on

bore hole. the one hand, different facies characteriZed by mineralogical

US 6,295,504 B1

5 6

factors, texture and structure and, on the other hand, elec tools, among others. FIG. 3 demonstrates in tWo dimensions

trofacies Which can be obtained directly from a suitable the results of the automatic clustering technique described

quantitative analysis of a set of logs measured by the sonde beloW. Although only 2 dimensions are shoWn, the technique

as it traverses the bore hole. The possibility of establishing is generally applied to N-dimensions for N log measurement

such a correspondence betWeen electrofacies and facies is properties (4-dimensions Were used to obtain the clusters

capable of providing a valuable aid in the geological knoWl shoWn in FIG. 3).

edge of a Zone of the earths crust Within a given region,

Once a set of measurement points has been obtained, it is

such knoWledge being useful in completing the information

usually available to geologists and, in certain cases, helping desired to partition the points. Initially, this partitioning is

them in the interpretation of the facies encountered to obtain achieved by gathering the points into clusters using a cluster

information on the history of the formations and for deter

10 identi?cation algorithm. Many such algorithms exist, and

mining the concentrations of mineral deposits. they almost universally require that the data be normaliZed.

An approximate image of the facies may be produced by There are several Ways to normaliZe the data. One clas

?rst obtaining a number of logs over a bore hole interval sical method, frequently used, is to limit the data in an unit

H1H2. The measured values are discretiZed and correlated hypercube [0,1]i, d being the number of features, Which

15 corresponds to the number of dimensions. In each

in depth so as to have, for each level of the interval

considered, a set of distinct log values. In one embodiment,

dimension, that dimensions minimal value is subtracted

the measurements are discretiZed into levels at 15 cm from the data, and the difference is divided by the total range

intervals. The set of log values thus obtained is analyZed to of the data in that dimension. In another method, the average

determine groups of consecutive levels that have log values value of each feature is subtracted from the data, and the

Within a given range. The upper and loWer values of the difference divided by the standard deviation. The normal

range are based on the potential variations that may be iZation changes the distance betWeen data points and affect

caused by, eg bore hole conditions such as roughness or the natural separation of data points, but it is necessary to

caving of the bore hole Walls. Those consecutive levels prevent an improper choice of scale in one dimension from

having log values in the given range may be considered to dominating the measurements in other dimensions.

25

share a true physical characteristic value that is substan The clustering algorithms can be divided into parametric

tially constant over those levels. In this manner, those Zones algorithms, and non-parametric algorithms. Parametric algo

having relatively constant values may be portrayed as facies rithms are generally regarded as being less desirable than

having the indicated measurement values. non-parametric algorithms because parametric algorithms

To generate an image of the facies using the logged data are based on some model of the data, Whereas non

points, processing of the logged data measured by the sonde parametric algorithms make no assumptions about the data

as it traverses the formation may be handled at the Well site pattern. One consequent advantage of non-parametric algo

by a computer system such as that shoWn in FIG. 2. The rithms is that they are capable of recogniZing clusters of

computer system consists of a keyboard 308 and a monitor varied shapes.

306 to permit user interaction With the computer toWer 302 35

One example of a non-parametric approach is to divide

containing the CPU and peripheral hardWare. Logs gathered the observations space (eg the graph of FIG. 3) into regular

at the Well site may be graphically displayed on the monitor hypercubes of a ?xed siZe and estimating the probability

306 and clustering (described beloW) may be performed to density function (PDF) of the data based on the number of

determine the petrophysical characteristics of the formation. measurement points in each hypercube. One draWback of

Alternatively in a second embodiment, because of cost, this approach is that the number of hypercubes increases

space, poWer, and transportation restrictions, one may prefer exponentially as the number of data dimensions.

storage of the logged data gathered by the sonde in a storage Furthermore, this approach encounters critical dif?culties

medium such as a 3.5 diskette, tape or recordable compact When the data includes clusters that are closely spaced

disc 310. Processing of the data for determination of for and/or clusters of very different densities or siZes.

mation characteristics may then occur at the of?ce or labo 45 Another example of a non-parametric approach is the

ratory using more poWerful CPUs at a substantially loWer K-Nearest-Neighbor (KNN) approach. Rather than estimat

cost than at the Well site. ing the PDF by determining hoW many points are in a given

As shoWn in FIG. 3, if one analyZes a scattering of points data volume (eg a hypercube), the PDF is estimated by

representative of the logs carried out on a succession of measuring the data volume occupied by a given number of

levels in a bore hole interval, it is noted that the distribution points. The K-Nearest-Neighbor approach estimates the

density of these points in the scatter varies. This ?gure is the PDF around a point by determining the radius from the point

result of Multi-Resolution Graph-based Clustering (MRGC) to its Kth nearest neighbor.

using four logs (Nphi, RhoB, GR and DT), although only the Another example of a non-parametric approach is

tWo log measurements RhoB and Nphi are shoWn in the tWo described by C. T. Zahn in Graph theoretical methods for

dimensional graph. RhoB corresponds to a density measure 55 detecting and describing Gestalt Clusters, IEEE Trans. On

ment log and Nphi is a porosity measurement log measured Computers, v. C-20, n. 1, pp. 6886, 1971, incorporated

by a Compensated Neutron Log (CNL) tool. Examples of herein by reference. This approach relies on graph theory. A

other log measurement properties include natural gamma connected graph is constructed by linking the data points xi

radiation measurements (GR), temperature measurement by arcs according to their proximity relationship.

(HRT), inverse of square root of resistivity measurement In a graph representation of a data set, each observation

near the Wall of the Well in the invaded Zone (HRXO), xi is represented by a node, and an arc is established betWeen

acoustic Wave transit time measurement (DT), measurement tWo distinct nodes if they are linked by a relationship de?ned

of resistivity of formation far from Well or bore hole (RT), in the data set S. The graph is denoted by G=(S, A), Where

and measurement of resistivity near bore hole Wall (RXO). A is the set of arcs. Various graph structures have been used

These measurements may be made by logging tools such as 65 for clustering. Four particular structures are the Minimum

resistivity tools, induction tools, nuclear magnetic resonance Spanning Tree (MST), Relative Neighborhood Graph

tools, thermal neutron decay tools, and gamma radiation (RNG), Gabriel Graph (GG), and Delaunay Triangulation

US 6,295,504 B1

7 8

Graph (DTG). The MST is the graph that connects every then it is b thereafter. In practice, b is set equal to b=K+1,

node together With the shortest overall arc length. The RNG, thereby providing all points outside the neighborhood of

GG, and DTG are graphs With nodes coupled based on interest an equal ranking.

increasingly relaxed distance requirements. For a given set The limited rank On(x) is de?ned above for n=1, 2, . . . ,

of measurement points, these graphs are related in the K, ie for xs K nearest neighbors. The sum of the limited

folloWing manner: ranks for each point x is expressed as:

s(x) : 2 03/10:).

The inclusion of Gg G means that, for the same node set S, n:l

10

the set of arcs of G is contained in the set of arcs of G. The

connectivity of MST implies the connectivity of the other 3 The smallest sum of the limited ranks is expressed:

structures, so that these graph-based methods begin With a

single cluster that is to be divided.

Typically, heuristic rules are then applied to remove the Sim-n = [Ii/E35 (1a)}, (Eqn 4)

15

inconsistent arcs that bridge inherent separations among

potential clusters. The elimination of these arcs breaks the

Where N is the number of measurement points in the data set

graph into several connected sub-graphs, called connected

components. Each connected component gathers a group of S. The largest sum of the limited ranks is similarly

points that are recogniZed as a cluster. The dif?culty With expressed:

this approach lies in the determination of the heuristic rules.

Generally, these rules are based on the comparison of max

(Eqn 5)

valuation of arcs With a homogeneity criterion for detecting

arcs With uncommon valuation (the inconsistent arcs).

HoWever, no such rules have been successfully established Using the largest and smallest sums of the limited ranks, the

for more than three dimensions. Furthermore, the resulting limited rank sums can be normaliZed to a range of 0 to 1:

clusters are often unstable due to the irregularity of data

/

distribution (i.e. the identi?ed clusters are highly sensitive to 5min

/ t

small disturbances in the data). Smax 5min

authors of the folloWing references:

This is the Boundary Index function I(x). Because a point

C. Gan et al Classi?cation non supervise par detection near a mode of the PDF is more likely to be the nearest

des zones frontires application en reconnaissances des

neighbor of nearby points than a point near a valley of the

formes et segmentation, 13ime Colloque GRETSI, Juan PDF, a boundary index value near Zero indicates that mea

les-pins, Tome 2, pp. 11051108, 1991, and C. Gan, Une35 surement point x is neara mode of the PDF, Whereas a value

approche de classi?cation non supervise base sur la near one indicates that measurement point x is neara valley

notion des k plus proches voisins, French Ph.D. Thesis, of the PDF.

University of Technology of Compigne, 1994. In this After determining the boundary indices, Gan applies a

approach, the KNN approach is combined With the graph relaxation process to the boundary indices to remove the

theory approach to devise a Boundary Index that alloWs local irregularities. Gan then separates the cluster kernel

detection of data clusters from data sets of any dimension

points from the boundary points by a simple thresholding of

and of very complex shapes and con?gurations. Like KNN, the boundary index values. In using this approach, the

Gan requires a single parameter: the number of neighbors K. optimal value of K is determined by systematically repeating

That is, given a value for K, the number of clusters is the process for various values of K and observing the

automatically determined. Unlike KNN, Gans Boundary resulting numbers of clusters. A stable Zone may appear in

Index is sensitive to the change of the local PDF rather

Which a range of consecutive K values results in a consistent

than to the PDF itself. The KNN estimates the local PDF, number of clusters. Any of these K values may be consid

Whereas the Boundary Index (BI) indicates Whether a point ered optimal.

is relatively close to a mode of the PDF (i.e. a local

Advantages of Gans approach include the identi?ability

maximum density Zone, or the center of a cluster) or

of clusters having varied shapes, densities and volumes,

relatively close to a valley of the PDF (i.e. a local minimum even When they are small. Disadvantages include cluster

density Zone, or the border of a cluster). Gans Boundary sensitivity to K. As expected, increasing K generally

Index is noW developed here in detail in the context of

decreases the number of clusters due to a stronger smoothing

measurement point clustering. effect. HoWever, When the clusters are not Well-separated,

Let measurement point x be an element of the set of

55 the stable Zone disappears and it becomes dif?cult to

measurement points S={x1, x2, . . . XN} and let measurement

identify a suitable value for K. Further, the organiZation of

point y be measurement point xs nth Nearest Neighbor the clusters is very sensitive to small changes in K. Finally,

(NN) in the set of measurement points S, nK. The limited for very unbalanced cluster siZes, this approach fails to

rank of measurement point x With respect to its nth NN, y,

identify obvious clusters. For example, experiments indicate

is de?ned to be:

that for a log data set containing 2500 points With more than

a third of these data points concentrated in a very small

o';l(x) : m if x is the m' NN of y, m s K (Eqn 2)

space forming a very compact cluster, and having the

remaining points dispersed in a very large Zone forming

many clusters of great volumes, Gans approach does not

65 identify the obvious compact cluster as one cluster. Rather,

Where b is set so that K+bN-1. In other Words, the Gans approach generates several clusters in the small

limited rank On(x) is the rank of x relative to y up to K, and compact Zone and a feW clusters in the large Zone. The

US 6,295,504 B1

9 10

disadvantages and drawbacks of Gans approach may be The smallest and largest sums are expressed

overcome by the method of Multi-Resolution Graph-based

Clustering (MRGC) shown in FIG. 4. 5min = gym. and (Eqn 9)

Prior to clustering, the MRGC method requires the log

measurements to be normaliZed. As With all clustering Smax = gig/{NWO} (Eqn 10)

methods using the concept of distance, the MRGC is

sensitive to scale changes. It is therefore desired to ?rst

normaliZe the data so as to balance the Weights of each

The Neighboring Index function I is de?ned to be:

measured feature (a feature represents one dimension of data

point). Hereafter, it is assumed that the data has been

M) = 5(X) _Smin (Eqn 11)

normaliZed in some fashion. Smax 5min

The ?rst step in the MRGC method, block 510, deter

mines a Neighboring Index value for each measurement

point. The neighboring index values are then used in block 15

The value of I varies betWeen 0 and 1. Unlike the boundary

515 to automatically form small basic data groups by use of index function, the higher the value of the neighboring index

a multidimensional KNN point-to-point attraction algo function I, the closer the point to a mode of the PDF. The

rithm. Independently of block 515, the neighboring index neighboring index function replaces K With a smoothing

values are used in block 520 to determine a Kernel Rep parameter a, but advantageously, the neighboring index

resentative Index (KRI) for each measurement point. The function is less sensitive to changes in a.

points are then ordered according to their KRI, and one or

more optimal numbers of clusters (corresponding to differ One method for calculating the Neighboring Index func

ent resolutions) is suggested to the user. The user selects a tion is noW described With reference to FIG. 5. First, each

resolution, and in block 530, the ?nal clusters are obtained 25 measurement point is assigned an index i from 1 to N. In step

by use of a multidimensional merging process that joins the 610, a nearest neighbor array A is determined in Which

basic data groups in the order of their linking strength at is the index of the jth nearest neighbor of measure

their joint boundary, until the desired number of clusters is ment point xi. For example With 6 measurement points, A

achieved. could be:

Once the clusters have been obtained, they can be pro

vided to a geologist Who identi?es the facies that they j=1 to 6 (Eqn 12)

represent. Discussed further beloW are methods for using the 526431

clusters to provide identi?cation of neW measurement points 463152

taken elseWhere. First, hoWever, each of the blocks in FIG. 35 > II

II-

II

615243

4 is described in greater detail. 132564

In block 510, the MRGC method calculates a modi?ed 624315

boundary index using an exponential function and an unlim 231456

ited WindoW siZe, Which is alWays equal to N-1 (N is the

total number of data points). The neW index is called

Neighboring Index (NI), and it is based on the Weighted

rank of measurement point x relative to all the other mea In this example, x2s nearest neighbor is x4, and x4s nearest

surement points y: neighbor is x1. For convenience, xi is considered to be the

Nth nearest neighbor of itself. The determination of this

45

array requires on the order of N2/2 operations due to the

need to calculate the distance from each point to every other

Where x is the m"1 NN of y, With y being the nth NN to x. point. Once it is determined, a sorting algorithm can be used

The K parameter in the boundary index function is effec to identify the rank array C in Which is the rank of

tively replaced by a constant a that is greater than Zero. It is measurement point xi relative to its jth nearest neighbor.

noted that a is insensitive to the siZe of the data set and may

be set once for all log data sets. In practice, a has been In step 620, a companion array B initialiZed as folloWs:

successfully set to 10, but a range of successful values is

expected. The Weighted rank On(x) is de?ned relative to all j: 1 to 6 (Eqn 13)

the other points, so n ranges from 1 to N1. 55 123456

Unlike the limited rank function on Which is an increasing 123456

function, the Weighted rank function on is a strictly decreas 123456

ing function varying from 1 to 0, i.e., (1, 0). With respect to 123456

each of its neighbor, x has a rank On(x), n=1, 2, . . . , N1. 123456

The sum of the Weighted ranks for a given measurement 123456

point x is:

65 neighbor array A. The nearest neighbor array A is sorted a

roW at a time With B as a companion array. As values are

rearranged in A, the companion array B is rearranged in the

US 6,295,504 B1

11 12

same Way. After the contents of the roWs in A are sorted in otherWise. Each point x is directed to the nearest neighbor

the above example, the companion array B is: y that maximiZes the attraction function With a value greater

than Zero. If none of the nearest neighbor points of x has an

j: 1 to 6 (Eqn 14) attraction value greater than Zero, then x is not adhered to

any other points.

625413 In FIG. 6, step 720 determines Eqn17 for all the neigh

463152 boring points y of a measurement point x. In step 730, an

246531 adherence point (if one exists) is determined for measure

132645 ment point x by identifying the point y in the K2 nearest

524361 neighbor points of point x that maximiZes Attrx(y):

312456

Am, : Max(AllrX(y)) (Eqn 19)

yEVX

neighbor of x1. It is noted that it is desired to preserve the 15 and verifying that the maximal value is greater than Zero. To

original nearest neighbor array A, so the sorting operation is reduce the number of computations, it may be desired to

performed on a temporary copy of A and the original is left eliminate the constant neighboring index value I(x) from

undisturbed for future use. Eqn17 and modify Eqn19 to be:

In step 630, the array C is ?lled so that is the rank

of measurement point xi relative to its jth nearest neighbor.

Since

and B[k][i]

the index

givesofthetherank

jth of

nearest

i relative

neighbor

to Xk,ofthe

xi folloWing

is

relationship can be used to ?nd C: In steps 740 and 760, each point x is classi?ed into one of

the folloWing categories:

C[i]U]=B[A[i]U]][i] (Eqn15) 25 1. x is not directed to any other point (Attr 20). In this

For the above example, the rank matrix C is: case (block 750) x is free attractor, meaning that x is

the kernel point of a local maximum of the PDF.

2. x is directed to another point and at the same time one

j: 1 to 6 (Eqn 16)

or more other points are directed to x. In this case, (block

543126 770) x is a related attractor, meaning that x is on the slope

314226 surrounding the kernel point of a local maximum of the PDF.

254326 3. x is directed to another point but no other points are

451346 directed to x. In this case, (block 780) x is pending and

554316 related, meaning that x is a boundary point in a local

213516 35 minimum of the PDF.

Once all the points have been classi?ed as determined in step

790, the points are formed into attraction sets. All free and

Having determined the rank matrix C, it can be used in related attractors are considered as modes, While the pending

step 640 to easily calculate the Weighted ranks On(x) of and related points are considered as valleys (boundaries). An

Eqn7, Which are in turn used in step 650 to calculate the attraction set is de?ned to be all points that directly or

summations of Weighted ranks s(x) of Eqn8. The minimum indirectly adhere to a common mode. Attraction sets may be

and maximum summations are determined in steps 660 and considered as basic (elementary) clusters that are small

670, and then in step 680 the Neighboring Index values of natural data groups of the analyZed data set. Attraction sets

Eqn11 are calculated for each measurement point. include points in the valley Which do not attract any other

Returning momentarily to FIG. 4, once the neighboring 45 points. These are considered the boundary points of the set.

index values are calculated, they are then used in block 515 In the KNN attraction method, the parameter K2 is used

to automatically form small basic data groups by use of a in the adherence function. The variable K2 can be considered

multidimensional KNN point-to-point attraction algorithm. a smoothing parameter. The higher its value, the less basic

These attraction sets Will be used in the mode merging step the attraction set structure are. But K2 should not be too

530 for forming ?nal clusters. high, as it Will reduce local point-to-point attraction and

The basic idea of the multidimensional KNN point-to merge some structures separated by narroW and deep val

point attraction method is to attempt to associate every point leys. In this step, a small K2 is usually preferred to construct

x in the set of measurement points With an adherence point high-resolution structures. HoWever, a very small K2 should

y that maximiZes the attraction function, Attrx(y). The choice also be avoided because it Will create small, disconnected

of the point y is based on the concept of path of the highest 55 islands, that is, attraction sets Where points are attracted

gradient. A general attraction function is: among themselves Without creating any boundary points

(block 780). If no boundary points are recogniZed in an

attraction set, this attraction set Would never be merged With

other sets in the ?nal merging step (block 530). By

The Neighboring Index values of points x and y are repre experience, K2=5 generates consistent results even for high

sented by I(x) and I(y) respectively. The adherence function dimensional data sets. The preferred values for K2 are not be

Vx(y) can be any function. A useful form of the adherence less than 4 or much higher than 12, even for very large sets

function could be the exponential function: of data.

VX(y)=exp(m/[5), Where y is the mth NN of x and [5>O (Eqn18) As a comparison, in the post-processing of Gans clus

65 tering method, the clusters are recogniZed by use of a

In the preferred embodiment, the MRGC uses Vx(y)=1 if y threshold on the boundary index function, Where points With

is one of the K2 nearest neighbor points of point x, and 0 a boundary index loWer than the threshold are considered as

US 6,295,504 B1

13 14

mode points; otherwise they are considered as boundary clusters of equivalent volumes. The combination of these

points. After removing the boundary points, Gan detects the tWo factors produces a good balance betWeen the siZe and

connected components formed by mode points as clusters, the volume of a cluster, and generates consistent results (the

and then the boundary points are assigned to the cluster of change of Weights, b and c, Will change this balance). The

their nearest neighbor. Compared With Gans method, eXpe third draWback of Gans method mentioned above, that it

rience shoWs that the MRGC method using KNN point-to cannot generate consistent clusters if the data sets present

point attraction results in more consistent and stable clusters clusters of very unbalanced siZes, is thus solved.

relative to variations in the smoothing parameters. In other Once the KRI have been calculated, they may be sorted

Words, unlike the method of Gan, the MRGC method and displayed as shoWn in FIG. 8. With the help of the

generates better cluster borders and the results are less decreasingly ordered KRI curve, one can easily recogniZe

sensitive to the choice of smoothing parameter. the importance of a mode in the overall data set. There are

Referring momentarily to FIG. 4, as the K-Nearest several important drops (breaks) Which corresponds to the

Neighbor attraction for each point is being determined and changes of cluster kernels from one stable plateau to another.

natural small groups are formed in block 515, a kernel The drop points at 2, 5, 8 and 12 clusters each corresponds

representative indeX (KRI) for each point in the data set may 15

to the optimal number of clusters at different resolutions.

be determined in block 520. The KRI permits determination The drop points of the curve can be automatically

of the optimal number of clusters for the analyZed data set. detected by the peaks of the gradient (the ?rst derivative) of

For Well-separated cluster data sets that contain a signi? the decreasingly ordered KRI curve. These peaks as Well as

cant probability density difference betWeen modes and their values are provided to the user as possibilities of

valleys, the number of clusters can be easily identi?ed. But optimal cluster numbers and associated quality indeXes at

in actual application data sets, the clusters are often very different resolutions.

close together and the number of clusters is ambiguous. The The points With the highest values of KRI selected in this

optimal number of clusters becomes a function of desired Way are considered as cluster kernels and are used for

resolution and it depends on the users requirements, mode merging and forning the ?nal clusters as shoWn in

namely, at What resolution they Would like to analyZe the 25

block 530 of FIG. 4.

data. The MRGC automatic clustering method permits the Before presenting multidimensional merging, a 2-D gray

user to select the optimal number of clusters from one or level image eXample serves as a simple visualiZation aid. If

more different possibilities. Each cluster number is associ We consider the images gray-level as the third dimension,

ated With a quality indeX and is suggested in the order of its an image can be considered as a relief having mountains

probability. (light values) and seas (dark values). The merging method is

To detect the best number of clusters, the MRGC algo conceptually similar to ?ooding the seas little by little, With

rithm ?rst determines the representativity of each point in one sea merging With another in the order of their loWest

the data set. The representativity value aids in determining border levels. To merge mountains rather than seas, inverse

if a mode is a real mode of a cluster or just a local landscapes should be considered, With the merging occur

irregularity. Each point is characteriZed by hoW closely it 35

ring in order of the highest border levels. This process alloWs

us to remove the shalloWest valley and merge the tWo most

represents a cluster kernel, and the best kernel representa

tives are then selected to form clusters by merging basic probable neighboring modes into one. But the realiZation of

structures detected in block 515 (the merging process is such a process for multidimensional data is not simple,

presented further beloW). While ordering and analyZing the because the neighbor relationship among modes is not easy

function of kernel representativity, We can recogniZe some to evaluate. With the help of the K-Nearest-Neighbor

cluster kernels are much better represented than others, and concept, the attraction sets and boundary points recogniZed

the gradient change of the Kernel Representative Indexes previously in block 515, the MRGC method implements the

(KRI) can highlight these kernels. folloWing algorithm for multidimensional dot pattern mode

The neighboring indeX function by itself is inadequate as merging as shoWn in FIGS. 9A and 9B. Attraction sets

an indicator for the number of clusters present because it is 45

referred to beloW are the sets of points corresponding to the

a primarily a local indicator With little in?uence from the elementary groups found in block 515.

points outside the local region. To remedy this local effect, Step 1: For every pair of attraction sets S1 and S2 (block

the MRGC algorithm adds tWo other factors, the number of 1001), ?nd a pair of points, P1 and p2 that ?t the folloWing

neighbors and the distance at Which the neighboring indeX of four conditions:

a point loses its poWer. This is the distance at Which 1. p1S1 and PZESZ, block 1005.

another point having a higher value can be found. 2. p1 and P2 are boundary points in S1 and S2, block 1010,

Let I(X) be the neighboring indeX value of point X, and y 1015.

be the ?rst neighbor of X having and indeX value I(y)>I(X) 3. p1 is in the K-nearest-neighbors of P2, block 1020, or

as shoWn in block 810 of FIG. 7. As calculated in block 820, P2 is in the KNN of p1, block 1025. This K could take

the Kernel Representative Function can be Written as: 55 the value of 2><K2 for eXample, but should not be too

high to avoid merging non-neighbor attraction sets.

4. The distance betWeen P2 and p1, D(P2, p1), block 1030,

Where a, b, and c are the exponents used to Weight each should be the minimum among all pairs of points

corresponding function I(X), M(X,y), and D(X,y). The neigh satisfying the three previous conditions.

bor function M(X, y)=m, if y is the m1 neighbor of X, and The points, p1 and P2, may not eXist for all pairs of attraction

D(X,y) is the distance betWeen X and y. Based on testing sets. These points are considered as the shortest passage

eXperience, a=b=c=1 give good results for the tested data. In from one mode to another (S1 and S2). At the end of step 1,

this function, the factor I(X) alloWs us to recogniZe the peak We have a list of all the shortest passages that eXist betWeen

(kernel) of a mode, and M(X,y) and D(X,y), the eXtension of pairs of attraction sets.

the importance of this mode on the Whole data set. 65 Step 2: For each passage in the list of shortest passages,

The number of neighbors, M(X,y), tends to produce result determine a level L, and sort the list of shortest passages in

ing clusters of equivalent siZes and the distance, D(X,y), decreasing order (block 1033). For a given passage (P2, P1),

US 6,295,504 B1

15 16

the level is de?ned by the minimum neighboring index Large values close to one indicate high con?dence

value: identi?cation, Whereas small values close to Zero indi

L=Min (I(P1), I(p2)) (Eqn22) cate more of an extrapolation in making the identi?

cation.

Step 3: For each passage in the sorted list, blocks 1035 2. A Membership Index (MI) to indicate if x is inside or

and 1040 test Whether S1 and S2 contain a cluster kernel

outside of the cluster de?ned on the reference data set.

selected in block 520. If at least one of them does not, merge

The MI can be calculated by:

the tWo modes (block 1045). This is repeated in order of

decreasing passage levels (block 1034) until all passages

have been considered (block 1050). In other Words, the

modes having the shalloWest joint valley are merged ?rst. 10 Where k is the kernel point of the cluster c, and D is the

The clusters made in this Way are deterministic. Clusters distance function. MI is close to 1 When x is inside and close

of higher resolutions using higher numbers of cluster kernels to the cluster c. The higher the MI the farther x is outside of

are alWays subclusters of the loW-resolution clusters. This the cluster c. The MI can be used for detecting abnormal

corresponds to the hierarchical Way that the geologist orga points and/or neW facies of the application data set.

niZes the geological facies. The MRGC technique Will 15 3. An Ambiguity Index indicates if x is relatively

automatically ?nd the most easily broken clusters and pro ambiguous betWeen tWo clusters. It can be calculated

vide a solution according to the available data structure. This by:

property is important for subdividing facies based on the

already recogniZed electrofacies con?guration.

Various extensions and different application modalities

Where Z is the nearest neighbor of x in the reference data set

exist for the MRGC automatic clustering technique

after removing the cluster c.

described above. The Neighboring Index generated from the

The values of AI should be betWeen 0 and 1. The higher

MRGC method may be very helpful for geological inter

the Al, the more ambiguous xs position is betWeen tWo

pretation of log measurement data. For example, threshold

clusters.

ing the entire data set based on the NI Will identify the denser 25

Using any or all of the above quality indices Will enable

Zones of the log data set. Based on various experimental

automatic feedback to assess the applicability of the model

trials, it is knoWn that these Zones correspond to the thick

to the neW Well for each point.

beds less affected by shoulder effects, Which have good

In discussing the quality indices, it Was assumed that the

homogeneity and greater lateral extension. Hence, these data logs gathered at the neW Well site Were the same types

Zones are the most important for ?uid How and, thus, greater

as those used to create the model. This assumption may be

attention must be paid to these thick beds. They Will be

incorrect. There are many occasions Where feWer or different

useful for calibrating the electrofacies and designing/

logs are gathered, eg due to failure of one or more

optimiZing the sampling program of the core.

instruments doWnhole. Consequently, the electrofacies

Another extension of the MRGC algorithm also involves

model may need to be propagated to other Wells using a

the geological interpretation and calibration of the electro 35

reduced application data set.

facies. After the NI is normaliZed Within each cluster in the

For an electrofacies model made by N logs (and thus of

same manner as the data normaliZation described above,

then the threshold on the normaliZed NI alloWs the MRGC

N data dimensions) and an application data set With N-R

logs available, Where R is the number of logs Which are not

algorithm to recogniZe the mode Zone of each cluster. The

available in the application data set, the nearest neighbor

visualiZation of these mode Zones facilitates the geological

propagation method preferably assigns for each application

interpretation and calibration of the electrofacies, by alloW data point the electrofacies of its nearest neighbor in the

ing the geologist to concentrate on the most representative

reference data set While ignoring the R unavailable logs of

points of the facies.

the reference data set. This method respects the original

After the electrofacies model is made, it is possible to

cluster shapes and can be applied to the results of any

interpret another Well in terms of the existing electrofacies 45

clustering method. For the case of MRGC, the quality

model. In the preferred embodiment, the Nearest Neighbor

Propagation algorithm assigns to each neW measurement indexes MI (Membership Index) and Al (Ambiguity Index)

described above can also be calculated.

point the electrofacies of its nearest neighbor in the refer

Testing the process described above on a four

ence data set used to create the model. Unlike the dynamic

dimensional data set, the folloWing error percentages rela

clustering or Self-OrganiZing-Map methods Which generate

tive to a complete data set Were obtained When one dimen

a mosaic separation of the log space around the cluster

sion Was omitted.

kernels Without any consideration of the cluster shapes, the

nearest neighbor propagation algorithm retains the original

cluster shapes.

Because the neW measurement data might not ?t the 55 Removed Log RhoB Nphi GR DT

model data, quality control While gathering log data mea

Error % 26.5 18.5 34.8 16.1

surements is very important for correct model propagation.

Three quality indices may be determined for each neW

measurement point (also termed application point). The This example illustrates that Nearest Neighbor Propagation

quality indices are based on the concept of MRGC in order alloWs propagation to reduced data sets. Furthermore, for

to evaluate the result of model propagation. First, for each complete data sets containing all logs it can be used to

application point x, its nearest neighbor y belonging to the recogniZe the separability of each log for electrofacies by

cluster c in the reference model data set is determined. The removing one log at a time. For the above example, the order

propagation algorithm then calculates the folloWing infor of importance of these four logs is GR then RhoB, Nphi and

mation for each x: 65 DT. The larger the errors made after removing a particular

1. The NI of y and the cluster-normaliZed NI of y to help log, the greater the amount of information provided by the

locate the point x With respect to the cluster kernel. log.

US 6,295,504 B1

17 18

Propagation of electrofacies model for complete or forming attraction sets from the points having a shared

reduced data sets can also occur using a Most Attracting kernel point.

Nearest Neighbor (MANN) method, that is, the neighbor 7. The method of claim 6, Wherein the attraction value

having the highest value of NI in the K nearest neighbors of calculations include ?nding the products I(y) Vx(y) Where

the reference data set. Nearest Neighbor propagation is a Vx(y) is an adherence function.

special case of MANN propagation With K=1. This method 8. The method of claim 7 Wherein the adherence function

attracts application points to higher probability density Vx(y)=1 if y is one of the K-nearest-neighbors of X, other

Zones (the higher the NI, the more dense the Zone). In theory, Wise Vx(y)=0.

for the reduced data set the propagation error should be 9. The method of claim 5, further comprising:

reduced in Zones of overlaid clusters.

10 determining one or more optimal cluster numbers by

Numerous variations and modi?cations Will become

apparent to those skilled in the art once the above disclosure

calculating a Kernel Representative IndeX for each

is fully appreciated. For example, clustering techniques may measurement point in the data set.

be applied in ?elds such as economic analysis, image 10. The method of claim 9 Wherein the Kernel Represen

analysis, quality control for manufacturing and statistical tative IndeX is determined by determining for each

analyses of other kinds. It is intended that the folloWing 15 point X in the data set, the nearest neighbor y of X satisfying

claims be interpreted to embrace all such variations and I(y)>I(X); and

modi?cations. calculating F(X)=I(X)*Mb(X,y)*DC(X,y) in Which M(X,y)

What is claimed is: is a rank function that equals m When y is the mth

1. A method of de?ning the electrofacies of a geological neighbor of X, D(X,y) is the distance betWeen X and y,

formation traversed by a bore hole comprising: and a, b, c are predetermined constants.

moving a sonde through a plurality of positions in said 11. The method of claim 9 Wherein the optimal cluster

bore hole and recording a data set having a number d number corresponds to a sharp drop in the Kernel Repre

of measurements taken by the sonde at each of the sentative IndeX

plurality of positions, Wherein the d measurements at 12. The method of claim 9, further comprising:

each position are associable With a point in 25 performing mode merging of the attraction sets to form

d-dimensional space; and clusters of measurement points, each cluster de?ning

calculating a neighboring indeX I of each measurement an electrofacies.

point. 13. The method of claim 12, Wherein the clusters of

2. The method of claim 1, Wherein said calculating a measurement points are formed by:

neighboring indeX includes:

determining for each given point X said given points for each a pair of attraction sets S1 and S2 from all

nearest neighbor ranks m relative to all other measure attraction sets, identifying a pair of points p1 and P2

ment points in the data set; belonging to S1 and S2, respectively, that satisfy the

calculating the Weighted ranks, On(X), of said given point conditions:

using the nearest neighbor ranks m; and (a) p1 and P2 are boundary points;

35

determining for each given point X the summation s(X) of (b) p1 is in K-nearest-neighbors of P2 or P2 is in

all the Weighted ranks On(X) for said given point. K-nearest-neighbors of p1; and

3. The method of claim 2, Wherein said calculating a (c) the distance D(p2,p1) betWeen p1 and P2 is minimum

neighboring indeX further includes: among all pairs of points satisfying conditions (a)

and

determining from the summations s(X) for each measure

14. The method of claim 13, further comprising:

ment point X a minimum summation value Smin;

determining from the summations s(X) for each measure

calculating L=Min(I(p1), I(p2)) for each pair of points

ment point X a maXimum summation value Smax; and

satisfying conditions (a), (b) and (c); and

storing the values (p 1, p2) Wherein into a list in decreasing

calculating for each point X the neighboring indeX I(X) order With respect to L.

corresponding to 45

15. The method of claim 14, further comprising:

traversing the list in decreasing order While merging sets

Smax _ 5min

corresponding to points p1 and P2 if the sets do not both

contain previously selected cluster kernels.

16. The method of claim 12, further comprising:

4. The method of claim 2 Wherein the Weighted rank of the correlating the electrofacies With the geological forma

point is On(X)=6Xp(II1/a). tions traversed by the bore hole.

5. The method of claim 1, further comprising: 17. The method of claim 16, further comprising:

determining attraction sets that are disjoint sets of mea prior to said calculating the neighboring indeX, selecting

surement points. 55 the recorded measurements points that are stable over

6. The method of claim 5, Wherein the attraction sets are consecutive levels; and

determined by: after said correlating, comparing a recorded measurement

calculating for each measurement point X in the data set point not selected in said selecting step to the clusters

a set of attraction values Attrx(y) Where y ranges over of selected measurement points to predict the facies of

the set of the nearest neighbors of X; the geological formation at the bore hole positions

calculating the maXimum Attrx(y) for each point X; corresponding to the unselected recorded measurement

for those points X having a maXimum Attrx(y) greater than points.

Zero, determining a directed connection from point X to 18. The method of claim 17, further comprising:

the point y that maXimiZes Attrx(y); producing a graph of the electrofacies of the geological

using the directed connections to categoriZe each mea formation as a function of the depth of the bore hole.

surement point X in the set as a kernel point of a mode, 19. An apparatus for performing automatic clustering,

slope point of a mode, or boundary point of a mode; and comprising:

US 6,295,504 B1

19 20

a memory unit con?gured to store log measurement points neighbor of X, D(X,y) is the distance betWeen X and y,

in d-dimensional space; and and a, b, c are predetermined constants.

a processing unit con?gured to retrieve the log measure 26. The apparatus of claim 19 Wherein the optimal cluster

ment points from the memory unit, and con?gured to number corresponds to a sharp drop in the Kernel Repre

calculates a neighboring indeX I of each log measure sentative IndeX

ment point. 27. The apparatus of claim 19, Wherein said processing

20. The apparatus of claim 19, Wherein the neighboring unit is further con?gured to perform mode merging of the

indeX corresponding to each log measurement point is attraction sets to form clusters of log measurement points,

calculated by: each cluster characteriZing an electrofacies.

determining for each given point X said given points 28. The apparatus of claim 27, Wherein the clusters of log

nearest neighbor ranks m relative to said given points measurement points are formed by:

N-nearest-neighbors; for each a pair of attraction sets S1 and S2 from all

calculating the Weighted ranks, On(X), of said given point attraction sets, identifying a pair of points p1 and P2

using the nearest neighbor ranks m; and belonging to S1 and S2, respectively, that satisfy the

15 conditions:

determining for said given point the summation s(X) of the

Weighted ranks of said given point. (a) p1 and P2 are boundary points;

21. The apparatus of method of claim 20, Wherein the (b) p1 is in K-nearest-neighbors of P2 or P2 is in

neighboring indeX corresponding to each log measurement K-nearest-neighbors of p1; and

point is calculated by: (c) the distance D(p2,p1) betWeen p1 and P2 is minimum

among all pairs of points satisfying conditions (a)

determining for each given point X said given points

nearest neighbor ranks m relative to said given points 29. The

andapparatus of claim 28, Wherein the clusters of log

N-nearest-neighbors; measurement points are further formed by:

calculating the Weighted ranks, On(X), of said given point calculating L=Min(I(p1), I(p2)) for each pair of points

using the nearest neighbor ranks m; and 25

satisfying conditions (a), (b) and (c); and

determining for said given point the summation s(X) of the

Weighted ranks of said given point; storing the values (p1,p2) Wherein into a list in decreasing

order With respect to L.

determining the minimum value Smax of s over each

30. The apparatus of claim 29, Wherein the clusters of log

log measurement point in the set; measurement points are further formed by:

determining the maXimum value Smwc of s(X) over each traversing the list in decreasing order While merging sets

log measurement point in the set; and corresponding to points p1 and P2 if the sets do not both

calculating the neighboring indeX I(X) for each point in contain a previously selected kernel points.

the set so that 31. The apparatus of claim 27, Wherein the processing unit

35 is further con?gured to correlate the electrofacies With the

5(X) 5min facies traversed by the bore hole.

max 5min

32. Amethod of de?ning the electrofacies of a geological

formation traversed by a bore hole comprising:

22. The apparatus of claim 19 Wherein said processing moving a sonde through a plurality of positions in said

unit is further con?gured to determine attraction sets, the bore hole and recording a data set having a number d

attraction sets containing log measurement points. of log measurements taken by the sonde at each of the

23. The apparatus of claim 22, Wherein the attraction sets predetermined levels, Wherein the d log measurements

are determined by: at each position are associable With a point in

calculating for each log measurement point p in the data d-dimensional space;

set a set of attraction values Attrx(y)Where y ranges 45 calculating a neighboring indeX I of each log measure

over the set of the nearest neighbors of X; ment points; and

calculating the maXimum Attrx(y) for each point X; determining an optimal cluster number by calculating a

for those points X having a maXimum Attrx(y) greater than Kernel Representative IndeX for each log mea

Zero, determining a directed connection from point X to surement point.

the point y that maXimiZes Attrx(y); 33. The method of claim 32, Wherein the Kernel Repre

using the directed connections to identify those Which log sentative IndeX is determined by:

measurement points serve as a kernel points; and determining for each point X in the data set, the nearest

for each kernel point, forming an attraction set that neighbor y of X that satis?es I(y)>I(X); and

includes the given kernel point and all those points 55 calculating F(X)=I(X)*Mb(X, y)*DC(X, y) in Which M(X,

having directed connections that lead to the given y) is a rank function that equals m When y is the mth

kernel point. neighbor of X, D(X,y) is the distance betWeen X and y,

24. The apparatus of claim 19, Wherein said processing and a, b, c are predetermined constants.

unit is further con?gured to determine an optimal cluster 34. The method of claim 32 Wherein the optimal cluster

number

for each by

logcalculating

measurementa Kernel

point. Representative IndeX number corresponds to a sharp drop in Kernel Representa

tive IndeX

25. The apparatus of claim 24, Wherein the Kernel Rep 35. An apparatus for performing automatic clustering,

resentative IndeX is determined by: comprising:

calculating for each point X in the data set the nearest a memory unit con?gured to store measurement points in

neighbor y of X satisfying I(y)>I(X); and d dimensional space;

calculating F(X)=I(X)*Mb(X, y)*DC(X, y) in Which M(X, a processing unit con?gured to retrieve the measurement

y) is a rank function that equals m When y is the mth points from the memory unit; and

US 6,295,504 B1

21 22

wherein said processing unit calculates a neighboring 41. The apparatus of claim 40, Wherein the Kernel Rep

index I of each measurement point. resentative IndeX is determined by:

36. The apparatus of claim 35, Wherein the neighboring

index corresponding to each measurement point is calcu calculating for each point X in the data set, the nearest

lated by: neighbor y of X satisfying I(y)>I(X); and

determining for each given point X said given points calculating F(X)=I(X)*Mb(X,y)*DC(X,y) in Which M(X,y)

nearest neighbor ranks m relative to all other points; is a rank function that equals m When y is the mth

calculating the Weighted ranks, On(X), of said given point neighbor of X, D(X,y) is the distance betWeen X and y,

using the nearest neighbor ranks m; and and a, b, c are predetermined constants.

determining for said given point the summation s(X) of the 10 42. The apparatus of claim 40 Wherein the optimal cluster

Weighted ranks for said given point. numbers correspond to sharp drops in the Kernel Represen

37. The apparatus of method of claim 36, Wherein the

neighboring indeX corresponding to each measurement point tative IndeX

is further calculated by: 43. The apparatus of claim 40, Wherein the processing unit

determining the minimum value Smin of s(X) over each 15

is further con?gured to perform mode merging of the

measurement point in the set; attraction sets to form clusters of measurement points, each

determining the maXimum value Smwc of s(X) over each cluster characteriZing a prototype.

measurement point in the set; and 44. The apparatus of claim 43, Wherein the clusters of

calculating the neighboring indeX I(X) for each point in measurement points are formed by:

the set by for each a pair of attraction sets S1 and S2 from all

attraction sets, identifying a pair of points p1 and P2

5(X) 5min

belonging to S1 and S2, respectively, that satisfy the

max 5min

conditions:

25 (a) p1 and P2 are boundary points;

38. The apparatus of claim 35, Wherein said processing (b) p1 is in K-nearest-neighbors of P2 or P2 is in

unit is further con?gured to determine attraction sets that are K-nearest-neighbors of p1; and

disjoint sets of measurement points. (c) the distance D(p2,p1) betWeen p1 and P2 is minimum

39. The apparatus of claim 38, Wherein the attraction sets

are determined by:

among all pairs of points satisfying conditions (a)

calculating for each measurement point X in the data set

a set of attraction values Attrx(y) Where y ranges over

45. The

and apparatus of claim 44, further comprising:

the set of the nearest neighbors of X; calculating L=Min(I(p1), I(p2) for each pair of points

calculating the maXimum Attrx(y) for each point X; satisfying conditions (a), (b) and (c); and

for those points X having a maXimum Attrx(y) greater than 35 storing the values (p1,p2) Wherein into a list in decreasing

Zero, directing point X to the point y that maXimiZes order With respect to L.

AWAY); 46. The apparatus of claim 45, further comprising:

using the directions to identify points that serve as a

traversing the list in decreasing order While merging sets

kernel point of a mode; and

corresponding to points p1 and P2 if the sets do not both

for each kernel point, forming an attraction set that

includes the given kernel point and all those points contain a previously selected kernel points.

directed to the kernel point. 47. The apparatus of claim 43, Wherein the processing unit

40. The apparatus of claim 38, Wherein the processing unit is further con?gured to correlate the prototype With neW

is further con?gured to determine one or more optimal measurement points.

cluster numbers by calculating a Kernel Representative 45

IndeX for each measurement point in the data set.

- TowardsaBusinessModelTaxonomyofStartupsintheFinanceSecTransféré parpufu
- A Fast and Flexible Clustering Algorithm Using Binary DiscretizationTransféré parMahito Sugiyama
- A Comparison Study Between Various Fuzzy Clustering AlgorithmsTransféré parKosa
- An Introduction to Cluster Analysis for Data MiningTransféré parmrmrva
- Depositional and Provenance Record of the Paleogene Transition From ForelandTransféré parMaría Alejandra Álvarez
- kmeansTransféré parapi-340879197
- ClusterTransféré parKarthik
- Fuzzy ModelingTransféré parAbouzar Sekhavati
- Ppt for Seminar 1Transféré parSreeja Dodla D
- [IJCST-V4I4P29]:M.Arthi,M.JayashreeTransféré parEighthSenseGroup
- ikmeans_sdm_workshop03Transféré parYaseen Hussain
- Proposal of KMSTME Data Mining Clustering Method for Prolonging Life of Wireless Sensor NetworksTransféré parMladen Vuk
- IJETR031830Transféré parerpublication
- Paper Pyroclastic FlowTransféré parSeli Astriani
- ITJDM02Transféré parramana
- asroaTransféré parAfzal Ahmad
- Significance of Classification Techniques in Prediction of Learning DisabilitiesTransféré parAdam Hansen
- Finding Prominent Features InTransféré parCS & IT
- 15-53 Data Warehousing and Data MiningTransféré parMukesh Kumar
- Data Warehouse Unit 5Transféré parYash Patni
- 03090560510572089Transféré parPalak Agarwal
- Visual Analytics and Interactive Technologies.docxTransféré parshafira
- proposalbeamer.pdfTransféré parEfren Ver Sia
- scimakelatex.10180.Qualal+Grammar.LRTransféré parOne TWo
- Comprehensive Comparative Analysis of Methods for Crime Rate PredictionTransféré parIRJET Journal
- A Review of Heavy Minerals in Clastic Sediments With Case Studies From the Alluvial Fan Through the Nearshore Marine EnvTransféré parLidia Nutu
- 20_ijictv3n10spl.pdfTransféré parKhusnulAzima
- xplore1417.pdfTransféré parkumar kumar
- Diskusi13-Clustreing Retailers by Store Space RequirementTransféré parNiki Paikito
- Patel 2016Transféré parmeena

- ANewToolforElectroFacies-RabillerTransféré parElok Galih Karuniawati
- GERAKAN LITERASI SEKOLAHTransféré parAntonius Suharsono
- PDVSA Facimage FinalTransféré parElok Galih Karuniawati
- buku-panduan-desa1.pdfTransféré pardesaku karyaindah
- PRM.1564.04_BasinClassification_Oct2017.pdfTransféré parElok Galih Karuniawati
- 932e0fcd0dd1763968e2637b44bb82b8Transféré parEgiAndi Purwanto
- ShaleGas-42017443Transféré parElok Galih Karuniawati
- 170859 Panduan Pendirian Usaha Pengembang Aplikasi DigitalTransféré parheaven knigt
- REGULASI-BARU-DESA-BARU-Ide-Misi-dan-Semangat-UU-Desa.pdfTransféré parElok Galih Karuniawati
- 09123-18-PA-Improved Reservoir Characterization Petrophysical Classifiers Electrofacies-05!07!12Transféré parElok Galih Karuniawati
- 1.41Transféré parRiska Wulan Juni
- Buku Saku Mendidik Anak Di Era Digital-edLina.pdfTransféré parElok Galih Karuniawati
- ChronostratChart2017-02Transféré parross182
- BUKU-PINTAR-DANA-DESA.pdfTransféré parWachid Al Qolbi
- garbage_enzyme.pdfTransféré parElok Galih Karuniawati
- Gas Costs Supply Growth April 2011 Web VersionTransféré parElok Galih Karuniawati
- oil and gasTransféré partabithapang
- 122-252-1-SMTransféré parElok Galih Karuniawati
- Skoll Scholarship BrochureTransféré parElok Galih Karuniawati
- Treaty Between Australia and the Indonesia on the Zone of Cooperation-PDFTransféré parElok Galih Karuniawati
- 5 Workshop - Leeth - Cement-bond LogsTransféré parElok Galih Karuniawati
- Adi Harsono LogTransféré parAhmad Mustoin
- Logging Cased HoleTransféré parDeepak Rana
- Petroleum GeologyTransféré paryudi
- Resolving Carbonate Complexity- SCHLUMBERGERTransféré paranto_ttz
- 175-358-1-SMTransféré parElok Galih Karuniawati
- CBLTransféré parZitro0991
- NMR-1505020102Transféré parElok Galih Karuniawati
- 267-540-1-SMTransféré parElok Galih Karuniawati

- Origin & Destination StudyTransféré parAjaz Raj
- Air-4.5 Overview Nawcad v1Transféré parNatural Sciences
- A Novel Protein Refolding Protocol for Solubilization and Purification of the Catalytic Fragment of Recombinant MMuLV Reverse 4Transféré parInternational Journal of Innovative Science and Research Technology
- The New Rules of GlobalizationTransféré parDragos Tudor
- Mach3 ThreadingTransféré parTombong
- 5. GR No. 161237 (2009) - Macababbad v. MasiragTransféré parNikki Estores
- 201412190943-NABL-105-doc.pdfTransféré parmahesh
- China Economics UpdateTransféré parsibilla2012
- competency 6 organizational planning (1)Transféré parapi-281319287
- leadership resumeTransféré parapi-314852963
- Tiffin Service Business Pros and Cons With Full CalculationTransféré parparagushah
- Power Transmission TowersTransféré parrakesheval
- Understanding Anesthetic Equipment & ProceduresTransféré parPradhap Prithiv
- Lm8365 Digital Clock Circuit BoardTransféré parMd Kamruzzaman Khan
- AttachedTransféré parMd. Monzur A Murshed
- Decree No. 46-2015-ND-CP Dated May 12- 2015Transféré parAnonymous 7uoreiQKO
- Short Iron CondorTransféré parNarendra Bhole
- Statistical Analysis of the Chemical Screening of a Small.pdfTransféré parfiercecat
- Comelec HistoryTransféré parJomel Manaig
- Fidelity in ComedraftTransféré parbagyabio
- ChromisPOS - User Manual v1.8-ConceptTransféré parwong.d
- OPS 12 Microchip Scanning FinalTransféré parmarlukha
- (Springer Texts in Business and Economics) Kolmar, Martin-Principles of Microeconomics _ an Integrative Approach-Springer (2017)Transféré parPhạm Phương Thảo
- EV4-TB07Transféré parFranky Wong
- Conduction Practicals(3)Transféré parRanjit Kadlag Patil
- EcoTransféré parJielyn San Miguel
- introduction to statisticsTransféré parapi-339611548
- Power Meters - Energy Management SolutionsTransféré parIsrael Exporter
- Judge Bredar Memo and Order Re Death Penalty Case no.: 16-cr-0259Transféré parWilliam Bond
- School Based Assessment-edited 2017Transféré parYɵʉňğ Ģênnä