based on the way the individuals tend to group in clusters that differentiate themselves from the others. We expected that the capability to display and quantify natural differentiation would help greatly in the choice of the most suitable log combination and in the calibration process, mostly because the latter is based on the examination of discrete qualitative values attributed to real objects.

Thus, if all these objectives could be achieved, we felt that facies analysis would be more acceptable to the geological community. Defining clusters by their differences fits both the way the geologist works and the second proposition of the genuine definition of Serra and Abbot: sets of log responses that characterize a sediment and allow the sediment to be distinguished from others.

PRIOR WORK ON CLUSTERING

Clustering deals with the problem of detecting clusters in a data set whose structure is unknown a priori. Clustering of data has extensive applications, including pattern recognition, computer vision, diagnosis of complex systems, etc. It has been studied for a long time in different contexts, and a large number of solutions have been proposed. Their diversity shows that this problem is not simple. In fact, a standard definition of clusters does not exist. Each method is associated with a criterion, which is not always consistent with the underlying structure of the data. The obtained partitions have meaning only after validation of the results, which is often problematic for many applications.

The study of Mourot et al. (1993) shows that common methods present several drawbacks which make them difficult to use in practice:

1- Need to know the number of clusters beforehand: Most methods require the number of clusters as a parameter, which is often unknown to the user and can substantially affect the results.

2- Problems of initial conditions and parameters: For many methods, the results are very sensitive to initial conditions and variations of parameter values.

3- Problem of reliability: Most clustering methods form ellipsoidal structures but cannot detect clusters of varied shapes. Other methods favor elongated or curved structures, but do not work for clusters close to one another. There are methods without the previous limitations, but they need a large number of data points and/or clusters of equivalent sizes (in terms of number of points). Few methods are reliable for data sets formed by clusters possessing the following configurations simultaneously:
- varied shapes
- mutually very close to each other
- small sizes
- very unbalanced sizes
- very different densities/volumes

Unfortunately, these combinations occur frequently in real applications, above all for log data.

Existing multi-dimensional unsupervised clustering methods can be categorized into metric methods and statistical methods.

Metric methods use the concept of similarity between points, where the Euclidean distance is often used. They generally try to optimize a criterion maximizing the dispersion between clusters while minimizing the dispersion within clusters.

Statistical methods analyze the underlying Probability Density Functions (PDF) of observation distributions. Each cluster corresponds to a mode (local maximum of the underlying PDF), and these methods try to detect the functions describing the modes. Statistical methods are not opposed to metric methods; they also use distance to measure the similarity between points.

Metric Methods

Metric methods include Hierarchical methods (Lukasova, 1979), and Iterative Optimization Methods (IOM) such as ISODATA (Ball & Hall, 1965), fuzzy k-means (Gath & Geva, 1989), dynamic clustering (Diday et al., 1979), and Self-Organizing Map (Kohonen, 1989). Compared with statistical methods, metric methods are simple to use and implement. They are often used in two-step electro-facies analysis.

IOM try to assign the data points to some kernels (barycenters or neurons). The number of kernels is given by the user. At each iteration, new kernels are calculated by minimizing a criterion of distance between kernels and data points. The algorithms stop when the kernels are stabilized. However, the resulting kernels do not always fit the modes of observations, and sometimes occur in low data density zones. There is no way to force the kernels to fit onto the modes. The results are sensitive to the given number of kernels.

Statistical Methods

Two approaches can be distinguished in statistical methods: parametric methods, based on probabilistic modeling of the analyzed data structure, and non-parametric methods, which use no model. The parametric approach requires restrictive hypotheses such as knowledge about the number of clusters and their a priori PDF. The non-parametric approach does not need any a priori knowledge about the structure of the analyzed data distributions. Consequently, this
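The iterative optimization scheme described under Metric Methods (assign points to kernels, recompute the kernels, stop when they are stabilized) can be illustrated with a minimal sketch. It uses plain k-means as a stand-in for a generic IOM; it is not ISODATA, fuzzy k-means, or dynamic clustering, and the function name `iterative_optimization`, its parameters, and the sample points are assumptions made for illustration only.

```python
import math


def iterative_optimization(points, initial_kernels, max_iter=100):
    """Plain k-means, used here as a stand-in for a generic IOM (assumption)."""
    kernels = [list(k) for k in initial_kernels]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest kernel.
        labels = [
            min(range(len(kernels)), key=lambda k: math.dist(p, kernels[k]))
            for p in points
        ]
        # Update step: move each kernel to the barycenter of its points.
        new_kernels = []
        for k in range(len(kernels)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_kernels.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_kernels.append(kernels[k])  # an empty kernel stays in place
        if new_kernels == kernels:  # the kernels are stabilized
            break
        kernels = new_kernels
    return kernels, labels


# Hypothetical 2-D data: two compact, well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
kernels, labels = iterative_optimization(points, [(0.0, 0.0), (5.0, 5.0)])
```

Both the number of kernels and their starting positions are supplied by the caller, which is where the sensitivity to initial conditions and to the given number of kernels noted in the text enters.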
SPWLA 41st Annual Logging Symposium, June 4-7, 2000
be considered as an estimation of its rank. In practice, b is set equal to K+1.

With respect to each of its KNN, x therefore has a limited rank $\sigma_n(x)$, $n = 1, 2, \ldots, K$. Let

$$s(x) = \sum_{n=1}^{K} \sigma_n(x),$$

with $s_{\min}$ and $s_{\max}$ the minimum and maximum of $s$ over the data set. The BI is defined on $s$ and normalized between 0 and 1:

$$\mathrm{BI}(x) = \frac{s(x) - s_{\min}}{s_{\max} - s_{\min}}$$

BI quantifies mutual neighborhood relationships using the rank order, while the classical KNN methods estimate the PDF using the radius from a point to its KNN. In other words, BI is sensitive to the change of local density of the data distribution rather than to local density itself. Hence, it is not affected by differences in densities and volumes among clusters. The higher its value, the closer the point to the border of a proposed cluster.

BI is a function of K. The higher the K, the stronger the smoothing. We observed that BI was sensitive to slight changes of K because of the window effect of the rank function $\sigma_n$, where the window size is K. As we know from signal processing, filtering using a function of finite window size adds noise to the result because of the first-order discontinuities at each side of the window. We thus generalized BI into the Neighboring Index (NI), where $\sigma_n$ is replaced by a smooth exponential function and a quasi-infinite window size (= N - 1, where N is the size of the data set):

$$\sigma_n(x) = \exp(-m/\alpha).$$

Here x is the m-th NN of y, $m \le N - 1$, and $\alpha > 0$. $\sigma_n$ is a strictly decreasing function varying from 1 to 0. When m = 0, $\sigma_n = 1$. The higher the m, the closer $\sigma_n$ is to zero; but $\sigma_n$ never equals zero, i.e., $\sigma_n \in (0, 1]$.

The rest of the formulation of NI is the same as BI, but with $\sigma_n(x)$, $n = 1, 2, \ldots, N - 1$. Because this exponential function decreases with the rank m, whereas the limited rank used in BI increases with it, the higher the value of NI, the closer the point to the kernel (mode) of a cluster. Instead of K in BI, the smoothing parameter of NI is $\alpha$. But NI is less sensitive to slight changes of $\alpha$.

In Gans method, the post-processing to form clusters from BI was very rough. He proposed using a relaxation process, which is, in our experience with real log data sets, equivalent to a simple threshold with a higher value of BI. It is, in our opinion, the origin of the three drawbacks discussed above. We propose here a new method for forming consistent clusters and also a completely new concept about the number of optimal clusters, which are detailed below.

Optimal Number of Clusters and Kernel Representative Index

For data sets made of well-separated clusters, clearly displaying important probability density differences between modes and valleys, the optimal number of clusters can be easily identified by humans as well as automatically. But in most real applications, the clusters are often ambiguous. The closeness of clusters and the local irregularity of data make recognition difficult. The optimal number of clusters is in fact a function of resolution, that is, the resolution at which a user would like to analyze the data.

Our method, MRGC, introduces this new concept and proposes to the user several optimal numbers of clusters corresponding to different resolutions. In addition, the results of MRGC are organized hierarchically, so that the clusters of higher resolutions are always sub-clusters of the low-resolution clusters. Such a hierarchy corresponds to the way that geologists organize facies. These multi-resolution and hierarchical properties help a lot with the electro-facies interpretation, as will be demonstrated in the next section.

Local modes and local valleys in a data set can be easily identified using NI (see sub-section KNN Attraction). However, in order to recognize the optimal number of clusters, it is necessary to evaluate the representativity of each mode relative to the whole data set. In other words, is a mode an important mode or just a local irregularity? Our idea is to characterize the representativity of each point as a cluster kernel, called the Kernel Representative Index (KRI). Points with the best values of KRI are selected as final cluster kernels. Final clusters are then formed by merging local modes (see sub-section Watershed Merging).

NI is an important factor for KRI, but it is just a local indicator. Consequently, two other factors have been introduced: the number of neighbors and the distance at which a point finds another point with a higher value of NI. Let NI(x) be the NI of point x, and y be the first neighbor of x verifying NI(y) > NI(x). KRI of x is calculated as below:

$$\mathrm{KRI}(x) = \mathrm{NI}(x)\, M(x, y)\, D(x, y)$$

where M(x, y) = m if y is the m-th neighbor of x, and D(x, y) is the distance between x and y. The factor NI allows us to recognize the kernel of a mode; and M and D, the importance and the extension of this mode relative to the whole data set. The number of neighbors, M, tends to generate clusters of equivalent size; and the distance, D, clusters of equivalent volume. The combination of these two
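As a rough illustration of the Neighboring Index, the sketch below applies the exponential rank weight exp(-m/alpha) to every pair of points and then rescales the result between 0 and 1, as stated for BI. It assumes that the per-point score s(x) is the sum of these weights over all other points y, and that the nearest neighbor has rank m = 1; neither detail is spelled out in this excerpt, and a constant shift of m does not change the normalized result. The function name, the parameter name `alpha`, and the sample data are hypothetical.

```python
import math


def neighboring_index(points, alpha=1.0):
    """Sketch of NI: sum of exp(-m / alpha) rank weights, min-max normalized."""
    n = len(points)
    s = [0.0] * n
    for j, y in enumerate(points):
        # Rank every other point by its distance to y (nearest first).
        others = sorted(
            (i for i in range(n) if i != j),
            key=lambda i: math.dist(points[i], y),
        )
        for m, i in enumerate(others, start=1):
            # Point i is the m-th nearest neighbor of y (assumed to start at 1).
            s[i] += math.exp(-m / alpha)
    s_min, s_max = min(s), max(s)
    if s_max == s_min:  # degenerate case: every point has the same score
        return [0.0] * n
    return [(v - s_min) / (s_max - s_min) for v in s]


# Hypothetical 1-D data: three close points and one isolated point.
points = [(0.0,), (1.0,), (2.1,), (10.0,)]
ni = neighboring_index(points, alpha=1.0)
```

On this toy data the middle point of the compact group receives the highest NI and the isolated point the lowest, which matches the reading of NI as high near a mode. The loop is quadratic in the number of points, so it is meant for reading, not for full log data sets.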
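The Kernel Representative Index can be sketched in the same spirit. The code below assumes that the three factors named in the text (NI, the neighbor rank M, and the distance D to the first neighbor with a higher NI) are multiplied together. It also needs a rule for the point with the globally highest NI, which has no such neighbor; the fallback to the farthest neighbor is an assumption of this sketch and does not come from the paper. The function name and the NI values in the example are hypothetical.

```python
import math


def kernel_representative_index(points, ni):
    """Sketch of KRI(x) = NI(x) * M(x, y) * D(x, y) (product form assumed)."""
    n = len(points)
    kri = []
    for i, x in enumerate(points):
        # Neighbors of x, nearest first, as (distance, index) pairs.
        neighbors = sorted(
            (math.dist(x, points[j]), j) for j in range(n) if j != i
        )
        # Assumed fallback for the global NI maximum, which has no neighbor
        # with a higher NI: use the farthest neighbor (not from the paper).
        m, d = n - 1, neighbors[-1][0]
        for rank, (dist, j) in enumerate(neighbors, start=1):
            if ni[j] > ni[i]:
                # y = points[j] is the first neighbor of x with a higher NI.
                m, d = rank, dist
                break
        kri.append(ni[i] * m * d)
    return kri


# Hypothetical 1-D data with two groups and made-up NI values.
points = [(0.0,), (1.0,), (2.1,), (10.0,), (11.0,), (12.2,)]
ni = [0.5, 1.0, 0.6, 0.4, 0.9, 0.3]
kri = kernel_representative_index(points, ni)
ranking = sorted(range(len(points)), key=lambda i: -kri[i])
```

In this example the two local NI maxima (indices 1 and 4) come out far ahead of the other points, because each must look many neighbors away, and a long distance, before meeting a point with a higher NI. Sorting the KRI values in decreasing order is what the text later refers to as the decreasing ordered KRI curve.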
FIELD RESULTS AND COMPARISON WITH A TWO-STEP METHOD

Four sets of log data recorded from very different geological environments (shaly sands in fluvial formation, stable platform carbonates and evaporites, evaporites, complex shoreface facies with feldspathic sandstone and diagenetic carbonates) were used to test the method in its early stage of development. They were chosen for the complexity and diversity of the patterns shown by the data structure.

Figure 2 shows the results of data set CARB1, which contains 1979 points. Four logs are used for the clustering: RhoB, Nphi, GR, and DT. This example shows the choice of optimal numbers of clusters on the KRI curve sorted in decreasing order, and the hierarchical organization of MRGC.

The second example, EVAP1, shown in Figure 3, contains 2460 points. The same four logs are used for clustering. This example was selected to show the performance of MRGC for two reasons. First, the natural groups in this data set are well separated in multi-dimensional log space, so the eye can easily verify whether MRGC really extracted natural data groups. Note that MRGC does an equally good job for data structures containing groups that are not well separated (see other examples). Second, this data set contains a very dense and compact small zone of 1045 points (more than one-third of all data points). MRGC detects it as one cluster (series 4), while Gans method recognized it as many clusters.

The following two cases, shown in Figures 4 and 5, present two sets of results obtained on operational surveys. The MRGC results were obtained on the first trial, with no a priori geological input.

Case 1: Turbiditic Siliciclastic Reservoir (Fully Cored) in a Deep Offshore Field

The interpretation was directed at setting up a model to recognize the 15 sedimentary facies defined by cores, and to characterize the architecture of the reservoir through their vertical arrangement. In this type of reservoir, texture and structure of the rocks, grain size distribution, and grain sorting are considered to be the most discriminating information to differentiate the facies. Mineral composition is not discriminating; in addition, to add to the complexity of the problem, in such a shallow burial context a small change in burial has a large effect on the compaction and log response of any given facies.

The MRGC method allowed us to test different log combinations. We present in Figure 4a the results of three models set up on the test well by clustering:
- conventional logs: GR, DT, Density, Neutron-Density Separation and Pe
- NMR T2 bin values
- NMR clustering result and conventional logs: GR, DT, Density, Neutron-Density Separation and Pe (most independent of compaction)

The results of these three models are displayed (facing right in the electrofacies tracks 5, 7 and 8) along with the core descriptions (facing left). The electrofacies (except NMR) were re-ordered and color encoded after a quantitative contingency evaluation in order to ease their reading and comparison with core data.

It is worth noting that in this particular case, using NMR electrofacies, it is possible to predict sedimentary facies with a confidence comparable to that attainable by using the complete set of conventional logs. In addition to providing new insights for the detection of very thin sandy alternations in the shaly facies, the electrofacies of the NMR T2 distribution have the additional benefit of providing the users with a simple way to consistently define NMR zonation for calibration on core, core sampling, or detailed further processing.

Without any a priori geological knowledge of the core and NMR processing results, the model set up on the test well displays the trend of facies evolution seen on cores. The main groups of facies are recognized in different reservoirs, as shown by the color-coding. Due to the compaction gradient, it is difficult to perfectly recognize each single facies over such a depth interval. This is illustrated by comparing the Neutron-Density crossplot showing the core facies (Figure 4b, upper left) with that of the NMR clustering results in the reservoir (Figure 4b, lower left): core facies are spread over a large portion of the crossplots and overlap significantly.

Case 2: Siliciclastic Reservoir with Diagenetic Carbonates (Fully Cored) in a Shallow Marine Deposition Environment

In this example (Figure 5), porosity evolution and mineral composition of the reservoir are more informative of the sedimentary environment and subsequent diagenesis. However, complex lithologies (feldspars, sandstones, limestones, and dolomites without pure shale beds), the occurrence of a large gas effect, and dual porosity make recognition and description of the sedimentary architecture difficult. The MRGC method allows the analyst to take advantage of the complex and highly contrasted data structure revealed by the crossplots and gives the sedimentologist information that cannot be found in the low-dynamic GR.

The MRGC model was set up using GR, DT, Density, and Neutron-Density Separation. The results obtained in a single trial compared very well with the results obtained previously using a two-step method based on Dynamic Clustering and Ascending Hierarchic Classification with a posteriori calibration
ACKNOWLEDGEMENTS

The authors thank the management of Halliburton Energy Services and Elf Exploration Production for granting permission to publish this paper. They also want to express their gratitude to Daniel Vallat (HES), who greatly helped in setting up this joint project, and Jean-Pierre Leduc (EEP) for his enduring support and constructive reviewing and testing.
Figure 2b: The hierarchical organization of MRGC of data set CARB1 shown on Nphi-RhoB and GR-RhoB crossplots (DT is not shown here because it is not very discriminant): the evolution of clustering results from 2 clusters to 5.

Figure 3: Ten clusters of data set EVAP1 (2460 points) detected by MRGC, shown on Nphi-RhoB, GR-RhoB and DT-RhoB crossplots. MRGC detected a very compact small zone, series 4 (containing 1045 points), as one cluster while Gans method detected it as many clusters.
Figure 4a: Depth display of the results of Case 1 obtained with the MRGC method applied to different log combinations. They are compared with core data, displayed on the left-hand half of each of the electrofacies tracks 5, 7 and 8. Color coding and patterns of electrofacies and core description are chosen according to geological depositional mechanism. The blue encoding of the core description has no equivalent in the electrofacies column; it is a conglomeratic shaly facies. From left to right: Tracks 2 to 4 display the conventional logs: GR, NPHI, RHO, NMR porosity, Neutron-Density Separation, PEF and Sonic. Track 5 displays the MRGC clustering of conventional logs only: GR, RHO, Neutron-Density Separation, PEF and Sonic. To obtain details in the reservoir formation, 22 electrofacies was found to be the optimal log partition. In a final stage of the survey some of them were merged to fit the core description. Track 6 displays the NMR T2 distribution. Track 7 displays the result of MRGC applied to the T2 bins; a 29-cluster partition was retained to better show the details. Track 8 displays the result of MRGC clustering of conventional logs plus the previous T2 bin classification.
[Figure 4b crossplot panels: Core Facies (two panels), NMR + Conventional Logs Electrofacies (two panels), NMR Electrofacies. Axis labels shown include TNPH (V/V) and ECGR.]
Figure 4b: The crossplots of Figure 4a. The two uppermost crossplots (Density/Neutron and NDS/GR) are color coded for core description. The two crossplots in the center are color coded for electrofacies obtained by applying MRGC to the NMR clustering results combined with GR, NDS, Pe, Rho and Sonic. Color-coding was made after a quantitative survey of correspondence between electrofacies and core description. The lower crossplot is color coded for NMR clustering results; only reservoir facies are displayed for convenience. The figure on the lower right is the display of a contingency table: the best reservoir facies are in the center of the plot (brown, violet, red and blue). Shaly facies are on the left of the table, coded in deep blue, dark gray and magenta; electrofacies give more detail than seen by sedimentologists. Shaly conglomerates are found in the right part of the table.
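A contingency table of the kind mentioned in the caption of Figure 4b can be built by counting, depth sample by depth sample, how often each electrofacies coincides with each core facies. The paper does not describe how its quantitative contingency evaluation was carried out, so the sketch below is only one simple possibility; the function names and the labels are hypothetical.

```python
from collections import Counter


def contingency_table(electrofacies, core_facies):
    """Count how often each (electrofacies, core facies) pair co-occurs."""
    if len(electrofacies) != len(core_facies):
        raise ValueError("both label sequences must cover the same depth samples")
    return Counter(zip(electrofacies, core_facies))


def dominant_core_facies(table):
    """Map every electrofacies to the core facies it most often coincides with."""
    best = {}
    for (ef, cf), count in table.items():
        if ef not in best or count > best[ef][1]:
            best[ef] = (cf, count)
    return {ef: cf for ef, (cf, _) in best.items()}


# Hypothetical labels, one per depth sample.
electro = [3, 3, 1, 1, 1, 2, 2, 3]
core = ["sand", "sand", "shale", "shale", "sand", "silt", "silt", "sand"]
table = contingency_table(electro, core)
mapping = dominant_core_facies(table)
```

A mapping of this sort is one way to re-order and color code cluster numbers, which carry no geological meaning by themselves, so that they read side by side with the core description.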
Figure 5a: This figure displays the result of MRGC for Case 2 using GR, RHO, Neutron-Density Separation and Sonic. From left to right are shown: Tracks 1 to 4, the logs and the depth. Track 5 displays three curves in histogram mode; they correspond to the 3 optimal partitions proposed by the MRGC method for a number of facies ranging from 10 to 25; the partition in 22 electrofacies was preferred. Tracks 6 to 8 display the electrofacies color and pattern encoding between the Density and Rdeep logs (Track 6) and between the Neutron and Sonic logs (Track 7), and the coding for depositional environment interpreted from cores and MRGC clustering (Track 8). Tracks 9 and 10 display the comparison of MRGC clustering (facing the right) and a conventional two-step clustering (facing the left). Track 11 displays the color-coding used on the crossplots of Figure 5b.
Figure 5b: The color-coding (identical to Track 11 of Figure 5a) on the Neutron/Density crossplot and a GR/Neutron-Density Separation crossplot displays the results of the MRGC model set up on the well of Case 2.