10 1109@icsssm 2016 7538562

A Framework of Data Mining
for Logging Reservoir Evaluation

Yili Ren
Institute of Computer Application Technology
Research Institute of Petroleum Exploration & Development
Beijing, China
renyili@petrochina.com.cn

Yiting Ren
School of Information Engineering
China University of Geosciences (Beijing)
Beijing, China
renyiting2008@163.com

Abstract-Data mining provides an automatic and effective way to evaluate reservoirs based on well logging interpretation. In this paper, we propose a data mining framework for logging reservoir evaluation and verify its performance on two well logging datasets. Experimental results show that the well-logging interpretation conclusions produced by the proposed framework are consistent with the oil test results. Compared with traditional evaluation methods, the framework is efficient and less dependent on expertise. It provides a reference for the construction of a big data platform for oil exploration and development.
Keywords-reservoir evaluation; data mining; well logging interpretation; big data

I. INTRODUCTION
In recent years, China National Petroleum Corporation (CNPC) has paid close attention to big data, especially in the area of oil exploration and development. As the internal IT department of CNPC, we focus on applied research on big data in this area, in particular on a big data processing platform that integrates effective data mining algorithms with a distributed processing framework such as Hadoop.
In this paper, we summarize the data mining algorithms that are commonly used in reservoir well logging evaluation and present a data mining framework for reservoir evaluation based on well logging interpretation.
II. RESERVOIR EVALUATION BASED ON WELL LOGGING INTERPRETATION
The basic idea of reservoir evaluation is to take full advantage of all available materials, including well logging data, to evaluate reservoir properties, for example through oil-bearing evaluation, layer classification, reservoir parameter prediction and capacity estimation. With the steady accumulation of logging data, one of the most important problems in oil exploration and development is how to fully mine and analyze geological data. Data mining provides a new and effective method.
An oil reservoir is rock with connected pores. It can not only store oil and gas but also allow oil and gas to permeate and flow under a certain pressure difference.
Data mining is regarded as one of the top 10 techniques for meeting challenges in oil exploration and development. It has been used successfully in real-time dynamic reservoir monitoring [1], best-practice recognition in petroleum engineering [2], reservoir description [3], etc. These nonlinear methods have been found to perform better than the linear statistical analysis techniques that are widely used in oil exploration and development.
As to well logging interpretation, one advantage of data mining is that professional knowledge, such as rock physics mechanisms, is not needed in advance. It needs only a small amount of background knowledge and constructs the model directly from geological data. The main tasks of data mining are prediction, association analysis, clustering analysis and anomaly detection. Prediction is the task most closely related to reservoir evaluation. It is widely used for lithology identification, sedimentary facies division, reservoir parameter prediction and the identification of oil, gas or water layers.
Reservoir evaluation technology combines seismic, geologic and logging data to track and predict reservoir distribution, thickness, and changes in lithology and physical properties.
A. Reservoir Parameters
The main reservoir parameters are porosity, permeability and saturability.
1) Porosity
Because of the rock-forming process and later alteration, there are cracks or gaps among the solid particles of a rock. These are called rock pores. Porosity is the ratio of pore volume to rock volume.
$$\phi_p = \frac{V_p}{V_b} \times 100\% \tag{1}$$
Here, $\phi_p$ is absolute porosity, $V_p$ is the volume of all rock pores and $V_b$ is the volume of the rock.
$$\phi_T = \frac{V_T}{V_b} \times 100\% \tag{2}$$
Effective porosity is defined as $\phi_T$, where $V_T$ is the volume of effective rock pores.
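As a small worked example of equations (1) and (2), the sketch below computes absolute and effective porosity for a single rock sample. The function name and the sample volumes are hypothetical and chosen only for illustration.

```python
def porosity_percent(pore_volume, bulk_volume):
    """Return porosity in percent: 100 * pore volume / bulk rock volume."""
    if bulk_volume <= 0:
        raise ValueError("bulk volume must be positive")
    if not 0 <= pore_volume <= bulk_volume:
        raise ValueError("pore volume must lie between 0 and the bulk volume")
    return 100.0 * pore_volume / bulk_volume


# Hypothetical core plug: 20 cm^3 of rock, 3 cm^3 of pores,
# of which 2.4 cm^3 are effective (connected) pores.
v_b, v_p, v_t = 20.0, 3.0, 2.4

absolute = porosity_percent(v_p, v_b)    # equation (1)
effective = porosity_percent(v_t, v_b)   # equation (2)
print("absolute porosity (%):", absolute)
print("effective porosity (%):", effective)
```

Because effective pores are a subset of all pores, effective porosity can never exceed absolute porosity for the same sample.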
978-1-5090-2842-9/16/$31.00 ©2016 IEEE
2) Permeability
Fluid in a porous medium flows under a certain pressure difference. Permeability describes how easily the rock allows this flow. It is described by Darcy's law.
$$Q = \frac{K A P}{\mu L} \tag{3}$$
Q is the flow rate under the pressure difference P. L is the length of rock over which the pressure difference is applied. $\mu$ is the viscosity of the fluid. A is the cross-sectional area of the rock. K is permeability.
3) Saturability
The oil-bearing property of a reservoir is described by saturability. Saturability is defined as the ratio of the volume of oil-bearing rock pores to the volume of all rock pores.
$$S_o = \frac{V_o}{V_p} \times 100\% \tag{4}$$
$V_p$ is the volume of all rock pores, and $V_o$ is the volume of oil-bearing rock pores.
B. Well Logging Interpretation
Geophysical well logging is a method which uses physical principles to solve geophysical problems. It predicts geophysical parameters from the electrochemical properties, conduction characteristics, acoustic characteristics, radioactivity, etc. of the reservoir. It has two main tasks: data collection and well logging interpretation.
The original data collected by well logging usually include the spontaneous potential curve (SP), gamma ray curve (GR), resistivity curves (RT, RI, RXO), density log (DEN), acoustic log (AC), compensated neutron log (CNL), etc.
Common methods for measuring porosity are the acoustic log (AC), compensated neutron log (CNL) and density log (DEN). There are many ways to estimate saturability; the most common ones use the resistivity curves (RT, RI, RXO), the compensated neutron log (CNL), etc. Permeability is relatively difficult to measure; common methods use the resistivity curves (RT, RI, RXO), porosity and irreducible water saturation estimation, the gamma ray curve (GR), etc.
Figure 1. Relationship between well logging curves and reservoir parameters (porosity, permeability, saturability).
The main tasks of well logging interpretation are oil-bearing analysis, lithology identification and sedimentary facies division using these well logging curves.
As well logging technology develops, the geological information reflected by well logging data becomes more and more abundant. However, interpretation with existing models often becomes a bottleneck of reservoir evaluation. It is therefore urgent to find more effective methods to mine more information from well logging data.
III. MODELING METHODS FOR LOGGING RESERVOIR EVALUATION
As oil fields enter the middle and late stages of development, unconventional reservoirs are gradually becoming the main target of oil exploration and development. Complex evaluation problems such as low porosity, low permeability and water-flooded layers are becoming more and more common. Traditional methods such as cross plots and multiple discriminant analysis cannot solve the evaluation task for these complex reservoirs, so data mining techniques are becoming more and more popular in reservoir evaluation.
Predictive modeling is the method most closely related to reservoir evaluation. It uses training data with class labels to construct a model, that is, to find a target function y = f(x) that is then used for prediction. The task is classification when y is discrete and regression when y is continuous. Common predictive methods include Multiple Linear Regression, Decision Tree, Artificial Neural Network, Naive Bayes, Support Vector Machine, etc.
A. Multiple Linear Regression
Multiple Linear Regression uses the least-squares method to construct a function of y with respect to n parameters $(x_1, x_2, \ldots, x_n)$.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{5}$$
Multiple Linear Regression is a regression model and its result y is a continuous variable. The problem becomes a classification problem when y is a dummy variable. We can convert this formula to the following form.
$$y_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}} \tag{6}$$
This is Logistic Regression Analysis. $y_i$ represents the probability that the item belongs to class $l_i$. If $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n > 0$, we classify this item as $l_i$. If $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n < 0$, we do not classify this item as $l_i$.
Multiple Linear Regression is usually used to predict reservoir parameters such as porosity, saturation and permeability. Multiple regression analysis (MRA) can also indicate the order of dependence between y and each of $(x_1, x_2, \ldots, x_n)$, so MRA can serve as a first dimension-reduction tool in geological data mining. In fact, multiple linear regression is usually used for dimension reduction in well logging interpretation, together with cross plots.
B. Decision Tree
Decision Tree has been applied gradually in the natural and social sciences since the 1990s and has been widely applied in the 21st century. ID3 (Quinlan, 1986) and C4.5 (Quinlan, 1993) are two typical algorithms. In recent years, decision trees have also been applied to geosciences to some extent.
Attribute selection is the key point of a decision tree. C4.5 uses Gain Ratio instead of Information Gain in order to avoid favoring attributes whose test conditions have a large number of outcomes. CART uses the Gini index and restricts test conditions to be binary. CHAID uses a statistical test (the $\chi^2$ test) to find the best split point and generates multi-branch trees. In the LMT algorithm, every leaf node is a logistic regression model.
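The attribute selection measures named above can be compared on a toy example. The following is a minimal sketch, not the implementation used in the paper; the samples and attribute names are hypothetical. It shows why Information Gain favors an attribute with many distinct values, while Gain Ratio penalizes it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of discrete labels (the CART criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on a discrete attribute (ID3)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the split information (C4.5)."""
    split_info = entropy(values)  # entropy of the attribute's own values
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy samples: 4 oil layers and 4 water layers.
labels = ["oil", "oil", "oil", "oil", "water", "water", "water", "water"]
lithology = ["sand", "sand", "sand", "sand", "sand", "shale", "shale", "shale"]
sample_id = list(range(8))  # unique per sample, useless for prediction

for name, attr in (("lithology", lithology), ("sample_id", sample_id)):
    print(name,
          "gain:", round(information_gain(attr, labels), 3),
          "gain ratio:", round(gain_ratio(attr, labels), 3))
print("Gini impurity of the labels:", gini(labels))
```

On this data the unique `sample_id` attribute has the larger Information Gain, but `lithology` has the larger Gain Ratio, which is the behavior C4.5 relies on.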
C. Artificial Neural Network
A neural network is a parallel and distributed network structure for information processing. The Multilayer Feedforward Neural Network and the Error Back Propagation (BP) algorithm are the typical model and training method. The Radial Basis Function Neural Network is another widely used feedforward neural network, with radial basis functions as activation functions. Because of its rich hypothesis space and high sensitivity to noise in the training data, an ANN is quite dependent on the network structure, the number of nodes and the model parameters.
The performance of ANN is not stable on geological data, but it remains an alternative, especially because it is relatively easy to program.
D. Bayesian Classification
Bayesian classification is a classification approach based on statistics. The typical algorithms are Naive Bayesian Classification, Bayesian Discrimination and Successive Bayesian Discrimination.
In recent years, Naive Bayesian Classification has been applied to geosciences to some extent, whereas Bayesian Discrimination and Successive Bayesian Discrimination have been applied to geosciences relatively more widely.
E. Support Vector Machine
Support Vector Machine was established on the basis of the structural risk minimization principle and VC dimension theory. It constructs a classification model by finding the optimal separating hyperplane. For nonlinear problems, SVM uses a kernel function to map the input space into a higher-dimensional space and finds the optimal separating hyperplane in that space. C-SVM and ν-SVM are two typical classification models.
The performance of SVM is greatly affected by the penalty factor C, the kernel function and its parameters. It is therefore necessary to optimize the model parameters when using SVM.
SVM can also be used for regression. For example, Least Squares Support Vector Machine (LSSVM) [4] is a typical algorithm. The difference between SVM and LSSVM is that LSSVM changes the inequality constraints to equality constraints. The problem is transformed into the following optimization.
Objective function:
$$\min_{w} f(w) = \frac{\|w\|^2}{2} + \frac{\gamma}{2} \sum_{k=1}^{N} e_k^2 \tag{7}$$
Constraint condition:
$$y_k \left[ w^T \varphi(x_k) + b \right] = 1 - e_k, \quad k = 1, \ldots, N \tag{8}$$
Solving this optimization problem gives the regression model.
LSSVM can be used for predicting reservoir parameters, such as porosity, saturability and permeability.
Because of the complexity of geological rules, reservoir logging data exhibit nonlinear relationships in the vast majority of cases. In general, SVM is recommended when the nonlinearity is strong, and Artificial Neural Network, Decision Tree or Bayesian Classification are recommended when the nonlinearity is not so strong.
IV. A FRAMEWORK OF DATA MINING FOR LOGGING RESERVOIR EVALUATION
The main task of reservoir evaluation is to use the original data collected by well logging to evaluate reservoir properties, such as porosity, oil saturability and lithological characteristics, and finally to give a comprehensive evaluation.
The original data collected by well logging mainly consist of well logging curves, such as the spontaneous potential curve (SP), gamma ray curve (GR), resistivity curves (RT, RI, RXO), density log (DEN), acoustic log (AC), compensated neutron log (CNL), etc. Besides the original well logging data, we also collect experimental data from critical wells. For critical wells, we not only conduct comprehensive logging to collect original data but also conduct experiments on rock samples collected from the cored wells. By analyzing the experimental data, such as porosity and permeability, we obtain lithology and oil-bearing properties. For non-critical wells, we only have well logging data, such as SP, GR, RT, RI, RXO and AC, since coring is quite expensive. We cannot get experimental data for non-critical wells because there are no rock samples.
The traditional procedure for logging reservoir evaluation is as follows.
1) Simplify the complexity of the strata and assume a homogeneous and isotropic physical model.
2) Construct a mathematical model $F_{org}$ with undetermined parameters according to the physical theory of fluids and solids.
3) Calculate the undetermined parameters of $F_{org}$ using data from critical wells, which have both original data and experimental data. The mathematical model $F_{org}$ with determined parameters becomes the well logging interpretation model $F_{Decision}$.
4) Set the threshold values and grade division standard $D_{org}$ for porosity, oil saturability and lithological characteristics.
5) Analyze the data of non-critical wells using the well logging interpretation model $F_{Decision}$ and use $D_{org}$ to make a comprehensive assessment.
Data mining provides a new and effective method. In logging reservoir evaluation, the calculation of parameters such as porosity, permeability and saturability corresponds to the regression task in data mining, while oil layer identification, lithology identification and sedimentary facies division correspond to the classification task in data mining.
A. Reservoir Parameters Prediction
As described above, the traditional method for estimating porosity, saturability and permeability is to construct a mathematical model $F_{org}$ with predetermined independent variables. For example, the independent variables for porosity are usually AC, CNL and DEN. The model parameters of $F_{org}$ are calculated by fitting coring data. This approach depends heavily on expert experience, since the final comprehensive assessment is given by professional experts.
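The calibration step described here can be sketched as a least-squares fit. The example below assumes, purely for illustration, that the porosity model is linear in AC, CNL and DEN; the paper does not give the form of the model, and the coefficients, function names and core measurements are all hypothetical. The targets are generated from known coefficients so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, targets):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one sample."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Hypothetical "true" model used only to generate synthetic core data.
TRUE_COEFFS = [0.10, 0.002, 0.50, -0.05]  # intercept, AC, CNL, DEN

# Synthetic cored-well samples: (AC, CNL, DEN).
curves = [(52.0, 0.03, 2.52), (54.0, 0.01, 2.65), (49.0, 0.06, 2.69),
          (50.5, 0.02, 2.40), (60.0, 0.08, 2.60), (57.0, 0.05, 2.45)]
core_porosity = [predict(TRUE_COEFFS, row) for row in curves]

coeffs = fit_linear(curves, core_porosity)
print("fitted coefficients:", [round(c, 6) for c in coeffs])
print("porosity for a new sample:", round(predict(coeffs, (55.0, 0.04, 2.50)), 6))
```

With noise-free synthetic targets the fit recovers the generating coefficients; with real core data the residuals would need the statistical checks discussed in the model evaluation step.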
As already mentioned, regression algorithms such as multiple linear regression and Least Squares Support Vector Machine can be used to predict reservoir parameters. The advantage of this approach is that it does not need predetermined independent variables; instead, it finds the relationship between reservoir parameters and well logging curves by learning from training data collected from the cored wells. It is a data-driven method.
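To illustrate the data-driven alternative, the following is a minimal sketch of LSSVM regression. It uses the function-estimation variant of LSSVM from the general literature (equality constraints y_k = w^T φ(x_k) + b + e_k), in which training reduces to one linear system. The RBF kernel, the parameter values and the toy data are assumptions for illustration, not the configuration used in the paper.

```python
from math import exp, sin


def rbf_kernel(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2)."""
    return exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lssvm_fit(xs, ys, gamma, g):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]; return (b, alpha)."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k_ij = rbf_kernel(xs[i], xs[j], g)
            row.append(k_ij + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve_linear_system(system, [0.0] + list(ys))
    return solution[0], solution[1:]


def lssvm_predict(xs, b, alpha, g, x):
    """Prediction b + sum_k alpha_k * K(x, x_k)."""
    return b + sum(a_k * rbf_kernel(x, x_k, g) for a_k, x_k in zip(alpha, xs))


# Hypothetical nonlinear 1-D data standing in for a curve-to-parameter relation.
train_x = [[0.5 * i] for i in range(7)]
train_y = [sin(x[0]) for x in train_x]

GAMMA, G = 1000.0, 4.0  # regularization and kernel parameters (assumed values)
b, alpha = lssvm_fit(train_x, train_y, GAMMA, G)
fitted = [lssvm_predict(train_x, b, alpha, G, x) for x in train_x]
max_err = max(abs(f - y) for f, y in zip(fitted, train_y))
print("bias:", b)
print("largest training error:", max_err)
```

A property of this formulation is that each training residual equals alpha_k / gamma, so a larger gamma fits the training data more closely at the cost of less regularization.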
The procedures for porosity, permeability and saturability interpretation are shown in Figures 2 to 4.
Figure 2. Porosity interpretation framework. Recoverable labels: logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...); regression model (MRA, LSSVM); porosity interpretation model.
Figure 3. Permeability interpretation framework. Recoverable labels: logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...); porosity; regression model (MRA, LSSVM); permeability interpretation model.
Figure 4. Saturability interpretation framework. Recoverable labels: logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...); porosity; permeability; regression model (MRA, LSSVM); saturability interpretation model.
B. Data Mining for Reservoir Logging Evaluation
After the reservoir parameters are computed, oil exploration and development experts give the final evaluation conclusion by taking into account both the computed results and professional knowledge. This depends heavily on expert experience.
Data mining provides an automatic and data-driven method for reservoir logging evaluation. The framework is shown in Figure 5.
Figure 5. A framework of data mining for logging reservoir evaluation. Recoverable labels: training data consisting of logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...) and experimental data from cored wells (porosity, ...); logging curves of non-critical wells (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...); evaluation conclusion.
In this framework, the training data consist of logging curves and experimental analysis data collected from cored wells. The biggest difference between this framework and the traditional method is that it is automatic and less dependent on expertise. It can be used as an expert recommendation system for oil exploration and development.
There are some key points for this framework.

1) Feature Selection
Since the dimension of the input data is much higher than in the traditional method, overfitting may occur, so feature selection is necessary during modelling. Common feature selection methods for well logging interpretation are the crossplot technique, correlation analysis and expert experience.
In data mining, there are three working patterns between feature selection and classification: the embedded approach, the filter approach and the wrapper approach. In the embedded approach, feature selection is part of the data mining algorithm. In the wrapper approach, the data mining algorithm is treated as a black box that is used to evaluate candidate feature subsets. A filter approach is independent of the classifier; examples are Correlation-based Feature Selection (CFS) [5] and Las Vegas Filter (LVF) [6]. CFS prefers feature subsets in which the correlation between features and class labels is high while the redundancy among features is low. LVF evaluates features by computing the consistency of a feature subset.
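The CFS criterion can be made concrete with the subset merit from Hall's formulation [5], merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below uses the absolute Pearson correlation as the correlation measure, which is a simplifying assumption, and the standardized toy values and feature names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cfs_merit(features, target, subset):
    """CFS merit of a feature subset: high relevance, low redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical standardized toy curves: AC and DEN are uncorrelated with each
# other, AC_scaled is a rescaled copy of AC, and the target depends on AC + DEN.
features = {
    "AC": [-1, -1, 1, 1, -1, -1, 1, 1],
    "DEN": [-1, 1, -1, 1, -1, 1, -1, 1],
    "AC_scaled": [-3, -3, 3, 3, -3, -3, 3, 3],
}
target = [a + d for a, d in zip(features["AC"], features["DEN"])]

merit_complementary = cfs_merit(features, target, ["AC", "DEN"])
merit_redundant = cfs_merit(features, target, ["AC", "AC_scaled"])
merit_single = cfs_merit(features, target, ["AC"])
print("merit of {AC, DEN}:", merit_complementary)
print("merit of {AC, AC_scaled}:", merit_redundant)
print("merit of {AC}:", merit_single)
```

Adding the redundant copy does not improve the merit over the single feature, whereas adding the complementary feature does, which is the behavior described for CFS.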
2) Model Optimization
The performance of different algorithms depends largely on the setting of model parameters, so model optimization is necessary. In most cases, parameters can only be set by experience or by comparative experiments, and there is little research on model parameter optimization. An effective method is the combination of k-fold Cross Validation and Grid Search.
k-fold Cross Validation
The training dataset is split into k separate subsets of equal size. k-1 subsets are used as the training set and the remaining one as the validation set. The modeling process is repeated k times, and the average MSE over the k iterations is used to estimate the expected generalization error.
Grid Search
Grid search is a practical method for searching a parameter space. It is well suited to searching multidimensional parameters along several directions at the same time. To illustrate the principle: when we choose the RBF kernel function for SVM, we need to determine two parameters, the penalty parameter C and the kernel function parameter G. Grid Search chooses a step $C_s$ for $C \in [C_1, C_2]$ and a step $G_s$ for $G \in [G_1, G_2]$. We build a model for each pair of parameters $(C', G')$ and choose the pair that performs best as the final parameters.
3) Model Evaluation
Since coring by drilling is quite expensive, little data is available for training. Cross validation should therefore be used for model evaluation. In addition to the evaluation of the classifier, statistical tests such as the $\chi^2$ test and the F test are necessary, because statistical models are used for predicting reservoir parameters.
V. EXPERIMENT AND ANALYSIS
In this section, we compare the performance of the framework described above with traditional reservoir logging evaluation methods on two well logging datasets.
The training dataset is composed of logging curves and core well data derived from tight sandstones of the Jianghan Basin in central China. Attributes include GR, KTH, RS, DEN, AC, CNL, porosity, etc. The class label is oil saturation. Since the calculation process of traditional reservoir logging evaluation methods is more complicated and depends on expertise, their result is given in advance.

Experimental tools include Datahoop, Matlab, Weka and our own data mining platform.

TABLE I. EXPERIMENTAL DATA DEMONSTRATION
No. | Well Section (m) | TH | KTH | GR | CAL | RS | RD | CNL | DEN | AC | Vsh | porosity | conclusion
1 | 3294.90 3448.72 | | 25.385 | 27.007 | 9.449 | 1235.583 | 1492.195 | 0.028 | 2.516 | 52.364 | 0.107 | 0.015 | Oil layer
2 | 3449.50 3456.18 | 5.5 | 9.558 | 12.264 | 6.521 | 132.043 | 527.616 | 0.039 | 2.652 | 53.739 | 0.011 | 0.032 | Water layer
... | | | | | | | | | | | | |
61 | 3499.36 3602.70 | 20 | 8.948 | 3.436 | 5.832 | 236.188 | 1120.818 | 0.009 | 2.694 | 49.408 | 0.004 | 0.008 | Dry layer
62 | 3603.88 3632.80 | 10 | 4.51 | 4.51 | 6.438 | 173.395 | 369.639 | 0.011 | 2.682 | 50.562 | 0.000 | 0.015 | Oil layer

A. Reservoir Parameters Prediction

For the first dataset, the regression task was performed with MRA, since the problem is linear in general. LSSVM was used on the second dataset because of its nonlinearity.
We use the F-test to check the significance of the regression equation and the t-test to check the significance of its coefficients. For the final model after correction, the P-value of the F-test was 4.645e-08 on the first dataset and 6.996e-07 on the second dataset. This indicates that the regression models are effective.
B. Well Log Interpretation
We use LIBSVM, Decision Tree, Naive Bayes and Artificial Neural Network (ANN) to train the prediction model and use 10-fold cross-validation to verify the model. The evaluation results are shown in Figure 6.
Figure 6. Evaluation results comparison on data1 and data2 (vertical axis from 0 to 0.9; legend: expertise, LIBSVM, Decision Tree, Naive Bayes).
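The cross-validation used in these experiments, combined with a grid search over two model parameters as described for model optimization in Section IV, could be sketched as follows. This is an illustration under stated assumptions rather than the experimental code: an RBF kernel ridge regressor stands in for the SVM, with C acting as the penalty parameter and G as the kernel parameter, and the data, grid ranges and fold count are hypothetical.

```python
from math import exp, sin


def rbf_kernel(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2)."""
    return exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(xs, ys, c, g):
    """Stand-in model: kernel ridge weights solving (K + I / c) alpha = y."""
    n = len(xs)
    a = [[rbf_kernel(xs[i], xs[j], g) + (1.0 / c if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear_system(a, list(ys))


def predict(xs, alpha, g, x):
    return sum(a * rbf_kernel(x, xi, g) for a, xi in zip(alpha, xs))


def k_fold_indices(n, k):
    """Assign sample i to fold i % k, giving k folds of near-equal size."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_val_mse(xs, ys, k, c, g):
    """Average validation MSE over k folds for one parameter pair (c, g)."""
    total = 0.0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        alpha = fit(tr_x, tr_y, c, g)
        total += sum((predict(tr_x, alpha, g, xs[i]) - ys[i]) ** 2
                     for i in fold) / len(fold)
    return total / k


def grid(lo, hi, step):
    """Values lo, lo + step, ... up to hi."""
    count = int(round((hi - lo) / step))
    return [lo + i * step for i in range(count + 1)]


# Hypothetical data and grid ranges.
xs = [[0.25 * i] for i in range(20)]
ys = [sin(x[0]) for x in xs]
scores = {(c, g): cross_val_mse(xs, ys, 5, c, g)
          for c in grid(1.0, 101.0, 25.0) for g in grid(0.5, 2.0, 0.5)}
best_params = min(scores, key=scores.get)
best_mse = scores[best_params]
print("best (C, G):", best_params, "cross-validated MSE:", best_mse)
```

In practice the grid is often taken on a logarithmic scale and the folds are shuffled; both are omitted here to keep the sketch deterministic.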
From the experimental results it can be seen that the conclusions of well-logging interpretation produced by the framework proposed in this paper are consistent with the oil test results. Although it does not perform much better than the traditional method based on expertise, this framework provides an effective and automatic way to carry out reservoir evaluation based on well logging interpretation.
Reservoir evaluation of new blocks and new layers always faces new problems. The merits and demerits of different predictive modeling methods are difficult to analyze in theory; their applicability can only be determined through experiments.
VI. CONCLUSION
Data mining provides a variety of modeling methods that expand the range of reservoir evaluation methods, so that the analysis not only makes predictions but also discovers knowledge. Especially in the face of big data challenges, giving a quick solution to new problems becomes a major difficulty.
In order to improve efficiency, we recommend developing data mining software that is integrated with the existing logging interpretation and evaluation software, and constantly enriching the algorithm library, so as to give the best solution to each problem. In this paper, we put forward a framework of data mining for reservoir evaluation based on well logging interpretation. This provides a reference for the construction of a big data platform.
REFERENCES
[1] C. Alimonti and G. Falcone, Knowledge discovery in databases and multiphase flow metering: the integration of statistics, data mining, neural networks, fuzzy logic, and ad hoc flow measurements towards well monitoring and diagnosis, SPE Annual Technical Conference and Exhibition, Texas, 2002
[2] S. D. Mohaghegh, A new methodology for the identification of best practices in the oil and gas industry, using intelligent systems, Journal of Petroleum Science and Engineering, vol. 49, pp. 239-260, 2005
[3] M. Nikravesh, Soft computing-based computational intelligent for reservoir characterization, Expert Systems With Applications, vol. 26, pp. 19-38, 2004
[4] J. A. K. Suykens and J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters, 9(3): 293-300, 1999
[5] M. A. Hall, Correlation-based feature selection for machine learning, Hamilton, New Zealand: University of Waikato, Department of Computer Science, 1999
[6] H. Liu and R. Setiono, A probabilistic approach to feature selection - a filter solution, Proceedings of the 13th International Conference on Machine Learning, San Francisco: Morgan Kaufmann, pp. 319-327, 1996
[7] Xinghe Yu, Base of oil and gas reservoir geology, Petroleum Industry Press