
A Framework of Data Mining for Logging Reservoir Evaluation


Yili Ren
Institute of Computer Application Technology
Research Institute of Petroleum Exploration & Development
Beijing, China
renyili@petrochina.com.cn

Yiting Ren
School of Information Engineering
China University of Geosciences (Beijing)
Beijing, China
renyiting2008@163.com

Abstract-Data mining provides an automatic and effective way to perform reservoir evaluation based on well logging interpretation. In this paper, we propose a data mining framework for logging reservoir evaluation and verify its performance on two well logging datasets. Experimental results show that the well-logging interpretation conclusions produced by the proposed framework are consistent with the oil test results. Compared with traditional evaluation methods, it is efficient and less dependent on expertise. The framework provides a reference for the construction of a big data platform for oil exploration and development.

Keywords-reservoir evaluation; data mining; well logging interpretation; big data

I. INTRODUCTION

In recent years, China National Petroleum Corporation (CNPC) has paid a lot of attention to big data, especially in the area of oil exploration and development. As the internal IT department of CNPC, we devote ourselves to applied research on big data in oil exploration and development, especially a big data processing platform that integrates effective data mining algorithms with a distributed processing framework such as Hadoop.
In this paper, we summarize the data mining algorithms that are commonly used in reservoir well logging evaluation and present a data mining framework for reservoir evaluation based on well logging interpretation.

II. RESERVOIR EVALUATION BASED ON WELL LOGGING INTERPRETATION

The basic idea of reservoir evaluation is to take full advantage of various materials, including well logging data, to evaluate reservoir properties. Typical tasks are oil-bearing evaluation, layer classification, reservoir parameter prediction, and capacity estimation. With the steady accumulation of logging data, one of the most important problems for oil exploration and development is how to fully mine and analyze geological data. Data mining provides a new and effective method.

An oil reservoir consists of rocks with connected pores, which can not only store oil and gas but also allow oil and gas to permeate and flow under a certain pressure difference.

Data mining is regarded as one of the top 10 techniques for meeting challenges in oil exploration and development. It has been used successfully in real-time dynamic reservoir monitoring [1], best practice recognition in petroleum engineering work [2], reservoir description [3], etc. It has been found that these nonlinear methods perform better than the linear statistical analysis techniques that are widely used in oil exploration and development.

For well logging interpretation, one advantage of data mining is that professional knowledge, such as rock physics mechanisms, is not needed in advance. It needs only a small amount of background knowledge and constructs the model directly from geological data. The main tasks of data mining are prediction, association analysis, clustering analysis and anomaly detection. Prediction is the task most closely related to reservoir evaluation. It is widely used for lithologic identification, sedimentary facies division, reservoir parameter prediction, and oil, gas or water layer identification.

Reservoir evaluation technology combines seismic, geologic and logging data to track and predict reservoir distribution, thickness, and changes in lithology and physical properties.

A. Reservoir Parameters

The main reservoir parameters are porosity, permeability and saturability.
1) Porosity
Owing to the rock-forming process and later alteration, there are cracks or gaps among the solid particles of a rock. These are called rock pores. Porosity is the ratio of pore volume to rock volume.

$\phi_P = \frac{V_p}{V_b} \times 100\%$    (1)

Here, $\phi_P$ is the absolute porosity, $V_p$ is the volume of all rock pores, and $V_b$ is the volume of the rock.

$\phi_T = \frac{V_T}{V_b} \times 100\%$    (2)

Effective porosity is defined as $\phi_T$, where $V_T$ is the volume of effective rock pores.
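As a small worked example of Eqs. (1) and (2), the following Python sketch computes absolute and effective porosity for one core sample. The function name and the sample volumes are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def porosity_percent(pore_volume: float, bulk_volume: float) -> float:
    """Return 100 * pore_volume / bulk_volume, the ratio used in Eqs. (1) and (2)."""
    if bulk_volume <= 0:
        raise ValueError("bulk volume must be positive")
    if not 0 <= pore_volume <= bulk_volume:
        raise ValueError("pore volume must lie between 0 and the bulk volume")
    return 100.0 * pore_volume / bulk_volume


# Hypothetical core sample (illustrative numbers only).
V_b = 50.0  # volume of the rock
V_p = 6.0   # volume of all rock pores
V_T = 4.5   # volume of effective rock pores

absolute_porosity = porosity_percent(V_p, V_b)   # Eq. (1)
effective_porosity = porosity_percent(V_T, V_b)  # Eq. (2)
print(absolute_porosity, effective_porosity)
```

Because the effective pores are a subset of all pores, the effective porosity can never exceed the absolute porosity for the same sample.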
978-1-5090-2842-9/16/$31.00 ©2016 IEEE

2) Permeability
Fluid in a porous medium flows when a pressure difference is applied. Permeability describes how easily this flow takes place. It is described by Darcy's law.

$Q = \frac{K A P}{\mu L}$    (3)

Q is the flow rate under the pressure difference P, L is the length of the flow path through the rock, $\mu$ is the viscosity of the fluid, A is the cross-sectional area of the rock, and K is the permeability.

3) Saturability
The oil-bearing property of a reservoir is described by saturability (oil saturation). Saturability is defined as the ratio of the volume of oil-bearing rock pores to the volume of all rock pores.

$S_o = \frac{V_o}{V_p} \times 100\%$    (4)

$V_p$ is the volume of all rock pores, and $V_o$ is the volume of oil-bearing rock pores.

B. Well Logging Interpretation

Geophysical well logging is a method that uses physical principles to solve geophysical problems. It predicts geophysical parameters from reservoir properties such as electrochemical properties, conduction characteristics, acoustic characteristics, and radioactivity. It has two main tasks: data collection and well logging interpretation.

The original data collected by well logging usually include the spontaneous potential curve (SP), gamma ray curve (GR), resistivity curves (RT, RI, RXO), density log (DEN), acoustic log (AC), compensated neutron log (CNL), etc.
Common methods for measuring porosity are the acoustic log (AC), the compensated neutron log (CNL) and the density log (DEN). There are many methods for measuring saturability; the most common ones are the resistivity curves (RT, RI, RXO) and the compensated neutron log (CNL). Permeability is relatively difficult to measure; common methods include the resistivity curves (RT, RI, RXO), estimation from porosity and irreducible water saturation, and the gamma ray curve (GR).

Figure 1. Relationship between well logging curves and reservoir parameters

The main tasks of well logging interpretation are oil-bearing analysis, lithology identification and sedimentary facies division using these well logging curves.
As well logging technology advances, the geological information reflected by well logging data becomes more and more abundant, but interpretation with the existing models often becomes a bottleneck of reservoir evaluation. It is therefore pressing to find more effective methods to mine more information from well logging data.

III. MODELING METHODS FOR LOGGING RESERVOIR EVALUATION

As oil fields enter the middle and late stages of development, unconventional reservoirs are gradually becoming the main target of oil exploration and development. Complex evaluation problems such as low porosity, low permeability and water-flooded layers are becoming more and more common. Traditional methods such as cross plots and multiple discriminant analysis cannot solve the evaluation tasks of these complex reservoirs, so data mining techniques are becoming more and more popular in reservoir evaluation.
Predictive modeling is the method most closely related to reservoir evaluation. It uses training data with class labels to construct a model, that is, to find an objective function y = f(x) for prediction. The task is classification when y is discrete and regression when y is continuous. Common predictive methods include Multiple Linear Regression, Decision Tree, Artificial Neural Network, Naive Bayes and Support Vector Machine.

A. Multiple Linear Regression

Multiple Linear Regression uses the least-squares method to construct a function of y with respect to n parameters $(x_1, x_2, \ldots, x_n)$.

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$    (5)

Multiple Linear Regression is a regression model, and its result y is a continuous variable. It becomes a classification problem when y is a dummy variable. We can convert Eq. (5) to the following form.

$\ln\frac{y_i}{1 - y_i} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$    (6)

This is Logistic Regression Analysis, where $y_i$ represents the probability that the item belongs to class $l_i$. If $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n > 0$, we classify the item as $l_i$. If $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n < 0$, we do not classify the item as $l_i$.
Multiple Linear Regression is usually used in reservoir parameter prediction, for example for porosity, saturation and permeability. Multiple regression analysis (MRA) can also indicate the order of dependence between y and each of $(x_1, x_2, \ldots, x_n)$, so MRA can serve as a preliminary dimension-reduction tool in geoscience data mining. In fact, multiple linear regression is usually used for dimension reduction in well logging interpretation, together with cross plots.

B. Decision Tree

Decision trees have been applied gradually in the natural and social sciences since the 1990s and are widely applied in the 21st century. ID3 (Quinlan, 1986) and C4.5 (Quinlan, 1993) are two typical algorithms. In recent years, decision trees have also been applied to the geosciences to some extent.
Attribute selection is the key point of a decision tree. C4.5 uses Gain Ratio instead of Information Gain in order to avoid favoring attributes whose test conditions have a large number of outcomes. CART uses the Gini index and restricts test conditions to be binary. CHAID uses the χ² statistical test to find the best split point and generates multiway trees. In the LMT algorithm, every leaf node is a logistic regression model.
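The effect of these attribute selection measures can be illustrated with a short Python sketch. It computes Information Gain, Gain Ratio and the Gini index for categorical attributes using their textbook definitions. The toy samples (a unique well identifier and a coarse gamma ray level) are hypothetical and serve only to show why Gain Ratio penalizes attributes with many outcomes.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gini(labels):
    """Gini index of a list of class labels, the impurity measure used by CART."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of the candidate attribute."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on the attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information Gain divided by the split information, as in C4.5."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy samples: six layers with a class label each.
labels = ["oil", "oil", "water", "dry", "oil", "water"]
well_id = ["w1", "w2", "w3", "w4", "w5", "w6"]             # unique per sample
gr_level = ["low", "low", "high", "high", "low", "high"]  # coarse GR level

for name, attr in (("well_id", well_id), ("gr_level", gr_level)):
    print(name, information_gain(attr, labels), gain_ratio(attr, labels))
```

In this toy case Information Gain prefers the identifier attribute, which splits the data into six pure but useless groups, while Gain Ratio prefers the two-valued attribute.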

C. Artificial Neural Network

A neural network is a parallel and distributed network structure for information processing. The Multilayer Feedforward Neural Network and Error Back Propagation (BP) are two typical algorithms. The Radial Basis Function Neural Network is another widely used feedforward neural network, with radial basis functions as activation functions. Because of its rich hypothesis space and high sensitivity to noise in the training data, ANN is quite dependent on the network structure, the number of nodes and the model parameters.
The performance of ANN is not stable on geological data, but it remains an alternative, especially because it is relatively easy to program.
D. Bayesian Classification

Bayesian classification is a classification approach based on statistics. The typical algorithms are Naive Bayesian Classification, Bayesian Discrimination and Successive Bayesian Discrimination.
In recent years, Naive Bayesian Classification has been applied to geosciences to some extent, whereas Bayesian Discrimination and Successive Bayesian Discrimination have been applied to geosciences relatively more widely.
E. Support Vector Machine

The Support Vector Machine (SVM) was established on the basis of the structural risk minimization principle and the VC dimension. It constructs a classification model by finding the optimal classification interface (separating hyperplane). For nonlinear problems, SVM uses a kernel function to map the input space into a higher-dimensional space and finds the optimal classification interface in that space. C-SVM and ν-SVM are two typical classification models.
The performance of SVM is greatly affected by the penalty factor C, the kernel function and its parameters, so it is necessary to optimize the model parameters when using SVM.
SVM can also be used for regression. For example, the Least Squares Support Vector Machine (LSSVM) [4] is a typical algorithm. The difference between SVM and LSSVM is that LSSVM changes the inequality constraints into equality constraints. The problem is transformed into the following optimization.

Objective function:

$\min_{w} f(w) = \frac{\|w\|^2}{2} + \frac{\gamma}{2} \sum_{k=1}^{N} e_k^2$    (7)

Constraint condition:

$y_k [w^T \phi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, N$    (8)

The regression model is obtained by solving this problem.

LSSVM can be used for predicting reservoir parameters, such as porosity, saturability and permeability.
Because of the complexity of geological rules, reservoir logging data are nonlinearly related in the vast majority of cases. In general, SVM is recommended when the nonlinearity is strong, and Artificial Neural Network, Decision Tree or Bayesian Classification are recommended when the nonlinearity is not so strong.

IV. A FRAMEWORK OF DATA MINING FOR LOGGING RESERVOIR EVALUATION

The main task of reservoir evaluation is to use the original data collected by well logging to evaluate reservoir properties, such as porosity, oil saturability and lithological characters, and finally to give a comprehensive evaluation.
The original data collected by well logging mainly consist of well logging curves, such as the spontaneous potential curve (SP), gamma ray curve (GR), resistivity curves (RT, RI, RXO), density log (DEN), acoustic log (AC) and compensated neutron log (CNL). Besides the original well logging data, we also collect experimental data from critical wells. For critical wells, we not only conduct comprehensive logging to collect original data, but also conduct experiments on rock samples collected from the cored wells. By analyzing experimental data such as porosity and permeability, we obtain the lithology and oil-bearing properties. For non-critical wells, we only have well logging data, such as SP, GR, RT, RI, RXO and AC, because coring is quite expensive. No experimental data are available for non-critical wells since there are no rock samples.
The traditional procedure for logging reservoir evaluation is as follows.

Simplify the complexity of the strata and assume a homogeneous and isotropic physical model.

Construct a mathematical model $F_{org}$ with undetermined parameters according to the physical theory of fluids and solids.

Calculate the undetermined parameters of $F_{org}$ using data from critical wells, which have both original data and experimental data. The mathematical model $F_{org}$ with determined parameters becomes the well logging interpretation model $F_{Decision}$.

Set the threshold values and grade division standard $D_{org}$ for porosity, oil saturability and lithological characters.

Analyze the non-critical well data using the well logging interpretation model $F_{Decision}$ and use $D_{org}$ to make a comprehensive assessment.
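The traditional procedure above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the model $F_{org}$ is taken to be a straight line relating porosity to the AC reading, the thresholds in $D_{org}$ are invented, and all numbers are hypothetical rather than taken from the paper.

```python
def fit_line(x, y):
    """Least-squares fit of y = a * x + b (closed form for a single variable)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def grade(value, thresholds):
    """Return the label of the first (lower_bound, label) pair that the value reaches."""
    for lower_bound, label in thresholds:
        if value >= lower_bound:
            return label
    return "non-reservoir"


# Hypothetical critical-well data: AC readings with core-measured porosity.
ac_cored = [50.0, 52.0, 54.0, 56.0]
porosity_cored = [0.02, 0.04, 0.06, 0.08]

# F_org with undetermined parameters (a, b) becomes F_Decision once they are fitted.
a, b = fit_line(ac_cored, porosity_cored)


def f_decision(ac):
    """Interpretation model: predicted porosity for a given AC reading."""
    return a * ac + b


# Hypothetical grade division standard D_org, sorted from best to worst.
d_org = [(0.07, "good"), (0.03, "fair")]

# Non-critical wells have logging data only.
ac_uncored = [51.5, 55.5]
results = [grade(f_decision(ac), d_org) for ac in ac_uncored]
print(results)
```

The independent variable (AC) and the thresholds are fixed by hand here, which is exactly the dependence on expert choices that the procedure implies.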

Data mining provides a new and effective method. For logging reservoir evaluation, the calculation of parameters such as porosity, permeability and saturability corresponds to the regression task in data mining, while oil layer identification, lithology identification and sedimentary facies division correspond to the classification task in data mining.

A. Reservoir Parameters Prediction

As described above, the traditional method for measuring porosity, saturability and permeability is to construct a mathematical model $F_{org}$ with predetermined independent variables. For example, the independent variables for porosity are usually AC, CNL and DEN. The model parameters of $F_{org}$ are calculated by training on coring data. This approach is quite dependent on expert experience, since the final comprehensive assessment is given by professional experts.
As already mentioned, regression algorithms such as Multiple Linear Regression and the Least Squares Support Vector Machine can be used to predict reservoir parameters. The advantage of this approach is that it does not need predetermined independent variables. Instead, it finds the relationship between reservoir parameters and well logging curves by learning from training data collected from the cored wells. It is a data-driven method.
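A data-driven regression of this kind might look like the following Python sketch. It assumes the standard function-estimation form of LSSVM, in which the bias b and the multipliers α solve one linear system and the prediction is b + Σ α_k K(x, x_k), with an RBF kernel. Whether the authors used exactly this variant is an assumption. The training data are synthetic, and the parameter grids and all function names are hypothetical. The parameters are chosen by grid search scored with k-fold cross-validation.

```python
from math import exp


def rbf(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2)."""
    return exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(matrix, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lssvm_fit(xs, ys, c, g):
    """Solve [[0, 1^T], [1, K + I/c]] [b; alpha] = [0; y] and return (b, alpha)."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j], g) + (1.0 / c if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]


def lssvm_predict(model, train_xs, g, x):
    """Evaluate b + sum_k alpha_k * K(x, x_k)."""
    b, alpha = model
    return b + sum(al * rbf(x, xk, g) for al, xk in zip(alpha, train_xs))


def cv_mse(xs, ys, c, g, k):
    """Average validation MSE over k folds (sample i goes to fold i mod k)."""
    total = 0.0
    for fold in range(k):
        test = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        model = lssvm_fit(tx, ty, c, g)
        errors = [(lssvm_predict(model, tx, g, xs[i]) - ys[i]) ** 2 for i in test]
        total += sum(errors) / len(errors)
    return total / k


def grid_search(xs, ys, cs, gs, k=5):
    """Return (best_mse, best_c, best_g) over all parameter pairs."""
    return min((cv_mse(xs, ys, c, g, k), c, g) for c in cs for g in gs)


# Synthetic stand-in for (logging curve -> reservoir parameter) training data.
xs = [(i / 10.0,) for i in range(20)]
ys = [x[0] ** 2 for x in xs]
cs, gs = [1.0, 10.0, 100.0], [0.1, 1.0, 10.0]  # hypothetical grids

best_mse, best_c, best_g = grid_search(xs, ys, cs, gs, k=5)
print(best_mse, best_c, best_g)
```

No independent variables are selected by hand: every input column is passed to the kernel, and the data decide the fit.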

B. Data Mining for Reservoir Logging Evaluation

After the reservoir parameters are computed, oil exploration and development experts give the final evaluation conclusion by taking into account both the computed results and professional knowledge. This is quite dependent on expert experience.
Data mining provides an automatic and data-driven method for reservoir logging evaluation. The framework is shown as follows.

Figure 5. A framework of data mining for logging reservoir evaluation
[Diagram components: train data (logging curves SP, GR, AC, DEN, CNL, RT, RI, RXO and experimental data from cored wells, such as porosity); logging curves of non-critical wells (SP, GR, AC, DEN, CNL, RT, RI, RXO); evaluation conclusion.]

In this framework, the training data comprise logging curves and experimental analysis data collected from cored wells. The biggest difference between this framework and the traditional method is that it is automatic and less dependent on expertise. It can be used as an expert recommendation system for oil exploration and development.

The procedures for porosity, permeability and saturability interpretation are as follows.

Figure 2. Porosity interpretation framework
[Diagram components: logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...); regression model (MRA, LSSVM); porosity interpretation model.]

Figure 3. Permeability interpretation framework
[Diagram components: logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...) and porosity; regression model (MRA, LSSVM); permeability interpretation model.]

Figure 4. Saturability interpretation framework
[Diagram components: logging curves (SP, GR, AC, DEN, CNL, RT, RI, RXO, ...), porosity and permeability; regression model (MRA, LSSVM); saturability interpretation model.]

There are some key points for this framework.


1) Feature Selection
Since the dimension of the input data is much higher than in the traditional method, overfitting may occur. It is therefore necessary to perform feature selection during modelling. Common feature selection methods for well logging interpretation are the crossplot technique, correlation analysis and expert experience.
In the data mining process, there are three kinds of working patterns between feature selection and classification: the embedded approach, the filter approach and the wrapper approach. In the embedded approach, feature selection is part of the data mining algorithm. In the wrapper approach, the data mining algorithm is treated as a black box. The filter approach is independent of the classifier; examples are Correlation-based Feature Selection (CFS) [5] and the Las Vegas Filter (LVF) [6]. The evaluation standard of CFS is that the correlation between features and class labels should be high while the redundancy between different features should be low. LVF evaluates features by computing the consistency of a feature subset.
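The CFS criterion can be made concrete with a short Python sketch of its subset merit, k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Using the absolute Pearson correlation as the correlation measure is a simplifying assumption made here, and the three toy features are hypothetical stand-ins for logging curves.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cfs_merit(features, target):
    """CFS merit of a feature subset: high relevance and low redundancy score best."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: the target depends on two uncorrelated features.
curve_a = [1.0, -1.0, 1.0, -1.0]
curve_b = [1.0, 1.0, -1.0, -1.0]
curve_a_copy = [2.0 * v for v in curve_a]  # fully redundant with curve_a
target = [a + b for a, b in zip(curve_a, curve_b)]

merit_complementary = cfs_merit([curve_a, curve_b], target)
merit_redundant = cfs_merit([curve_a, curve_a_copy], target)
print(merit_complementary, merit_redundant)
```

The subset of two complementary features receives a higher merit than the subset containing a redundant copy, even though every individual feature has the same correlation with the target.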

2) Model Optimization
The performance of different algorithms depends largely on the setting of model parameters, so model optimization is necessary. In most cases, parameters can only be set by experience or by comparative experiments, and there is little research on model parameter optimization. An effective method is the combination of k-fold cross-validation and grid search.

k-fold Cross-Validation
The training dataset is split into k separate subsets of equal size. k-1 subsets are used as the training set and the remaining one as the validation set. The modeling process is repeated k times, and the average MSE over the k iterations is used to estimate the expected generalization error.

Grid Search
Grid search is a practical search method. It is well suited to searching multidimensional data along different growth directions. For example, when we choose the RBF kernel function for SVM, two parameters must be determined: the penalty parameter C and the kernel function parameter G. Grid search chooses a change step $C_s$ for $C \in [C_1, C_2]$ and a change step $G_s$ for $G \in [G_1, G_2]$. We use each pair of parameters (C, G) for modeling and choose the pair that performs best as the final parameters.

3) Model Evaluation
Since coring by drilling is quite expensive, few data are available for training, so cross-validation should be used for model evaluation. In addition to the evaluation of the classifier, statistical tests, such as the χ² test and the F test, are necessary, because a statistical model is used for predicting reservoir parameters.

V. EXPERIMENT AND ANALYSIS

In this section, we compare the performance of the framework described above with traditional reservoir logging evaluation methods on two well logging datasets.

The training dataset is composed of logging curves and core-well data derived from tight sandstones of the Jianghan Basin in central China. Attributes include GR, KTH, RS, DEN, AC, CNL, porosity, etc. The class label is oil saturation. Since the calculation process of traditional reservoir logging evaluation methods is more complicated and dependent on expertise, their results are given in advance.

Experimental tools include Datahoop, Matlab, Weka and our own data mining platform.

TABLE I. EXPERIMENTAL DATA DEMONSTRATION

NO.  Well Section (m)   TH    KTH     GR      CAL    RS        RD        CNL    DEN    AC      Vsh    porosity  conclusion
1    3294.90-3448.72    -     25.385  27.007  9.449  1235.583  1492.195  0.028  2.516  52.364  0.107  0.015     Oil layer
2    3449.50-3456.18    5.5   9.558   12.264  6.521  132.043   527.616   0.039  2.652  53.739  0.011  0.032     Water layer
...
61   3499.36-3602.70    20    8.948   3.436   5.832  236.188   1120.818  0.009  2.694  49.408  0.004  0.008     Dry layer
62   3603.88-3632.80    10    4.51    4.51    6.438  173.395   369.639   0.011  2.682  50.562  0.000  0.015     Oil layer

A. Reservoir Parameters Prediction


For the first dataset, the regression task was performed with MRA since the problem is linear in general. LSSVM was used on the second dataset because of its nonlinearity.

We use the F-test to check the validity of the regression equation and the t-test to check the validity of the equation coefficients. For the final model after correction, the P-value of the F-test on the first dataset was 4.645e-08, and the P-value of the F-test on the second dataset was 6.996e-07. This indicates that the regression models are effective.

B. Well Log Interpretation

We use LIBSVM, Decision Tree, Naive Bayes and Artificial Neural Network (ANN) to train the prediction model and use 10-fold cross-validation to verify the model. The evaluation results are shown in Figure 6.

Figure 6. Evaluation results comparison
[Bar chart: results on data1 and data2 for expertise, LIBSVM, Decision Tree and Naive Bayes; vertical axis from 0 to 0.9.]

From the experimental results it can be seen that the well-logging interpretation conclusions produced by the proposed framework are consistent with the oil test results. Although it does not perform much better than the traditional expertise-based method, this framework provides an effective and automatic way to carry out reservoir evaluation based on well logging interpretation.

Reservoir evaluation of new blocks and new layers always faces new problems. The merits and demerits of different predictive modeling methods are difficult to analyze in theory; their applicability can only be determined through experiments.

VI. CONCLUSION

Data mining provides a variety of modeling methods that expand the range of reservoir evaluation methods, so that analysis can not only make predictions but also discover knowledge. Especially under the challenges of big data, giving a quick solution to new problems becomes a major difficulty.
In order to improve efficiency, it is recommended to develop data mining software that is integrated with existing logging interpretation and evaluation software, and to constantly enrich the algorithm library, so as to give the best solution to the problem. In this paper, we put forward a framework of data mining for reservoir evaluation based on well logging interpretation. This provides a reference for the construction of a big data platform.

REFERENCES

[1] C. Alimonti and G. Falcone, Knowledge discovery in databases and multiphase flow metering: the integration of statistics, data mining, neural networks, fuzzy logic, and Adhoc flow measurements towards well monitoring and diagnosis, SPE Annual Technical Conference and Exhibition, Texas, 2002.
[2] S. D. Mohaghegh, A new methodology for the identification of best practices in the oil and gas industry, using intelligent systems, Journal of Petroleum Science and Engineering, vol. 49, pp. 239-260, 2005.
[3] M. Nikravesh, Soft computing-based computational intelligent for reservoir characterization, Expert Systems With Applications, vol. 26, pp. 19-38, 2004.
[4] J. A. K. Suykens and J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters, vol. 9, no. 3, pp. 293-300, 1999.
[5] M. A. Hall, Correlation-based feature selection for machine learning, Hamilton, New Zealand: University of Waikato, Department of Computer Science, 1999.
[6] H. Liu and R. Setiono, A probabilistic approach to feature selection - a filter solution, Proceedings of the 13th International Conference on Machine Learning, San Francisco: Morgan Kaufmann, pp. 319-327, 1996.
[7] Xinghe Yu, Base of oil and gas reservoir geology, Petroleum Industry Press.
