
EXPLORATORY DATA ANALYSIS

Prof. Orestes Gómez González, MSc.

Geologist Senior Consultant
oregomez43@yahoo.ca
Telephone: +569 5626 0461

CONTENTS
EDA: Exploratory Data Analysis
- Geology model: not EDA, but a quick word
- Purpose of EDA
- EDA applications: understanding/checking data
  - Maps: location, trends, ...
  - Sample statistics: spacing, orientation, ...
  - Usual graphs: piechart; histogram, probability plot; boxplot; scattergram, regression; QQ plot; relative difference plot
  - EDA envelope: important concept
  - Checking pairs of values
  - Checking twin holes, grade profiles
- Important for estimation/simulation
  - Compositing
  - Declustering
  - Trimming, cutting outliers
  - Checking geological boundaries
- Reference to Geological Process
  - GP 4.0 "Create Geology Model"
  - GP 5.3 "Choose Estimation Method and Parameters"

Copyright 2013 - CAE Mining Chile S.A.

GEOLOGY MODEL
A good geology model is most important.
Only the geology that tells something about the
mineralization is useful.
A geology model consists of several geology domains. Each
domain has its own statistical characteristics that do not
depend on location (homogeneity, stationarity).
Boundaries between geological domains can be hard or
soft.
Statistics have to be computed per geology domain and
within the EDA envelope.

GEOLOGY MODEL
Useful statistics to check differences between domains:
- Multiple boxplots, histograms
- Multiple cumulative distributions
- Variograms (different directions)


GEOLOGY MODEL: GEOLOGICAL PROCESS


EDA: PURPOSE
Data familiarization
Detecting possible errors
Identifying/confirming different mineralizations
Answering questions such as:
- Ordinary or Indicator Kriging
- What trimming values
- Mean and variance
Providing information for:
- Model validation
- Reconciliation

EDA: Geological Process
Geological Process 5.3 "Choose Estimation Method and Parameters"
- 5.3.2 "Sample Compositing"
- 5.3.3 "Exploratory Data Analysis"
- 5.3.4 "Choice of Estimation Method"
- 5.3.5 "Cutting and/or Indicator Classes"
- 5.3.7 "Assess Boundary Conditions"

EDA: GEOLOGICAL PROCESS


EDA GRAPHS: PIECHARTS

EDA GRAPHS: HISTOGRAMS

EDA GRAPHS: MULTIPLE PROBABILITY PLOTS, DIFFERENT MINERALIZATIONS

EDA GRAPHS: BOXPLOT
A graph summarizing a distribution's essential statistics.
Multiple boxplots can be displayed on the same page.
Great display!

Statistic         (1)      (2)      (3)      (4)      (5)
Number of data    4859     28050    20902    13793    19117
Mean              4.766    1.224    0.750    6.119    10.678
Std. Dev.         21.957   6.313    3.723    36.353   45.286
Coef. of Var.     4.606    5.155    4.959    5.940    4.241
Maximum           680      370      398      972      1000
Upper quartile    2.9      0.9      0.5      4.5      6.9
Median            1        0.23     0.11     1.2      2.25
Lower quartile    0.3      0.1      0.1      0.2      0.5

EDA GRAPHS: SCATTERGRAM

EDA GRAPHS: QQ PLOT
Useful to compare 2 populations, say A and B.
The quantiles of A and B, $(a_1, b_1), (a_2, b_2), \dots, (a_{100}, b_{100})$, are plotted
on an X/Y graph.
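
As a minimal sketch of how the quantile pairs for such a plot could be computed, the Python below uses linear interpolation between sorted values. The function names `quantile` and `qq_pairs`, the interpolation rule and the choice of probabilities (k - 0.5)/100 are illustrative assumptions, not part of the original course material.

```python
def quantile(sorted_vals, p):
    """Quantile of pre-sorted values by linear interpolation, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def qq_pairs(a, b, n_quantiles=100):
    """Return (a_k, b_k) quantile pairs; A and B may have different sizes."""
    sa, sb = sorted(a), sorted(b)
    # Assumed plotting positions: the mid-point of each of the n classes.
    probs = [(k - 0.5) / n_quantiles for k in range(1, n_quantiles + 1)]
    return [(quantile(sa, p), quantile(sb, p)) for p in probs]


if __name__ == "__main__":
    pop_a = [0.2, 0.5, 0.9, 1.4, 2.0, 3.5, 7.0]
    pop_b = [0.1, 0.4, 0.6, 1.0, 1.2, 1.9, 2.8, 4.0, 9.0]
    for qa, qb in qq_pairs(pop_a, pop_b, n_quantiles=5):
        print(round(qa, 3), round(qb, 3))
```

If the two populations have the same distribution, the pairs fall along the 45 degree line; a systematic departure from that line points to a difference between A and B.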


EDA GRAPHS: RELATIVE DIFFERENCE PLOT
Useful to investigate conditional bias between two
populations, say A and B.
X axis: Mean of pair of values.
Y axis: Relative difference between values.
Notes on graph:
1) For low values, A < B.
2) For high values, A > B.
3) A few outliers (analytical errors?).
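
The plot coordinates can be sketched as follows. The slide does not state the exact formula for the relative difference, so this example assumes (A - B) divided by the pair mean; the function name `relative_difference_points` is hypothetical.

```python
def relative_difference_points(a, b):
    """Return (pair mean, relative difference) for paired values A and B.

    Assumption: relative difference = (A - B) / mean(A, B).
    Pairs with a zero mean are skipped because the ratio is undefined.
    """
    points = []
    for va, vb in zip(a, b):
        mean = (va + vb) / 2.0
        if mean == 0:
            continue
        points.append((mean, (va - vb) / mean))
    return points


if __name__ == "__main__":
    original = [0.4, 1.1, 2.5, 8.0]
    duplicate = [0.5, 1.0, 2.4, 6.0]
    for x, y in relative_difference_points(original, duplicate):
        print(round(x, 3), round(y, 3))
```

Points that drift from negative to positive Y as X increases would correspond to the conditional bias described in notes 1 and 2.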


EDA ENVELOPE (1/4)
An EDA envelope is a 3D envelope within which statistics are
computed.
Which statistics?
- Declustered mean, variance.
- Choice of trimming values.
- Choice of indicator cut-off grades.
- Variogram.
- Resource block model validation.

EDA ENVELOPE (1/4)
Why an EDA envelope?
To restrict statistics to where "it matters":
- Fairly tight around sample locations.
- No extensive "waste" areas.
- Well covered with samples.
To reduce the impact of "edges" when declustering and computing statistics.
To ensure that comparisons made during validation (e.g. samples versus
average kriged grades) correspond to the same material (i.e. material
within the EDA envelope for both samples and kriged estimates).

EDA ENVELOPE (2/4)
How to define the EDA envelope?
- Do not be too precise.
- Fairly tight around the reasonably well sampled area.
- Around material that matters. Waste zones significantly below the
  cut-off grade can be ignored.
- Digitize on a series of benches, then create a wireframe and a 0/1
  indicator grid.
- In general, geology can be ignored when defining the EDA envelope.

EDA ENVELOPE (3/4)


EDA ENVELOPE (4/4)
Some remarks:
- The EDA envelope is used to compute statistics.
- All data inside and outside the envelope can be used at the
  estimation stage.
- There can be no estimates outside the EDA envelope.
- In addition to the envelope, a geology model has to be considered.

EDA MAPS: LOCATION MAPS


EDA MAPS: LOCATING ANOMALIES


EDA MAPS: CHECKING TRENDS


EDA MAPS: PROPORTIONAL EFFECT


EDA: SAMPLE SPACINGS, GRADE STATS
- Spacing between closest samples from 2 different holes

Domain   Ave    Min    Max    Med    % Data
RT2      15.7   0.36   85.9   11.3   100
RT3      11.3   0.07   64.6   9.1    100
RT5      10.3   0.01   73.9   8.6    100
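
A brute-force sketch of how such spacing statistics might be derived is given below. It assumes each sample is a tuple `(hole_id, x, y, z)`, a hypothetical layout chosen for illustration, and it compares every sample with every sample of the other holes, which is only practical for small data sets.

```python
import math
import statistics


def closest_other_hole_spacings(samples):
    """For each sample, distance to the closest sample from a different hole.

    samples: list of (hole_id, x, y, z) tuples (assumed layout).
    """
    spacings = []
    for hole, x, y, z in samples:
        dists = [
            math.dist((x, y, z), (x2, y2, z2))
            for hole2, x2, y2, z2 in samples
            if hole2 != hole
        ]
        if dists:  # skipped when only one hole is present
            spacings.append(min(dists))
    return spacings


def spacing_summary(spacings):
    """Summary in the same spirit as the table: Ave, Min, Max, Med."""
    return {
        "ave": statistics.mean(spacings),
        "min": min(spacings),
        "max": max(spacings),
        "med": statistics.median(spacings),
    }


if __name__ == "__main__":
    data = [("A", 0, 0, 0), ("A", 0, 0, 1), ("B", 3, 4, 0), ("C", 20, 0, 0)]
    print(spacing_summary(closest_other_hole_spacings(data)))
```

In practice the calculation would be run per domain, as in the table, and a spatial index would replace the double loop for large drill hole databases.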

EDA: SAMPLE SPACINGS, GRADE STATS
- Number of assays above cut-off grades
[Figure: 1.5 m Au Composites - In EDA - Domain 02 - Number of Assay Intervals Above Grade Cut-Off]

EDA: CHECKING PAIRS OF VALUES

Complete QA/QC on
Analysis"


EDA: CHECKING TWIN HOLES


EDA: GRADE PROFILES

Comparison of Au, Ag, Cu, Zn, C and S for DDH161


COMPOSITING 1/3
Support size (point, 2 m sample, block, etc.) is important.
Different support sizes result in different variabilities.
Blocks are less variable than samples.
In theory, samples must be representative of the population. 5 m
samples are not representative of a 1 m sample population.
(Most) estimation algorithms do not account for sample size, e.g. they do
not distinguish between a 10 m and a 1 m sample.
Solution: Composite samples so that resulting "composite lengths" are
identical.


COMPOSITING 1/3


COMPOSITING 2/3
Compositing may be required if:
Sample lengths differ widely, e.g. an average length of 1.5 m with many
50 cm long samples centered on high grade veins.
Before compositing:
Histogram of sample lengths.
Histograms of sample grades per interval of lengths.
Trim or cut very high grade (outliers) to avoid smearing them over
much longer lengths (More about outliers in Section "Bivariate
Statistics").


COMPOSITING 2/3
Composite length should be such that:
Enough variability is retained when estimating.
No geological boundary crossing.
Do not exceed block size:
5 m benches: 2/3 m composites OK; 5 m is maximum.
If possible, composite only what is needed, i.e. leave samples untouched
if their lengths are within the specified Min/Max limits.
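
A minimal sketch of length-weighted down-hole compositing is shown below. It assumes a single hole within a single geology domain, intervals given as `(from_m, to_m, grade)` tuples, and a short last composite that is kept as is; the function name `composite` and these conventions are illustrative assumptions, not the procedure of any particular software.

```python
def composite(intervals, length):
    """Length-weighted regular composites for one hole in one domain.

    intervals: sorted, non-overlapping (from_m, to_m, grade) tuples.
    Returns a list of (from_m, to_m, grade) composites.
    """
    if not intervals:
        return []
    end = intervals[-1][1]
    top = intervals[0][0]
    composites = []
    while top < end:
        bottom = min(top + length, end)
        metal = 0.0    # sum of grade x overlapping length
        covered = 0.0  # sampled length inside this composite
        for f, t, g in intervals:
            overlap = min(t, bottom) - max(f, top)
            if overlap > 0:
                metal += overlap * g
                covered += overlap
        if covered > 0:  # unsampled gaps produce no composite
            composites.append((top, bottom, metal / covered))
        top = bottom
    return composites


if __name__ == "__main__":
    assays = [(0.0, 1.0, 2.0), (1.0, 1.5, 10.0), (1.5, 4.0, 1.0)]
    for row in composite(assays, 2.0):
        print(row)
```

In this example the 0.5 m sample at 10 g/t is diluted into a 2 m composite at 3.75 g/t, which illustrates the smearing of short high grade samples over longer lengths.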


COMPOSITING 3/3
Impact of compositing:
Lose original samples;
Grade variability reduced;
Number of samples reduced;
Geological contacts can be smeared out.
If an original sample length is very long, compositing will split it into
many regular smaller lengths.
OK if the original grade is very low.
Problem if original grade is very high, because the location of the
high grade is unknown.


COMPOSITING 3/3
Useful check: display drill holes with the composited grade histogram
on one side and the original grade histogram on the other side.
Geological Process 5.3.2 "Sample Compositing"


DECLUSTERING: INTRODUCTION
Sample clustering is common in the mining industry.
Potential problem:
Clusters are often located in high grade zones. If not accounted for,
their impact can be a serious overestimation of the average grade and
of the variability.
Solution:
Decluster.
The aim is to reduce the "weight" of each clustered datum, more or less
in proportion to the density of ...

DECLUSTERING: METHODS
Cell declustering
Overlay a grid of cells on the data.
The cell size can be roughly the average spacing of the sampling grid.
The idea is to achieve an average of one sample per cell where there is
clustering. The cells can be rectangular.
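
The cell declustering idea can be sketched in 2D as follows: each sample receives a weight inversely proportional to the number of samples sharing its cell, and the weights are then rescaled to sum to 1. Square cells, the grid origin and the function name `cell_declustering_weights` are assumptions made for this illustration.

```python
from collections import Counter


def cell_declustering_weights(points, cell_size, origin=(0.0, 0.0)):
    """2D cell declustering weights, rescaled so that they sum to 1.

    points: list of (x, y) sample locations.
    """
    cells = [
        (int((x - origin[0]) // cell_size), int((y - origin[1]) // cell_size))
        for x, y in points
    ]
    counts = Counter(cells)
    raw = [1.0 / counts[c] for c in cells]  # 1 / number of samples in the cell
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    # Three clustered high grade samples and one isolated low grade sample.
    xy = [(0.5, 0.5), (0.6, 0.6), (0.7, 0.7), (5.5, 5.5)]
    au = [10.0, 12.0, 11.0, 1.0]
    w = cell_declustering_weights(xy, cell_size=1.0)
    naive = sum(au) / len(au)
    declustered = sum(wi * zi for wi, zi in zip(w, au))
    print(naive, declustered)
```

Here the three clustered samples share one cell, so the declustered mean (about 6.0) is well below the naive mean (8.5). Results depend on the cell size and grid origin, so several settings would normally be compared.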

DECLUSTERING: METHODS
Polygonal declustering
In 2D, the declustered weights are proportional to the polygons
of influence of the corresponding data.
In 3D, the same principle is applied bench by bench.

DECLUSTERING: METHODS
Kriging
Kriging is an excellent declustering tool (see below).
A regular grid of cells is superimposed on the data. The cell size does
not matter much.
The cells are kriged using the samples. The weights of the samples
used in kriging can be stored in memory.
The declustered weight of a given sample is the sum of its corresponding
stored kriging weights.
Nearest Neighbour Model
The data are used to estimate a regular cell/block model. The
closest datum is used to estimate each block.
The resulting distribution is the distribution of the estimated blocks.
The shape of the resulting distribution is very similar to that of the
polygonal declustered distribution.
Advantage: it can be used with different software, e.g. Studio 3 and/or
Vulcan, etc.
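
The nearest neighbour approach can be sketched as below: each block takes the closest sample, and the share of blocks assigned to a sample acts as its declustered weight. The brute-force search, the 2D coordinates and the function name `nearest_neighbour_weights` are assumptions for illustration only.

```python
import math
from collections import Counter


def nearest_neighbour_weights(samples, block_centres):
    """Fraction of blocks whose closest sample is each sample.

    samples, block_centres: lists of coordinate tuples of equal dimension.
    """
    counts = Counter()
    for centre in block_centres:
        nearest = min(
            range(len(samples)), key=lambda i: math.dist(samples[i], centre)
        )
        counts[nearest] += 1
    n_blocks = len(block_centres)
    return [counts[i] / n_blocks for i in range(len(samples))]


if __name__ == "__main__":
    # Two clustered samples near x = 1 and one isolated sample at x = 9.
    pts = [(1.0, 0.0), (1.2, 0.0), (9.0, 0.0)]
    centres = [(k + 0.5, 0.0) for k in range(10)]
    print(nearest_neighbour_weights(pts, centres))
```

The two clustered samples share half of the blocks between them while the isolated sample receives the other half, which is the same behaviour expected from polygons of influence.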

DECLUSTERING: METHODS
Automatic cell declustering
Based on the assumption that clusters are always located in high grade
zones, so that the mean is unintentionally overestimated.
Different cell sizes can be tried automatically for declustering.
The selected cell size is the one that gives the lowest mean.
Very good in theory, but often inconclusive in practice. Not
recommended.

DECLUSTERING: METHODS
Note 1:
Some declustering weights can be very high because of:
- A very large polygon area (at the fringe).
- Special sample locations (start and end of a drill hole).
The solution consists of:
- Introducing an upper limit on the weights when declustering.
- Declustering within an EDA envelope.

DECLUSTERING: METHODS
Note 2:
Polygonal declustering is incomplete when the declustering radius is
small (small polygons).
The solution consists of:
- Checking maps of the areas to be declustered.
- Increasing the declustering radii and using EDA envelopes to
  control the fringes.
- Checking the declustering weights and, if necessary, reducing them.

DECLUSTERING: METHODS
Useful displays
- Histograms of the weights
- Maps of the weight values
- Maps of the declustered areas
Geological Process 5.3.3 "Exploratory Data Analysis"

DECLUSTERING: STATISTICS
Let $z(x_i),\ i = 1, \dots, N$ be N Au values and $w_i,\ i = 1, \dots, N$ the rescaled declustered
weights such that $\sum_{i=1}^{N} w_i = 1$.
Mean: $m = \sum_{i=1}^{N} w_i \, z(x_i)$
Variance: $\sigma^2 = \sum_{i=1}^{N} w_i \, [z(x_i) - m]^2$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
Median: the value $z_{50}$ such that the sum of the
declustered weights of the values less than $z_{50}$ is 0.5.
Note: if pairs of values are available, the covariance (see bivariate
statistics) can also be declustered.
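
The declustered mean, variance, standard deviation and median can be sketched in a few lines. The example assumes the standard weighted forms with weights rescaled to sum to 1, and it approximates the median by the smallest value at which the cumulative weight reaches 0.5; the function name `declustered_stats` is hypothetical.

```python
import math


def declustered_stats(values, weights):
    """Weighted mean, variance, standard deviation and median.

    Weights are rescaled internally so that they sum to 1.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * zi for wi, zi in zip(w, values))
    var = sum(wi * (zi - mean) ** 2 for wi, zi in zip(w, values))
    # Median: walk through the sorted values until half the weight is reached.
    cumulative = 0.0
    median = None
    for zi, wi in sorted(zip(values, w)):
        cumulative += wi
        if cumulative >= 0.5:
            median = zi
            break
    return mean, var, math.sqrt(var), median


if __name__ == "__main__":
    au = [10.0, 12.0, 11.0, 1.0]
    wts = [1.0, 1.0, 1.0, 3.0]  # the isolated low grade sample weighs more
    print(declustered_stats(au, wts))
```

With equal weights the function reduces to the ordinary (population) statistics, which gives a quick check of the effect of declustering.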

DECLUSTERING: EXAMPLE (1/3)

Very few excessive weights. Keep as is, or trim to 40.

DECLUSTERING: EXAMPLE (2/3)

-14 % change in grade


DECLUSTERING: EXAMPLE (3/3)

Incomplete declustering in the northern portion of the map
and in the elongated domain in the SW.

DECLUSTERING: EXERCISE 8
Consider the following sampling situation:

What would be a reasonable cell declustering size?

TRIMMING / CUTTING OUTLIERS (1 AND 2)

When computing statistics within a geological domain, we make the
assumption that there is only one population and that all samples
belong to that population.
Outliers or extreme values are often observed. Their impact can be a
serious overestimation of:
- the average grade and its variability;
- the mean, variance, variogram, block model estimates, etc.
Note that the outliers impact all estimates, even the traditional ones
such as polygonal and $1/d^2$.
Various solutions:
- Outliers are erroneous: delete or correct them;
- Outliers are from a different "population":
  - define a new geology domain;
  - trim them down prior to computing statistics;
  - restrict their influence during estimation ($1/d^2$ or kriging);
  - use indicator kriging.
Geological Process 5.3.5 "Cutting and/or Indicator Classes".

The main questions are:
- Is trimming/cutting warranted?
- If yes, which value(s) to choose?
The answers are subjective.
The following graphs might be useful.
- Decile analysis.
- Actual versus smoothed grade profile along holes.
- Histogram and cumulative probability plots.
- Indicator correlation plot.
- Coefficient of variation plot.
- Quantity of metal plot.
Also useful:
- Number of trimmed/cut data

TRIMMING / CUTTING OUTLIERS (3)
Grade profiles along holes
Outliers stand out with respect to the smoothed grade profile.
Similar techniques can be applied in 2D and 3D:
- 2D: sample values & contours; map of residuals.
- 3D: sample values & 3D estimates; list of residuals.
Advantage: detect local outliers.

TRIMMING / CUTTING OUTLIERS (4)
A possible trimming/cutting value is where the histogram classes start to
be isolated on the horizontal axis.
Possible trimming value from graph: 80 g/t.

TRIMMING / CUTTING OUTLIERS (5)
Cumulative (log)probability plots.
A single population would be represented on the plot by a gradually
increasing line.
If the population is (log)normal, then the curve is a straight line on
(log)probability paper. This, however, is of secondary importance.
A "kink" or a "break" in the curve might indicate two populations or the
presence of outliers.

TRIMMING / CUTTING OUTLIERS (6)
A possible trimming/cutting value is around the kink/break where the
second population (outliers) becomes predominant.
Possible trimming value from graph: 70 g/t.
Decile Analysis
Introduced by I.S. Parrish, Min. Eng., Apr. '97.
See next page for the 40/10 rule of thumb.
The 40/10 rule is to be reduced if the last decile / percentile does not
contain a full "complement" of samples.

TRIMMING / CUTTING OUTLIERS (7)
Decile Analysis (Contd.)
If the top decile contains:
- More than 40% of metal, or
- More than twice the metal of the previous decile
Split it into 10 percentiles.
If the top percentile contains:
- More than 10% of metal
Trimming is warranted.
Suggested trimming value is then:
- Highest value of previous percentile
- Generally conservative?
Possible trimming value from graph
Note that the last decile / percentile is not "full".
Trimming may be warranted.
Previous percentile's 3 values: 61, 109, 110 g/t
Possible trimming value: 100 g/t
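
The 40/10 rule of thumb can be sketched as below. The example assumes equal-length samples, so that metal is proportional to grade, and equal-count bins obtained by integer slicing of the sorted grades; the function names `metal_share_by_bins` and `decile_check` are hypothetical and the sketch does not handle the "not full" last percentile case noted above.

```python
def metal_share_by_bins(grades, n_bins):
    """Share of total metal in each of n_bins equal-count bins of sorted grades."""
    s = sorted(grades)
    n = len(s)
    total = sum(s)
    return [
        sum(s[k * n // n_bins:(k + 1) * n // n_bins]) / total
        for k in range(n_bins)
    ]


def decile_check(grades):
    """Apply the 40/10 rule of thumb and suggest a trimming value if warranted."""
    s = sorted(grades)
    total = sum(s)
    deciles = metal_share_by_bins(s, 10)
    split = deciles[-1] > 0.40 or deciles[-1] > 2 * deciles[-2]
    result = {
        "top_decile_share": deciles[-1],
        "split_top_decile": split,
        "trim_warranted": False,
        "suggested_trim": None,
    }
    top = s[9 * len(s) // 10:]
    if split and len(top) >= 10:
        n = len(top)
        last = top[9 * n // 10:]               # top percentile
        previous = top[8 * n // 10:9 * n // 10]  # percentile just below it
        if sum(last) / total > 0.10:
            result["trim_warranted"] = True
            result["suggested_trim"] = max(previous)
    return result


if __name__ == "__main__":
    au = list(range(1, 100)) + [1000]
    print(decile_check(au))
```

For this synthetic data set the top decile holds more than twice the metal of the previous one and the top percentile holds about 17 % of the metal, so the sketch suggests trimming at 99, the highest value of the previous percentile.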

TRIMMING / CUTTING OUTLIERS (8)
Indicator correlation plot.
This plot shows the correlation coefficient of two adjacent down-hole
sample indicators for increasing cut-offs.
Indicator: $i_c(x)$, where $z_c$ is the cut-off (indicator threshold).
As the cut-off $z_c$ increases, the correlation decreases.
A possible trimming/cutting value is where the correlation is, or is
getting close to, 0.
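
One point of the indicator correlation curve can be sketched as follows. The indicator convention used here (1 when the grade is at or above the cut-off, 0 otherwise) is an assumption, since the defining formula is not given above; the correlation itself is unchanged if the complementary convention is used. The function names are hypothetical.

```python
import math


def adjacent_pairs(hole_grades):
    """Pairs of consecutive down-hole grades for one hole."""
    return list(zip(hole_grades[:-1], hole_grades[1:]))


def indicator_correlation(pairs, cutoff):
    """Correlation of the indicators of adjacent samples at one cut-off.

    Returns None when an indicator is constant (correlation undefined).
    """
    i1 = [1.0 if a >= cutoff else 0.0 for a, _ in pairs]
    i2 = [1.0 if b >= cutoff else 0.0 for _, b in pairs]
    n = len(pairs)
    m1, m2 = sum(i1) / n, sum(i2) / n
    cov = sum((u - m1) * (v - m2) for u, v in zip(i1, i2)) / n
    v1 = sum((u - m1) ** 2 for u in i1) / n
    v2 = sum((v - m2) ** 2 for v in i2) / n
    if v1 == 0 or v2 == 0:
        return None
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    grades = [0.5, 0.8, 3.0, 4.0, 0.6, 0.4, 25.0, 0.7, 0.9, 2.5]
    pairs = adjacent_pairs(grades)
    for zc in (1.0, 2.0, 5.0, 20.0):
        print(zc, indicator_correlation(pairs, zc))
```

Pairs from several holes can simply be concatenated before calling `indicator_correlation`, and the results plotted against the cut-off to look for the value where the correlation approaches 0.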

TRIMMING / CUTTING OUTLIERS (9)
Coefficient of variation plot.
This plot shows the coefficient of variation of
the grades below cut-off (cutting limit).
As the cut-off increases, the coefficient of variation
increases.
A possible trimming/cutting value is where the
coefficient of variation curve gets out of
control.
Possible trimming value from graph: 100 g/t.
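
The curve behind this plot can be sketched as below, taking "grades below cut-off" to mean that grades above the cut-off are excluded rather than capped, and using the population standard deviation. Both choices, and the function name `cv_below_cutoff`, are assumptions for illustration.

```python
import statistics


def cv_below_cutoff(grades, cutoffs):
    """Coefficient of variation of the grades at or below each cut-off.

    Returns a list of (cutoff, cv); cut-offs leaving fewer than two
    grades, or a zero mean, are skipped.
    """
    curve = []
    for c in cutoffs:
        kept = [g for g in grades if g <= c]
        if len(kept) < 2:
            continue
        mean = statistics.mean(kept)
        if mean == 0:
            continue
        curve.append((c, statistics.pstdev(kept) / mean))
    return curve


if __name__ == "__main__":
    au = [0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 5.0, 8.0, 60.0, 150.0]
    for c, cv in cv_below_cutoff(au, [5, 10, 50, 100, 200]):
        print(c, round(cv, 3))
```

A sharp rise in the printed values between two consecutive cut-offs marks the region where a few extreme grades start to dominate the variability.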

TRIMMING / CUTTING OUTLIERS (10)
Quantity of metal plot.
This plot shows the relative quantity of metal contained within the
trimmed-down samples for various trimming values.
Useful to know the quantity of metal "discarded" by trimming.
93 % of metal corresponding to 80 g/t trimming value.
=> 7 % of metal "loss" if Au samples larger than 80 g/t are trimmed
down to 80 g/t.
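
The quantity of metal retained for a given trimming value can be sketched as follows, assuming equal-length samples so that metal is proportional to grade. The function names `metal_retained` and `count_trimmed` are hypothetical.

```python
def metal_retained(grades, trim_value):
    """Fraction of total metal kept when grades are trimmed down to trim_value."""
    total = sum(grades)
    trimmed = sum(min(g, trim_value) for g in grades)
    return trimmed / total


def count_trimmed(grades, trim_value):
    """Number of samples whose grade is reduced by the trimming."""
    return sum(1 for g in grades if g > trim_value)


if __name__ == "__main__":
    au = [10.0, 20.0, 70.0, 100.0]
    for t in (60.0, 80.0, 100.0):
        kept = metal_retained(au, t)
        print(t, round(kept, 3), round(1 - kept, 3), count_trimmed(au, t))
```

Evaluating `metal_retained` over a range of trimming values gives the curve of the plot, and `1 - metal_retained(...)` is the metal "loss" reported in the summary table.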

TRIMMING / CUTTING OUTLIERS (11)


TRIMMING / CUTTING OUTLIERS (12)
Trimming/Cutting Summary Table

Method                      Value
Histogram                   80 g/t
Probability Plot            70 g/t
Decile Analysis             100 g/t
Indicator Correlation       60 g/t
Coefficient of Variation    100 g/t
Final Choice                80 g/t
Number of Data Trimmed      4 of 455
Metal "Loss"                7%

HARD / SOFT GEOLOGICAL BOUNDARIES


MUSSELWHITE - Comparison of Consecutive Down Hole Assays

HARD / SOFT GEOLOGICAL BOUNDARIES

Note: sometimes, mineralization occurs at the contact.

Geological Process 5.3.7 "Assess Boundary Conditions"

HARD / SOFT GEOLOGICAL BOUNDARIES
