Extended Statistics
Chinar Aliyev
As you know, Oracle 11g introduced extended statistics to improve selectivity estimation for correlated columns.
But when and how does the query optimizer (QO) use these statistics? What are its restrictions? Let's see, step by step.
Correlated Columns
We will use the CUSTOMERS table in the SH schema. Let's run the following query:
SQL> SELECT *
  2  FROM customers a
  3  where CUST_STATE_PROVINCE='CA' and COUNTRY_ID=52790;
Execution Plan
----------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  1115 |   197K|   406   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |  1115 |   197K|   406   (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)

SQL>
Even when we use histograms, the QO cannot estimate the correct cardinality here, because of column correlation. To
solve this problem, the RDBMS should gather statistics for the combination (concatenation) of these columns.
For example, one of the most important statistics is NDV (number of distinct values); to find and store this statistic in the data
dictionary, the RDBMS should analyze both columns together (the number of distinct row groups).
But NDV alone is not enough to estimate the correct cardinality, because the data in the
concatenated correlated columns can be skewed. Even for a simple predicate like where col1=a1 and col2=a2 and ...
coln=an (call it p1), the QO should be able to estimate the selectivity of a column group that contains skewed
data. Therefore, in Oracle these column groups (of correlated columns) are mapped to an equivalent
virtual column, and the QO uses the statistics of this virtual column to estimate the selectivity of predicate p1.
So what happens when extended statistics are created? To create them you can use CREATE_EXTENDED_STATS
or GATHER_TABLE_STATS with the METHOD_OPT option of the DBMS_STATS package. In our example
cust_state_province and country_id are correlated, so we can create a column group for
these columns as:
SQL> begin
  2    DBMS_STATS.GATHER_TABLE_STATS (
  3      'SH',
  4      'CUSTOMERS',
  5      estimate_percent => null,
  6      METHOD_OPT => 'FOR COLUMNS (CUST_STATE_PROVINCE, COUNTRY_ID) size 1');
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL>
If we enable SQL trace while creating the column group, we can find the statement below in the trace file.
Alter table "SH"."CUSTOMERS" add (SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ as
(sys_op_combined_hash (CUST_STATE_PROVINCE, COUNTRY_ID)) virtual BY USER for
statistics)
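The idea of mapping a column-group value to a single virtual-column value can be sketched in Python. This is only an illustration of the concept: sys_op_combined_hash is an internal Oracle function, and the SHA-256-based stand-in below is an assumption, not Oracle's actual algorithm.

```python
import hashlib

def combined_hash(*values):
    # Fold all column values into one deterministic hash, so each distinct
    # (col1, ..., coln) combination maps to exactly one virtual-column value.
    h = hashlib.sha256()
    for v in values:
        h.update(repr(v).encode())
        h.update(b"\x00")  # separator: ("ab", "c") must differ from ("a", "bc")
    return int.from_bytes(h.digest()[:8], "big")

# Equal inputs always yield the same virtual-column value;
# different inputs yield (with overwhelming probability) different values.
same = combined_hash("CA", 52790) == combined_hash("CA", 52790)
diff = combined_hash("CA", 52790) != combined_hash("CA", 52791)
```

This is exactly the property the optimizer needs: statistics gathered on the hashed virtual column describe the combinations of the underlying columns.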
It means that when creating extended statistics, Oracle first adds a virtual column and then gathers statistics for
this column. Why use a hash function? So far we have only seen p1 (known as point correlation). Every combination of col1,
col2, ..., coln values must correspond to exactly one unique value when the virtual column is created. So
there must be a relationship Y = F(col1, col2, ..., coln) between the correlated columns and the virtual column;
if this holds, the QO can estimate the selectivity of the column group. For every input combination, the
function F must generate a unique value. Simple concatenation of the columns would be enough in most
cases, but a hash function guarantees unique values, and this is the best option. Now we have
one column group without a histogram, while each individual column has a histogram. Let's see what happens in
this case for query (Q1).
As you can see, the QO detected the column group, but it does not use the virtual column statistics, because in this case the QO
does not estimate selectivity from them (in the trace file, the "Partial" entry for this column group (CG) is NULL). It uses the traditional
method to estimate the selectivity.
Both individual columns have FREQUENCY histograms:

CUST_STATE_PROVINCE endpoint actual values (excerpt): Brittany, Buenos Aires, CA, CO, CT
COUNTRY_ID endpoint values (excerpt): 52786, 52787, 52788, 52789, 52790, 52791
Sel(cust_state_province) = (11321-7980)/55500 = 3341/55500 = 0.06019
Sel(country_id) = (55412-36892)/55500 = 18520/55500 = 0.33369
Sel(cust_state_province and country_id) = 0.06019*0.33369 = 0.02008
Card = num_rows*sel = 55500*0.02008 = 1114.706 ~ 1115
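This arithmetic can be checked with a few lines of Python (a sketch; 55500 is num_rows of CUSTOMERS and the endpoint numbers are the ones from the frequency histograms quoted above):

```python
num_rows = 55500

# Frequency-histogram selectivity: (cumulative frequency of the value minus
# cumulative frequency of the previous value) / num_rows.
sel_state   = (11321 - 7980) / num_rows    # CUST_STATE_PROVINCE = 'CA'
sel_country = (55412 - 36892) / num_rows   # COUNTRY_ID = 52790

# Without column-group statistics the optimizer assumes independence:
sel_and = sel_state * sel_country
card = num_rows * sel_and
print(round(card))  # -> 1115, the estimate in the plan above
```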
Now let's gather histogram statistics for this column group and see what happens.
SQL> BEGIN
  2    DBMS_STATS.gather_table_stats
  3      ('SH',
  4       'CUSTOMERS',
  5       estimate_percent => NULL,
  6       method_opt => 'FOR COLUMNS (CUST_STATE_PROVINCE,COUNTRY_ID) size skewonly'
  7      );
  8  END;
  9  /
SQL> select column_name,num_distinct,histogram from user_tab_col_statistics
  2  where table_name='CUSTOMERS'
  3  and column_name='SYS_STU#S#WF25Z#QAHIHE#MOFFMM_';

COLUMN_NAME                    NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_          145 FREQUENCY

SQL>
Execution Plan
----------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  3341 |   629K|   406   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |  3341 |   629K|   406   (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)

SQL>
As you can see, in this case the QO estimates the cardinality correctly, which the trace file also confirms.
How does the QO estimate this selectivity? It is a frequency histogram; if we enable SQL trace while the
histogram is being created, we can see the statement Oracle uses to build it.
Now consider the case where statistics are gathered for column group CG1 but not for CG2, so the selectivity
of CG2 has to be estimated. In this case the execution plan for the following query (Q2) was:
SQL> SELECT *
  2  FROM customers a
  3  where CUST_STATE_PROVINCE='CA' and COUNTRY_ID=52790 and cust_city_id=51919;   (Q2)
CUST_CITY_ID frequency histogram endpoint values (excerpt): 51916, 51917, 51919, 51924, 51930, 51934, 51971
sel(CUST_CITY_ID) = (161-158)/num_buckets = 3/254 = 0.0118110236
sel(cg2) = sel(cg1)*sel(CUST_CITY_ID) = 7.11023622e-4
card = sel(cg2)*num_rows = 39.46
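The same arithmetic in Python (a sketch; 3341/55500 is the column-group selectivity from the frequency histogram above, and 254 is the implied num_buckets, since 3/254 = 0.0118110236):

```python
num_rows = 55500
sel_cg   = 3341 / num_rows    # sel of the (CUST_STATE_PROVINCE, COUNTRY_ID) group
sel_city = (161 - 158) / 254  # sel(CUST_CITY_ID) from its frequency histogram

# The three-column selectivity is the group selectivity times the
# selectivity of the remaining column:
card = num_rows * sel_cg * sel_city
print(round(card, 2))  # -> 39.46, i.e. ~39 rows in the plan
```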
Actually this method is clear: even without CG2, the QO would estimate the cardinality as 39, because our predicate is
CUST_STATE_PROVINCE='CA' and COUNTRY_ID=52790 and cust_city_id=51919, and the optimizer detects that there is a
column group with sufficient statistics. Therefore the predicate can be rewritten as SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ =
MOD (sys_op_combined_hash ('CA', 52790), 9999999999) and cust_city_id=51919, so the selectivity will be
sel(SYS_STU#S#WF25Z#QAHIHE#MOFFMM_)*sel(cust_city_id).
CG1 = ("CUST_STATE_PROVINCE","COUNTRY_ID","CUST_CITY_ID")
CG2 = ("CUST_STATE_PROVINCE","COUNTRY_ID")
CG3 = ("CUST_STATE_PROVINCE","CUST_CITY_ID")

Sel(CG1) = Sel(CG2)*Sel(CUST_CITY_ID)   (F1)
or
Sel(CG1) = Sel(CG3)*Sel(COUNTRY_ID)     (F2)
So which formula will the QO choose, and what is the choice based on? Let's look at the execution plan and the trace file.
EXTENSION_NAME                 EXTENSION                              HISTOGRAM
------------------------------ -------------------------------------- ---------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE","COUNTRY_ID")   FREQUENCY
SYS_STULHUROKG217F9$OWA1IEIZLA ("CUST_STATE_PROVINCE","CUST_CITY_ID") HEIGHT BALANCED
CorStrength(col1,col2,...,coln) = NDV(col1)*NDV(col2)*...*NDV(coln) / NDV(col1,col2,...,coln)

CorStrength(cust_state_province, cust_city_id) = 145*620/620 = 145
CorStrength(cust_state_province, country_id)   = 145*19/145  = 19
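CorStrength can be computed directly from the NDVs; a small Python sketch using the values above:

```python
from math import prod

def cor_strength(individual_ndvs, group_ndv):
    # CorStrength = product of the individual columns' NDVs
    # divided by the NDV of the column group.
    return prod(individual_ndvs) / group_ndv

cs_state_city    = cor_strength([145, 620], 620)  # (cust_state_province, cust_city_id)
cs_state_country = cor_strength([145, 19], 145)   # (cust_state_province, country_id)
print(cs_state_city, cs_state_country)  # -> 145.0 19.0
```

The larger CorStrength indicates the stronger correlation, so the optimizer prefers the decomposition that goes through the (cust_state_province, cust_city_id) group, i.e. formula F2.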
COUNTRY_ID frequency histogram endpoint values (excerpt): 52789, 52790, 52791

Therefore:
sel(country_id) = (55412-36892)/55500 = 0.33369
sel(p2) = sel(CG3)*sel(country_id) = 0.00393758
card = sel(p2)*num_rows = 0.00393758*55500 = 218.536 ~ 219
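Putting the pieces together in Python (a sketch; sel(p2) = 0.00393758 is the value reported above, and the choice of formula F2 follows from CorStrength(cust_state_province, cust_city_id) = 145 being larger than CorStrength(cust_state_province, country_id) = 19):

```python
num_rows = 55500
# F2 was chosen: sel(CG1) = sel(CG3) * sel(COUNTRY_ID)
sel_p2 = 0.00393758        # sel(CG3) * sel(country_id), as reported above
card = sel_p2 * num_rows
print(round(card))  # -> 219, the estimate in the plan
```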
create table t
as
select
trunc(dbms_random.value(0,25)) n1,
trunc(dbms_random.value(0,20)) n2,
lpad(rownum,10,'0') small_vc
from
all_objects
where
rownum <= 10000
;
Table created.
SQL> update t set n2=n1 where rownum<=9955;
9955 rows updated.
SQL> commit;
Commit complete.
SQL> begin
  2    dbms_stats.gather_table_stats(
  3      user,
  4      't',
  5      cascade => true,
  6      estimate_percent => null,
  7      method_opt => 'for all columns size 1 FOR COLUMNS (n1,n2) size 1');
  8  end;
  9  /
SQL> select
  2    count(*)
  3  from
  4    t t1,
  5    t t2
  6  where
  7    t1.n1 = t2.n1
  8  and t1.n2 = t2.n2
  9  ;
Execution Plan
----------------------------------------------------------
Plan hash value: 791582492

-----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    12 |    32  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE     |      |     1 |    12 |            |          |
|*  2 |   HASH JOIN         |      |  1470K|    16M|    32  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T    | 10000 | 60000 |    12   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T    | 10000 | 60000 |    12   (0)| 00:00:01 |
-----------------------------------------------------------------------------
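The 1470K join estimate can be reproduced arithmetically. The group NDV of 68 below is an assumption (the real value would come from the optimizer trace, which is not reproduced in the text); with it, the standard 1/NDV join selectivity applied to the (n1, n2) column group gives the figure in the plan:

```python
rows_t1 = rows_t2 = 10000
ndv_group = 68                  # assumed NDV of the (n1, n2) column group
join_sel = 1 / ndv_group        # join selectivity via the column group
card = rows_t1 * rows_t2 * join_sel
print(round(card))  # -> 1470588, i.e. ~1470K as in the plan
```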
I will not include the full text of the trace file here, only the necessary information.

Projections

The QO can also estimate cardinality using extended statistics during a GROUP BY operation.
Without using the column group, the plan for the GROUP BY query below estimated 1949 groups:

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  1949 | 31184 |   408   (1)| 00:00:05 |
|   1 |  HASH GROUP BY     |           |  1949 | 31184 |   408   (1)| 00:00:05 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |       |   867K|   406   (1)| 00:00:05 |
--------------------------------------------------------------------------------

After creating the column group:

SQL> begin
  2    dbms_stats.gather_table_stats(
  3      'SH',
  4      'CUSTOMERS',
  5      estimate_percent => null,
  6      method_opt => 'FOR COLUMNS (CUST_STATE_PROVINCE, COUNTRY_ID) size 1');
  7  end;
  8  /

SQL> select CUST_STATE_PROVINCE, COUNTRY_ID from customers
  2  group by CUST_STATE_PROVINCE, COUNTRY_ID;
Execution Plan
----------------------------------------------------------
Plan hash value: 1577413243

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   145 |  2320 |   408   (1)| 00:00:05 |
|   1 |  HASH GROUP BY     |           |   145 |  2320 |   408   (1)| 00:00:05 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |       |   867K|   406   (1)| 00:00:05 |
--------------------------------------------------------------------------------

With extended statistics, the GROUP BY cardinality is estimated as 145, the NDV of the column group.
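The GROUP BY effect can be simulated: when one column functionally depends on the other (each province belongs to exactly one country, as in CUSTOMERS), the true number of groups equals the NDV of the column group, while the product of the individual NDVs badly overestimates it. A sketch (the synthetic data below is an assumption that only mimics the CUSTOMERS distribution):

```python
import random

random.seed(42)
provinces = [f"P{i}" for i in range(145)]
country_of = {p: i % 19 for i, p in enumerate(provinces)}  # province -> one country

# 55500 rows, each province always paired with its own country
rows = [(p, country_of[p]) for p in random.choices(provinces, k=55500)]

ndv_province = len({p for p, _ in rows})   # 145
ndv_country  = len({c for _, c in rows})   # 19
ndv_group    = len(set(rows))              # 145: the column-group NDV

print(ndv_province * ndv_country)  # naive product: 2755, a big overestimate
print(ndv_group)                   # 145, the estimate with extended statistics
```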
Oracle can also detect candidate column groups automatically, based on column usage. Let's create a test table:

SQL> create table t_candidate
  2  as
  3  select
  4    trunc(dbms_random.value(0,25)) p1,
  5    trunc(dbms_random.value(0,20)) p2,
  6    lpad(rownum,10,'0') padding
  7  from
  8    all_objects
  9  where
 10    rownum <= 10000
 11  ;

Table created.
SQL> begin
  2    dbms_stats.gather_table_stats(
  3      user,
  4      't_candidate',
  5      estimate_percent => null);
  6  end;
  7  /

SQL> select count(*) from t_candidate where ...;

  COUNT(*)
----------
        19

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID ...
select count(*) from t_candidate where ...

--------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |    12 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_CANDIDATE |    20 |   120 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------
SQL> select dbms_stats.report_col_usage('SH','T_CANDIDATE') from dual;

DBMS_STATS.REPORT_COL_USAGE('SH','T_CANDIDATE')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ       : Used in single table EQuality predicate
RANGE    : Used in single table RANGE predicate
LIKE     : Used in single table LIKE predicate
NULL     : Used in single table is (not) NULL predicate
EQ_JOIN  : Used in EQuality JOIN predicate
JOIN     : Used in JOIN predicate
GROUP_BY : Used in GROUP BY expression
...............................................................................

###############################################################################
COLUMN USAGE REPORT FOR SH.T_CANDIDATE
......................................

1. P1       : EQ
2. P2       : EQ
3. (P1, P2) : FILTER
###############################################################################
SQL> select dbms_stats.create_extended_stats('SH','t_candidate') from dual;

DBMS_STATS.CREATE_EXTENDED_STATS('SH','T_CANDIDATE')
--------------------------------------------------------------------------------
###############################################################################
1. (P1, P2) : SYS_STUIV1F__U9NUVZ7#MDKL81$SY created
###############################################################################
SQL> exec dbms_stats.gather_table_stats('SH','t_candidate',method_opt=>'for all columns size skewonly for columns (p1,p2) size skewonly');

PL/SQL procedure successfully completed.

SQL> select column_name,num_distinct,histogram from user_tab_col_statistics where
  2  table_name='T_CANDIDATE';

COLUMN_NAME                    NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
P1                                       25 FREQUENCY
P2                                       20 FREQUENCY
PADDING                                       NONE
SYS_STUIV1F__U9NUVZ7#MDKL81$SY          500 NONE

SQL> select count(*) from t_candidate where ...;

  COUNT(*)
----------
        19
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID ...
select count(*) from t_candidate where ...

--------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |    12 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_CANDIDATE |    20 |   120 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------
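For T_CANDIDATE the column group buys nothing, and the arithmetic shows why: p1 and p2 are independent, so NDV(p1,p2) = NDV(p1)*NDV(p2) = 500, and the group-based selectivity equals the plain product of the individual selectivities. A sketch (the row count of 10000 is an assumption consistent with the 20-row estimate in the plan):

```python
num_rows = 10000          # assumed row count of t_candidate
ndv_p1, ndv_p2 = 25, 20
ndv_group = 500           # = 25 * 20: no correlation between p1 and p2

card_with_group = num_rows * (1 / ndv_group)              # via the column group
card_without    = num_rows * (1 / ndv_p1) * (1 / ndv_p2)  # independence assumption
print(card_with_group, card_without)  # both 20.0 -- identical estimates
```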
So this method does not discover only correlated columns: the candidate columns may also include non-correlated (independent) columns.
The QO can also use column group statistics through a composite index, without the group being explicitly added to the data dictionary.
In this case the selectivity is calculated based on the DISTINCT_KEYS of the index (but I have not fully investigated that).
Another question relates to SQL profiles (SQP) and correlated data. If there is column correlation,
you can use an SQL profile when a SQL Tuning Advisor task results in a profile being accepted. An SQL profile is a collection of
internal hints (like OPT_ESTIMATE); using an offline optimization method, it estimates selectivity/cardinality accurately
and gives the online optimizer the information it needs to choose the best plan. Finally, note that Oracle's QO still cannot
use extended statistics to estimate the selectivity of correlated columns for non-equality, range, and out-of-range
predicates; such cases may require additional statistics (and gathering methods) and may be addressed in future releases.