Hi We are using version 11.2.0.4 of oracle Exadata. We have a query which is suffering due to bad cardinality estimation and after digging into that i found one of the table which is partitioned is having issue in cardinailty estimation with a filter on one column(INDICATOR). Below is the query and its plan. and the statistics. Apology for the bad formatting of the table/column statistics, somehow i am not able to make the alignment proper in this editor.
The global column stats shows there exists no histogram on the column and num_distinct as "2" as it was an indicator(Y/N), so i was expecting the cardinality estimation to be num_rows/num_distinct=1335,425,399/2=667,712,700 , but the plan shows estimation as 1. I did notice the density column is having some very low value (5.56393360313218E-10) stored in it , ideally it should be .5 considering two distinct values and no histogram, so want to understand why its happening and if this density column is the cause of this bad cardinality estimation? And why this density is coming so bad?
Again when i put the additional filter on partition key as then the estimation shows as ~76million which is correct one. it seems to be coming from DBA_PART_HISTOGRAMS. still not able to decode the value stored in ENDPOINT_VALUE to validate?
This table is range partitione on column SDT and column "INDICATOR" is only holding values 'Y' and 'N'. And if i just query the specific partition, i see below results.
SQL> select INDICATOR,count(*) from RSTDL partition(RSTDL_20181122_P) group by INDICATOR;
D COUNT(*)
- ----------
Y 76377353
N 126695652
Global Table stats:-
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS
RSTDL 1335425399 49673661 268 4916 11/24/2018 12:33:29 PM YES
All individual Partition stats :-
TABLE_NAME PARTITION_NAME NUM_ROWS BLOCKS AVG_ROW_LEN SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS
RSTDL RSTDL_20181122_P 203073005 7543488 256 203073005 11/23/2018 12:49:11 AM YES
RSTDL RSTDL_20170729_P 154399686 5757723 254 154399686 7/30/2017 5:37:14 AM YES
RSTDL RSTDL_20181119_P 195229089 7269843 255 195229089 11/20/2018 12:38:35 AM YES
RSTDL RSTDL_20181120_P 212601907 7905866 257 212601907 11/21/2018 12:38:47 AM YES
RSTDL RSTDL_20181121_P 211516854 7861260 257 211516854 11/22/2018 12:38:43 AM YES
RSTDL RSTDL_20181123_P 217522243 8082610 257 217522243 11/24/2018 12:39:42 AM YES
RSTDL RSTDL_20181124_P 155855612 5800301 256 155855612 11/24/2018 2:03:14 PM YES
Global column stats:-
TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS AVG_COL_LEN HISTOGRAM
RSTDL INDICATOR 2 4E 59 5.56393360313218E-10 0 1 11/24/2018 12:51:17 PM 1339276439 YES 2 NONE
Column stats specific to the one partition in which filter applied:-
TABLE_NAME PARTITION_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS AVG_COL_LEN HISTOGRAM
RSTDL RSTDL_20181122_P INDICATOR 2 4E 59 2.45109735048318E-9 0 2 11/23/2018 12:46:52 AM 5528 YES 2 FREQUENCY
Result from DBA_PART_HISTOGRAMS for the specific column and partition:-
OWNER TABLE_NAME PARTITION_NAME COLUMN_NAME BUCKET_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
user1 RSTDL RSTDL_20181122_P INDICATOR 3442 4.04999154965717E35
user1 RSTDL RSTDL_20181122_P INDICATOR 5528 4.621144204096E35
Result from DBA_tab_HISTOGRAMS for that specific column:-
OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
user1 RSTDL INDICATOR 0 4.04999154965717E35
user1 RSTDL INDICATOR 1 4.621144204096E35
SQL> select * from RSTDL
2 where RSTDL.INDICATOR = 'Y';
Elapsed: 00:00:00.05
Execution Plan
----------------------------------------------------------
Plan hash value: 1587933015
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 268 | 14M (1)| 43:12:14 | | |
| 1 | PARTITION RANGE ALL | | 1 | 268 | 14M (1)| 43:12:14 | 1 | 11 |
|* 2 | TABLE ACCESS STORAGE FULL| RSTDL | 1 | 268 | 14M (1)| 43:12:14 | 1 | 11 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("RSTDL"."INDICATOR"='Y')
filter("RSTDL"."INDICATOR"='Y')
SQL> select * from RSTDL
2 where RSTDL.INDICATOR = 'Y'
3 and SDT='22-nov-2018';
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 147367774
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 76M| 18G| 2153K (1)| 06:34:54 | | |
| 1 | PARTITION RANGE SINGLE | | 76M| 18G| 2153K (1)| 06:34:54 | 5 | 5 |
|* 2 | TABLE ACCESS STORAGE FULL| RSTDL | 76M| 18G| 2153K (1)| 06:34:54 | 5 | 5 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("RSTDL"."INDICATOR"='Y' AND "SDT"=TO_DATE(' 2018-11-22 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
filter("RSTDL"."INDICATOR"='Y' AND "SDT"=TO_DATE(' 2018-11-22 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))