Running Oracle 12.1.0.2.0 on Solaris Sparc 11.3.
I have a query I've been asked to optimize. After replacing views with base tables and eliminating redundant joins, replacing joins with exists were their only use was to filter results etc... I was left with something that ran pretty well. But... I did notice that the execution plan was producing some pretty incorrect cardinality estimates. I narrowed in on the pieces that are causing the problem and was left with the following query.
select /*+ PARALLEL(2) DW2*/
i.ods_invoice_id
, il.offering_id
from invoice_ods_admin.invoice i
, invoice_ods_admin.invoice_line il
where 1=1
and i.current_ind = 1
and i.ods_invoice_id = il.ods_invoice_id
and i.delivery_date = il.delivery_date
and i.delivery_date between to_date('05/28/2017','mm-dd-yyyy') and to_date('07/09/2017','mm-dd-yyyy');
The original query submitted had hard-coded dates instead of bind variables (which I asked them to change) but to replicate the issue, I am keeping them here. The estimates for each individual table are pretty correct. 17.8M in INVOICE_LINE table and 1.4M in INVOICE table.
Note: there are no histograms on delivery_date column.
However, I noticed two issues when I run the query.
SQL_ID f88czbnkza4jt, child number 0
-------------------------------------
select /*+ PARALLEL(2) DW2*/ invoice.ods_invoice_id
, invoice_line.offering_id from
invoice_ods_admin.invoice , invoice_ods_admin.invoice_line where
1=1 and invoice.current_ind = 1 and invoice.ods_invoice_id =
invoice_line.ods_invoice_id and invoice.delivery_date =
invoice_line.delivery_date and invoice.delivery_date between
to_date('05/28/2017','mm-dd-yyyy') and
to_date('07/09/2017','mm-dd-yyyy')
Plan hash value: 2751839929
-------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1564K(100)| | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 1409K| 53M| 1564K (1)| 00:01:02 | Q1,01 | P->S | QC (RAND) |
|* 3 | HASH JOIN BUFFERED | | 1409K| 53M| 1564K (1)| 00:01:02 | Q1,01 | PCWP | |
| 4 | PX BLOCK ITERATOR | | 1409K| 24M| 351K (1)| 00:00:14 | Q1,01 | PCWC | |
|* 5 | TABLE ACCESS FULL | INVOICE | 1409K| 24M| 351K (1)| 00:00:14 | Q1,01 | PCWP | |
| 6 | PX RECEIVE | | 17M| 361M| 1212K (1)| 00:00:48 | Q1,01 | PCWP | |
| 7 | PX SEND BROADCAST | :TQ10000 | 17M| 361M| 1212K (1)| 00:00:48 | Q1,00 | S->P | BROADCAST |
| 8 | PX SELECTOR | | | | | | Q1,00 | SCWC | |
| 9 | TABLE ACCESS BY INDEX ROWID BATCHED| INVOICE_LINE | 17M| 361M| 1212K (1)| 00:00:48 | Q1,00 | SCWC | |
|* 10 | INDEX RANGE SCAN | INVOICE_LINE_IX2 | 18M| | 105K (1)| 00:00:05 | Q1,00 | SCWP | |
-------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("INVOICE"."ODS_INVOICE_ID"="INVOICE_LINE"."ODS_INVOICE_ID" AND
"INVOICE"."DELIVERY_DATE"="INVOICE_LINE"."DELIVERY_DATE")
5 - access(:Z>=:Z AND :Z<=:Z)
filter(("INVOICE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "INVOICE"."CURRENT_IND"=1
AND "INVOICE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
10 - access("INVOICE_LINE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"INVOICE_LINE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note
-----
- dynamic statistics used: dynamic sampling (level=7)
- Degree of Parallelism is 2 because of hint
1) The row estimate after the two tables are joined is the same as the lower of the two estimates (1.4M rows). This is not correct. The actual rows returned is 17.8M, as I would have expected based on the fact that the invoice_id and delivery_date must match in both tables.
So why is the planner estimating the lower of the two for the cardinality after it performs the HASH JOIN BUFFERED operation?
2) If I do not use the parallel hint, the query planner uses an additional index on the INVOICE table and then performs an index join. Which is what I would expect to happen even with the parallel hint. Should I presume that the optimizer has determined that it is better to parallelize and use a full table scan rather than use any available indexes to filter the INVOICE records?
SQL_ID bf1q08g77d547, child number 0
-------------------------------------
select /* PARALLEL(2) DW1*/ invoice.ods_invoice_id
, invoice_line.offering_id from
invoice_ods_admin.invoice , invoice_ods_admin.invoice_line where
1=1 and invoice.current_ind = 1 and invoice.ods_invoice_id =
invoice_line.ods_invoice_id and invoice.delivery_date =
invoice_line.delivery_date and invoice.delivery_date between
to_date('05/28/2017','mm-dd-yyyy') and
to_date('07/09/2017','mm-dd-yyyy')
Plan hash value: 1084986235
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 1656K(100)| |
|* 1 | HASH JOIN | | 737K| 28M| 21M| 1656K (1)| 00:01:05 |
|* 2 | VIEW | index$_join$_001 | 737K| 12M| | 412K (1)| 00:00:17 |
|* 3 | HASH JOIN | | | | | | |
|* 4 | HASH JOIN | | | | | | |
|* 5 | INDEX RANGE SCAN | INVOICE_IX4 | 737K| 12M| | 6257 (1)| 00:00:01 |
|* 6 | INDEX FAST FULL SCAN | INVOICE_IX1 | 737K| 12M| | 277K (1)| 00:00:11 |
| 7 | INDEX FAST FULL SCAN | INVOICE_PK | 737K| 12M| | 171K (1)| 00:00:07 |
| 8 | TABLE ACCESS BY INDEX ROWID BATCHED| INVOICE_LINE | 18M| 397M| | 1212K (1)| 00:00:48 |
|* 9 | INDEX RANGE SCAN | INVOICE_LINE_IX2 | 18M| | | 105K (1)| 00:00:05 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("INVOICE"."ODS_INVOICE_ID"="INVOICE_LINE"."ODS_INVOICE_ID" AND
"INVOICE"."DELIVERY_DATE"="INVOICE_LINE"."DELIVERY_DATE")
2 - filter(("INVOICE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"INVOICE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
3 - access(ROWID=ROWID)
4 - access(ROWID=ROWID)
5 - access("INVOICE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"INVOICE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
6 - filter("INVOICE"."CURRENT_IND"=1)
9 - access("INVOICE_LINE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
AND "INVOICE_LINE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note
-----
- this is an adaptive plan
3) The cardinality estimates are actually lower if I do not use the parallel hint. I presume this is because of dynamic statistics? In the query with the parallel hint, there is a note saying: "dynamic statistics used: dynamic sampling (level=7)".
Why would dynamic statistics be used when using parallel int and not with the regular query
Appreciate help in understanding these questions. Glad to provide any other information as required.
Dave