SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Incorrect cardinality estimates on join (and odd behavior with parallel hint)

DaithiJul 17 2017 — edited Jul 17 2017

Running Oracle 12.1.0.2.0 on Solaris Sparc 11.3.

I have a query I've been asked to optimize. After replacing views with base tables and eliminating redundant joins, replacing joins with exists were their only use was to filter results etc... I was left with something that ran pretty well. But... I did notice that the execution plan was producing some pretty incorrect cardinality estimates. I narrowed in on the pieces that are causing the problem and was left with the following query.

select  /*+ PARALLEL(2) DW2*/
        i.ods_invoice_id
,       il.offering_id
from    invoice_ods_admin.invoice i
,       invoice_ods_admin.invoice_line il
where   1=1
and     i.current_ind = 1
and     i.ods_invoice_id = il.ods_invoice_id
and     i.delivery_date = il.delivery_date
and     i.delivery_date between to_date('05/28/2017','mm-dd-yyyy') and to_date('07/09/2017','mm-dd-yyyy');

The original query submitted had hard-coded dates instead of bind variables (which I asked them to change) but to replicate the issue, I am keeping them here. The estimates for each individual table are pretty correct. 17.8M in INVOICE_LINE table and 1.4M in INVOICE table.

Note: there are no histograms on delivery_date column.

However, I noticed two issues when I run the query.

SQL_ID  f88czbnkza4jt, child number 0
-------------------------------------
select  /*+ PARALLEL(2) DW2*/         invoice.ods_invoice_id 
,       invoice_line.offering_id from    
invoice_ods_admin.invoice ,       invoice_ods_admin.invoice_line where  
 1=1 and     invoice.current_ind = 1 and     invoice.ods_invoice_id = 
invoice_line.ods_invoice_id and     invoice.delivery_date = 
invoice_line.delivery_date and     invoice.delivery_date between 
to_date('05/28/2017','mm-dd-yyyy') and 
to_date('07/09/2017','mm-dd-yyyy')
Plan hash value: 2751839929
-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                  |       |       |  1564K(100)|          |        |      |            |
|   1 |  PX COORDINATOR                           |                  |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                     | :TQ10001         |  1409K|    53M|  1564K  (1)| 00:01:02 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED                     |                  |  1409K|    53M|  1564K  (1)| 00:01:02 |  Q1,01 | PCWP |            |
|   4 |     PX BLOCK ITERATOR                     |                  |  1409K|    24M|   351K  (1)| 00:00:14 |  Q1,01 | PCWC |            |
|*  5 |      TABLE ACCESS FULL                    | INVOICE          |  1409K|    24M|   351K  (1)| 00:00:14 |  Q1,01 | PCWP |            |
|   6 |     PX RECEIVE                            |                  |    17M|   361M|  1212K  (1)| 00:00:48 |  Q1,01 | PCWP |            |
|   7 |      PX SEND BROADCAST                    | :TQ10000         |    17M|   361M|  1212K  (1)| 00:00:48 |  Q1,00 | S->P | BROADCAST  |
|   8 |       PX SELECTOR                         |                  |       |       |            |          |  Q1,00 | SCWC |            |
|   9 |        TABLE ACCESS BY INDEX ROWID BATCHED| INVOICE_LINE     |    17M|   361M|  1212K  (1)| 00:00:48 |  Q1,00 | SCWC |            |
|* 10 |         INDEX RANGE SCAN                  | INVOICE_LINE_IX2 |    18M|       |   105K  (1)| 00:00:05 |  Q1,00 | SCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("INVOICE"."ODS_INVOICE_ID"="INVOICE_LINE"."ODS_INVOICE_ID" AND 
              "INVOICE"."DELIVERY_DATE"="INVOICE_LINE"."DELIVERY_DATE")
   5 - access(:Z>=:Z AND :Z<=:Z)
       filter(("INVOICE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "INVOICE"."CURRENT_IND"=1 
              AND "INVOICE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  10 - access("INVOICE_LINE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND 
              "INVOICE_LINE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note
-----
   - dynamic statistics used: dynamic sampling (level=7)
   - Degree of Parallelism is 2 because of hint

1) The row estimate after the two tables are joined is the same as the lower of the two estimates (1.4M rows). This is not correct. The actual rows returned is 17.8M, as I would have expected based on the fact that the invoice_id and delivery_date must match in both tables.

So why is the planner estimating the lower of the two for the cardinality after it performs the HASH JOIN BUFFERED operation?

2) If I do not use the parallel hint, the query planner uses an additional index on the INVOICE table and then performs an index join. Which is what I would expect to happen even with the parallel hint. Should I presume that the optimizer has determined that it is better to parallelize and use a full table scan rather than use any available indexes to filter the INVOICE records?

SQL_ID  bf1q08g77d547, child number 0
-------------------------------------
select  /* PARALLEL(2) DW1*/         invoice.ods_invoice_id 
,       invoice_line.offering_id from    
invoice_ods_admin.invoice ,       invoice_ods_admin.invoice_line where  
 1=1 and     invoice.current_ind = 1 and     invoice.ods_invoice_id = 
invoice_line.ods_invoice_id and     invoice.delivery_date = 
invoice_line.delivery_date and     invoice.delivery_date between 
to_date('05/28/2017','mm-dd-yyyy') and 
to_date('07/09/2017','mm-dd-yyyy')
Plan hash value: 1084986235
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |       |       |       |  1656K(100)|          |
|*  1 |  HASH JOIN                           |                  |   737K|    28M|    21M|  1656K  (1)| 00:01:05 |
|*  2 |   VIEW                               | index$_join$_001 |   737K|    12M|       |   412K  (1)| 00:00:17 |
|*  3 |    HASH JOIN                         |                  |       |       |       |            |          |
|*  4 |     HASH JOIN                        |                  |       |       |       |            |          |
|*  5 |      INDEX RANGE SCAN                | INVOICE_IX4      |   737K|    12M|       |  6257   (1)| 00:00:01 |
|*  6 |      INDEX FAST FULL SCAN            | INVOICE_IX1      |   737K|    12M|       |   277K  (1)| 00:00:11 |
|   7 |     INDEX FAST FULL SCAN             | INVOICE_PK       |   737K|    12M|       |   171K  (1)| 00:00:07 |
|   8 |   TABLE ACCESS BY INDEX ROWID BATCHED| INVOICE_LINE     |    18M|   397M|       |  1212K  (1)| 00:00:48 |
|*  9 |    INDEX RANGE SCAN                  | INVOICE_LINE_IX2 |    18M|       |       |   105K  (1)| 00:00:05 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("INVOICE"."ODS_INVOICE_ID"="INVOICE_LINE"."ODS_INVOICE_ID" AND 
              "INVOICE"."DELIVERY_DATE"="INVOICE_LINE"."DELIVERY_DATE")
   2 - filter(("INVOICE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND 
              "INVOICE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   3 - access(ROWID=ROWID)
   4 - access(ROWID=ROWID)
   5 - access("INVOICE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND 
              "INVOICE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   6 - filter("INVOICE"."CURRENT_IND"=1)
   9 - access("INVOICE_LINE"."DELIVERY_DATE">=TO_DATE(' 2017-05-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss') 
              AND "INVOICE_LINE"."DELIVERY_DATE"<=TO_DATE(' 2017-07-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note
-----
   - this is an adaptive plan

3) The cardinality estimates are actually lower if I do not use the parallel hint. I presume this is because of dynamic statistics? In the query with the parallel hint, there is a note saying: "dynamic statistics used: dynamic sampling (level=7)".

Why would dynamic statistics be used when using parallel int and not with the regular query

Appreciate help in understanding these questions. Glad to provide any other information as required.

Dave

Locked Post

New comments cannot be posted to this locked post.

Locked on Aug 14 2017

Added on Jul 17 2017

4 comments

447 views