Good afternoon,
I'm requesting your expertise/experience with a SELECT statement running faster when tables have no statistics. I'm trying to understand/fine a way to correct the behavior (right now my workaround is to remove statistics on those tables before the execution)
oracle version: Version 19.8.0.0.0
The SQL statement is below (please note that the end-user can only modify a part of the WHERE CLAUSE, if the whole SQL would have to be corrected it would mean a fix at the level of the software)
SELECT
REFERENCE_ID
FROM
STPDOC_ENTRY_TABLE
WHERE
/* customization starts here */
FC_ID = 0
AND DOC_PARENT_ID in
(SELECT
DOC.DOC_PARENT_ID
FROM
STPDOC_ENTRY_TABLE DOC
LEFT JOIN DLV_CASH_DBF DLV ON DLV.M_REFERENCE = DOC.DOC_PARENT_ID
WHERE
DLV.M_REFERENCE IS NULL
AND DOC.FC_ID = 0)
/* customization ends here */
AND
(XMLFLOW_STATUS IN
(SELECT
B.XMLFLOW_STATUS
FROM
MWP_TMP_STATUSES_2B_PURGED B)
OR
(XMLFLOW_STATUS = '~SYS_ERROR~'
AND XMLFLOW_ERROR_TYPE = 'D'))
ORDER BY
REFERENCE_ID;
Few facts about the tables
STPDOC_ENTRY_TABLE:
- #rows: 13815398
- # distinct FC\_ID: 1000001
- SELECT fc\_id, count(\*) FROM stpdoc\_entry\_table GROUP BY fc\_id ORDER BY count(\*) DESC;
| FC_ID | COUNT |
| 0 | 2754848 |
| 2100806412| 44 |
| 2100019476| 44 |
| 2100757468| 44 |
| 2100743915| 44 |
- # distinct XMLFLOW\_STATUS: 37
- SELECT XMLFLOW\_STATUS, count(\*) FROM stpdoc\_entry\_table GROUP BY XMLFLOW\_STATUS ORDER BY count(\*) DESC;
| XMLFLOW_STATUS | COUNT |
| 26976 | 2230119 |
| 27044 | 1531720 |
| 23940 | 1000000 |
| 27116 | 1000000 |
| 23647 | 1000000 |
- # distinct XMLFLOW\_ERROR\_TYPE: 1
DLV_CASH_DBF:
- #rows: 2230119
- # distinct M\_REFERENCE: 2230119
- SELECT M\_REFERENCE, count(\*) FROM dlv\_cash\_dbf GROUP BY M\_REFERENCE ORDER BY count(\*) desc
| M_REFERENCE | COUNT |
| 1810 | 1 |
| 1811 | 1 |
| 1812 | 1 |
| 1813 | 1 |
| 1814 | 1 |
MWP_TMP_STATUSES_2B_PURGED:
- #rows: 296
Explain plan without statistics
Remove any existing statistics with
exec DBMS_STATS.DELETE_TABLE_STATS(ownname=>USER,tabname=>'STPDOC_ENTRY_TABLE',no_invalidate=>FALSE);
exec DBMS_STATS.DELETE_TABLE_STATS(ownname=>USER,tabname=>'DLV_CASH_DBF',no_invalidate=>FALSE);
SELECT statement took less than a minute and the plan generated is available in the file explain_plan_without_statistics.txt
Explain plan with statistics
Statistics generated with command
exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>USER, tabname=>'STPDOC_ENTRY_TABLE', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt=>'FOR COLUMNS FC_ID, DOC_PARENT_ID, XMLFLOW_STATUS, XMLFLOW_ERROR_TYPE SIZE SKEWONLY', degree=>DBMS_STATS.AUTO_DEGREE, cascade=>TRUE, no_invalidate=>FALSE);
exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>USER, tabname=>'DLV_CASH_DBF', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt=>'FOR COLUMNS M_REFERENCE SIZE SKEWONLY', degree=>DBMS_STATS.AUTO_DEGREE, cascade=>TRUE, no_invalidate=>FALSE);
We can see statistics generated

If I execute the SQL again, this time it took 20 mins, explain plan is available in explain_plan_with_statistics.txt
Could you help me analyze/understand such difference?
I thank you in advance
Simon