Hi all, I am using Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production and I am trying to take an estimation from a join between two very large tables (550 millions each one) with some specific criteria
TABLE_A and TABLE_B
I want to take the estimation in order avoid runing the real query if the results are above a limit
So, the structure of my query is as follows :
SELECT APPROX_COUNT_DISTINCT(TABLE_A_ID)
FROM TABLE_A sample(20) seed(1000) A
INNER JOIN TABLE_B B ON A.COLUMN_1 = B.COLUMN_1
AND A.COLUMN_2 = B.COLUMN_2
AND A.COLUMN_3 = B.COLUMN_3
WHERE A.COLUMN_4 = 'TEST_VALUE_1'
AND A.COLUMN_5 = 'TEST_VALUE_2'
FETCH FIRST 10000 ROWS ONLY ;
Then I multiply the results with the factor 100/20 (because the query uses sample 20%)
This query runs for about 300secs.
Is there any way to improve this estimation in time, even if I lose some accuracy ?
The execution plan of the above query is :
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 253K| 1548 (1)| 00:00:01 |
|* 1 | VIEW | | 10000 | 253K| 1548 (1)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 1 | 156 | 1548 (1)| 00:00:01 |
| 3 | SORT AGGREGATE APPROX| | 1 | 156 | | |
|* 4 | HASH JOIN | | 6038 | 919K| 1548 (1)| 00:00:01 |
|* 5 | INDEX SAMPLE FAST FULL SCAN| IDX_TABLE_A_COL9 | 3082 | 117K| 70 (0)| 00:00:01 |
| 6 | VIEW | index$_join$_002 | 30818 | 3521K| 1478 (1)| 00:00:01 |
|* 7 | HASH JOIN | | | | | |
|* 8 | INDEX FAST FULL SCAN | IDX_TABLE_B_COL9_COL4_COL5| 30818 | 3521K| 614 (0)| 00:00:01 |
| 9 | INDEX FAST FULL SCAN | SYS_C0010781 | 30818 | 3521K| 938 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------