The trick here would be to optimise the hash outer join, and I have a couple of suggestions:
Firstly, you could do this by hash partitioning (or subpartitioning) on the join keys. Partition-wise joins would reduce the chances of the join spilling to disk.
Secondly, you might be able to introduce a redundant predicate into the source query that can be used to partition prune on the target table.
The latter option works in a situation where you are merging a set of data into a table that has a limited range of values on the join key, and a classic example of this would be in maintaining an aggregate table in a data warehouse by merging in recent data only. The aggregate table might be range partitioned on date_of_transaction and cover a range of values over ten years, but the new data might just cover the most recent week. In that case you would like to be able to place a predicate directly on the target table to say "I'm only going to find a join on this range of rows". You can't do that directly but the optimiser can infer it if you place a redundant predicate in the USING clause.
Demonstration script:
create table src (col1 number, col2 number);
insert into src
select 2, rownum
from dual
connect by rownum < 10000
/
create table tgt (col1 number, col2 number)
partition by range (col1)
(
partition p1 values less than (2),
partition p2 values less than (3)
)
;
insert into tgt
select 1, rownum
from dual
connect by rownum < 100000
/
commit;
exec dbms_stats.gather_table_stats(user,'src');
exec dbms_stats.gather_table_stats(user,'tgt');
Table tgt only has values in partition P1, but the merge is only going to populate partition P2.
explain plan for
merge into tgt
using (select * from src) src
on (tgt.col1 = src.col1)
when matched then update
set col2 = src.col2
when not matched then insert
values (src.col1, src.col2)
/
select * from table(dbms_xplan.display)
/
explain plan succeeded.
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3718868795
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 9999 | 449K| 64 (5)| 00:00:01 | | |
| 1 | MERGE | TGT | | | | | | |
| 2 | VIEW | | | | | | | |
|* 3 | HASH JOIN OUTER | | 9999 | 126K| 64 (5)| 00:00:01 | | |
| 4 | TABLE ACCESS FULL | SRC | 9999 | 59994 | 6 (0)| 00:00:01 | | |
| 5 | PARTITION RANGE ALL| | 99999 | 683K| 56 (2)| 00:00:01 | 1 | 2 |
| 6 | TABLE ACCESS FULL | TGT | 99999 | 683K| 56 (2)| 00:00:01 | 1 | 2 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TGT"."COL1"(+)="SRC"."COL1")
18 rows selected
So you see from the above that a full scan of both partitions of TGT is performed.
We introduce a redundant predicate into the USING clause:
explain plan for
merge into tgt
using (select * from src where col1 >= 2) src
on (tgt.col1 = src.col1)
when matched then update
set col2 = src.col2
when not matched then insert
values (src.col1, src.col2)
/
select * from table(dbms_xplan.display)
/
explain plan succeeded.
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2500172128
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 9999 | 449K| 9 (12)| 00:00:01 | | |
| 1 | MERGE | TGT | | | | | | |
| 2 | VIEW | | | | | | | |
|* 3 | HASH JOIN RIGHT OUTER | | 9999 | 429K| 9 (12)| 00:00:01 | | |
| 4 | PARTITION RANGE SINGLE| | 1 | 38 | 2 (0)| 00:00:01 | 2 | 2 |
| 5 | TABLE ACCESS FULL | TGT | 1 | 38 | 2 (0)| 00:00:01 | 2 | 2 |
|* 6 | TABLE ACCESS FULL | SRC | 9999 | 59994 | 6 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TGT"."COL1"(+)="SRC"."COL1")
6 - filter("COL1">=2)
19 rows selected
You see from the plan that only one partition of TGT is now being scanned, and as that is empty in this case it will be a very fast action.
Even without partition pruning this could well be more efficient than the "vanilla" alternative by giving the optimiser more information about the size of the subset of TGT rows that are being merged into and giving the opportunity of a more efficient path for reading them.