SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Performance Tune a MERGE with no Primary or Unique Keys

Sam_PAug 5 2017 — edited Aug 12 2017

@"Stew Ashton" @"Paulzip"

Hello,

I am in a bit of a bind with an ETL task at hand and hoping to get some ideas on how to reduce the time it is taking to compare a SOURCE table with a TARGET table. The SOURCE table gets refreshed (truncated and records re-loaded with Informatica tool on a daily basis) by another department and I have no ADMIN-level rights or ownership to do anything on that side. I simply have SELECT rights on that particular SOURCE table. On the TARGET side, I have full rights to connect to PROXY CONNECT as the TARGET_OWNER and perform all DDL and DML operations.

The SOURCE table from that department contains 10 million records, however, there is a small filter in the WHERE clause that I apply, which pulls 8 million records (a subset) from the SOURCE table and creates a completely re-freshable Materialized View object in the TARGET schema. The daily full-refresh of the MV takes about 2.5 mins only, which is very impressive!. This MV is growing by about 250,000 records a day, when refreshed on a daily basis.

This MV in TARGET schema, in turn, acts as my SOURCE object from here on as part of my ETL automation using custom-made procedures in PL/SQL as I do not have any ETL tools like Informatica. Since both of the objects -- SOURCE_MV and TARGET_TABLE are all in the same TARGET_OWNER's schema, I then proceed with "sync'ing" them on a daily basis using the awesome utility package made by @"Stew Ashton" from https://stewashton.wordpress.com/2015/01/21/stew_compare_sync-introducing-the-package/ and I'm using the compare_sync.sync_sql package function call inside a custom-made procedure to retrieve the generated MERGE code spewed-out by this package and then dynamically execute it using the EXECUTE IMMEDIATE statement. The SOURCE_MV and TARGET_TABLE both do not have any unique primary keys and all columns combined together, do not uniquely identify a unique record due to duplicate records. I have contacted the other department's ETL person to look into his side of the automation if there's something he can do to reduce loading duplicate records. Furthermore, the SOURCE_MV has no indexes but the TARGET_TABLE has one CONTEXT index on a column which contains a lot of "free-text" entered values which more Text Mining is required in the project. This CONTEXT index is synced soon after the data is MERGED or "sync'ed" into the TARGET_TABLE. This CONTEXT index sync operation only takes 2.5 mins currently.

The issue is when it comes to "sync'ing data" using the MERGE code that compares the SOURCE_MV and TARGET_TABLE for any changed records (inserts, updates and deletes) which is currently taking 10 mins to process 8 million records as of today and given that the "delta" of 250K per day, this will keep on taking longer. Here's the MERGE code generated by the package with some slight additions of performance optimizer hints that I have added, mainly PARALLEL(n) and NOLOGGING, just to see if I can speed up, however, it did no speed-up significantly:-

merge /*+ use_nl(O) parallel(8) */ into HEART_FAILURE.TB_EF_ECHO_OTHERS_HISTORY O

using (

select /*+ parallel(8) nologging */ *

from (

select /*+ parallel(8) nologging */

SUM(Z##FLAG) over(partition by "PROCEDURETYPE", "STUDYID", "MRN", "STUDYSTARTDATETIME", "SITE", "STUDYTYPE", "REPORTGROUPDESC", "REPORTCOMMENTS") Z##NUM_ROWS,

COUNT(NULLIF(Z##FLAG,-1)) over(partition by "PROCEDURETYPE", "STUDYID", "MRN", "STUDYSTARTDATETIME", "SITE", "STUDYTYPE", "REPORTGROUPDESC", "REPORTCOMMENTS" order by null rows unbounded preceding) Z##NEW,

COUNT(NULLIF(Z##FLAG,1)) over(partition by "PROCEDURETYPE", "STUDYID", "MRN", "STUDYSTARTDATETIME", "SITE", "STUDYTYPE", "REPORTGROUPDESC", "REPORTCOMMENTS" order by null rows unbounded preceding) Z##OLD,

a.*

from (

select /*+ parallel(8) nologging */ "PROCEDURETYPE", "STUDYID", "MRN", "STUDYSTARTDATETIME", "SITE", "STUDYTYPE", "REPORTGROUPDESC", "REPORTCOMMENTS",

-1 Z##FLAG,

rowid Z##RID

from HEART_FAILURE.TB_EF_ECHO_OTHERS_HISTORY O

union all

select /*+ parallel(8) nologging */ "PROCEDURETYPE", "STUDYID", "MRN", "STUDYSTARTDATETIME", "SITE", "STUDYTYPE", "REPORTGROUPDESC", "REPORTCOMMENTS",

1 Z##FLAG,

null

from HEART_FAILURE.MV_EF_ECHO_OTHERS_SRC_SUBSET N

) a

)

where Z##NUM_ROWS != 0

and SIGN(Z##NUM_ROWS) = Z##FLAG

and ABS(Z##NUM_ROWS) >= case SIGN(Z##NUM_ROWS) when 1 then Z##NEW else Z##OLD end

) N

on (O.rowid = N.Z##RID)

when matched then

update set "PROCEDURETYPE" = N."PROCEDURETYPE"

delete where 1=1

when not matched then

insert ( "PROCEDURETYPE", "STUDYID", "MRN", "STUDYSTARTDATETIME", "SITE", "STUDYTYPE", "REPORTGROUPDESC", "REPORTCOMMENTS")

values ( N."PROCEDURETYPE", N."STUDYID", N."MRN", N."STUDYSTARTDATETIME", N."SITE", N."STUDYTYPE", N."REPORTGROUPDESC", N."REPORTCOMMENTS");

I am running the above code inside a procedure using the EXECUTE IMMEDIATE on 11gR2 (11.2.0.4) database server.

Any way I could further optimize / performance tune the above MERGE query and further reduce the time it takes to compare and sync? I have noticed that even if there are no new records added or updated or deleted in the SOURCE_MV, it still takes the above code 10 mins (as of now) to fully compare, even when there are no records to sync per say.

Thank you for all the help in advance.

This post has been answered by Stew Ashton on Aug 11 2017

Jump to Answer

Locked Post

New comments cannot be posted to this locked post.

Locked on Sep 8 2017

Added on Aug 5 2017

51 comments

2,771 views