SQL & PL/SQL

Best way to remove duplicates based on multiple tables

Pyrocks, Mar 22 2012 (edited Mar 29 2012)
Hi,

I have a mechanism that loads flat files into multiple tables (up to 6 different tables) using external tables.
Whenever a new file arrives, I need to insert duplicate rows into a side table, but the duplicates have to be searched for across all 6 tables, based on a given set of columns that exists in all of them.
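
For context, the load side uses an external table along these lines (the directory object, file name and columns are simplified assumptions, not our real schema):

-- Hypothetical external table over an incoming flat file.
CREATE TABLE staged_rows (
  col_a  VARCHAR2(100),
  col_b  VARCHAR2(100),
  col_c  VARCHAR2(100),
  col_d  VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('incoming_file.csv')
);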

In the SQL Server version of the same mechanism (which I'm migrating to Oracle), it uses an additional "UNIQUE" table with only 2 columns (Checksum1, Checksum2), which hold the checksum values of 2 different sets of columns per inserted record. When a new file arrives, it computes these 2 checksums for every record and looks them up in the unique table, to avoid searching all the different tables.
We know that working with checksums is not bulletproof, but with those sets of fields it seems to work.
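
In Oracle terms, the duplicate-routing step would be something like the sketch below. unique_checksums and dup_side_table are illustrative names, and ORA_HASH is only a placeholder for whichever checksum function we end up choosing:

-- Lookup table holding one checksum pair per previously loaded record.
CREATE TABLE unique_checksums (
  checksum1  NUMBER NOT NULL,
  checksum2  NUMBER NOT NULL
);

CREATE INDEX unique_checksums_ix
  ON unique_checksums (checksum1, checksum2);

-- Route newly arrived rows whose checksum pair already exists
-- into the side table instead of the main tables.
INSERT INTO dup_side_table (col_a, col_b, col_c, col_d)
SELECT s.col_a, s.col_b, s.col_c, s.col_d
FROM   staged_rows s
WHERE  EXISTS (
         SELECT 1
         FROM   unique_checksums u
         WHERE  u.checksum1 = ORA_HASH(s.col_a || '|' || s.col_b)
         AND    u.checksum2 = ORA_HASH(s.col_c || '|' || s.col_d));

Non-duplicate rows would then go into the target table and into unique_checksums in the same transaction.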

My questions are:
Should I use the same checksum mechanism? If so, should I use the owa_opt_lock.checksum function to calculate the checksums?
Or should I look for duplicates in all the tables one after the other (after indexing some of the columns we check for duplicates with)?
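
As far as I can tell, owa_opt_lock.checksum takes a VARCHAR2 buffer, so per record it would have to be called on a concatenation of the relevant columns, roughly like this (the values are just for illustration):

DECLARE
  l_checksum1  NUMBER;
  l_checksum2  NUMBER;
BEGIN
  -- Concatenate each column set with a separator so that, e.g.,
  -- ('ab','c') and ('a','bc') do not produce the same input string.
  l_checksum1 := owa_opt_lock.checksum('val_a' || '|' || 'val_b');
  l_checksum2 := owa_opt_lock.checksum('val_c' || '|' || 'val_d');
  dbms_output.put_line(l_checksum1 || ' / ' || l_checksum2);
END;
/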

Note:
These tables are partitioned by day and can be very large.

Any advice would be welcome.
Thanks.