Skip to Main Content

Analytics Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Deduplication in ODI 10g

724168Apr 28 2011 — edited Apr 29 2011
Hi All,

I was just wondering if ODI 10g alone can and is intended to perform deduplication (data cleansing)? By deduplication, I mean that from an error table (where the duplicates were stored), we'll perform a series of pattern matching checks until eventually, only one record is left among the duplicates and loaded to the target.

Example of the checks needed to be done:

Assume:
Table Columns: col0 | col1 | col2 | col3 | col4 | col5
*PK is col0

Scenario#1: between duplicate records (same PK), if their non-PK col1 value is the same and their non-PK col2 value is the same, select the record where non-PK col3 is not equal to 0; if there is no such record or if there are still 2 or more duplicate records that satisfy this, apply another rule

Scenario#2: between duplicate records (same PK), if their non-PK col1 value is the same but their non-PK col2 is different, select the record with the most number of fields that are not null; if there are records with the same number of 'not null' fields, select the one where col4 is filled over the record where col5 is filled, etc.

Scenario#3: ...
Scenario#4: ...
...

Basing on the data integrity checks available in ODI, this cannot be done without doing a manual way of cleansing (setting up one interface per scenario/case in the set of pattern matching rules and creating a lot of temporary staging tables) which will result in a more complex setup. Is this a correct approach or is there an alternative way of doing this? (customizing the KM, etc.) Will the ODQ tool be useful in this kind of situation? Hope someone knowledgeable on this would be able to help.

Thanks a lot,
Marco
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on May 27 2011
Added on Apr 28 2011
1 comment
475 views