SQL & PL/SQL


Normalize names in a huge table using UTL_MATCH

1590733, Feb 26 2015 (edited Feb 27 2015)

Hi,

I've got a big table (350 million records) with a "full name" column.

That column has some typos, so I have to 'normalize' the data (only for that column), using UTL_MATCH.JARO_WINKLER_SIMILARITY.

I've done some tests with a reduced table, and this works for showing the similar names:

SELECT a.name, b.name
FROM   typotable a, typotable b
WHERE  utl_match.jaro_winkler_similarity(a.name, b.name) BETWEEN 85 AND 99
AND    a.rowid > b.rowid;

But:

1) The test table was small. Running this self-join directly against the 350-million-record table would compare every row with every other row and take ages. What can be done about that?

2) This only shows the similar names. How could I update the table while looking for similarities, choosing any one of the similar names as the single value for each group?

Thanks
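For reference, one possible direction for both questions is sketched below. It is not the accepted answer from this thread, only a rough outline under stated assumptions: the helper tables name_counts and name_map are hypothetical, restricting comparisons to names that share a first character is an assumed shortcut (it will miss typos in the first letter), and "most frequent spelling wins" is just one way of choosing the surviving value.

```sql
-- Step 1: compare distinct names only, not all 350 million rows.
-- name_counts is a hypothetical helper table.
CREATE TABLE name_counts AS
SELECT name, COUNT(*) AS cnt
FROM   typotable
GROUP  BY name;

-- Step 2: for each name, pick the most frequent similar name as its
-- replacement. The first-character condition is an assumed "blocking"
-- shortcut that keeps the join from comparing every pair of names.
CREATE TABLE name_map AS
SELECT name, canonical
FROM  (SELECT a.name,
              b.name AS canonical,
              ROW_NUMBER() OVER (PARTITION BY a.name
                                 ORDER BY b.cnt DESC, b.name) AS rn
       FROM   name_counts a
       JOIN   name_counts b
         ON   SUBSTR(a.name, 1, 1) = SUBSTR(b.name, 1, 1)
        AND   (b.cnt > a.cnt OR (b.cnt = a.cnt AND b.name < a.name))
        AND   utl_match.jaro_winkler_similarity(a.name, b.name) >= 85)
WHERE  rn = 1;

-- Step 3: apply the mapping. name_map holds at most one row per name,
-- so the scalar subquery returns a single value.
UPDATE typotable t
SET    t.name = (SELECT m.canonical FROM name_map m WHERE m.name = t.name)
WHERE  EXISTS (SELECT 1 FROM name_map m WHERE m.name = t.name);
```

Note that the mapping is not transitive: if A maps to B and B maps to C, a single pass leaves former A rows as B, so the steps may need repeating until name_map comes back empty. On a table this size the final UPDATE would probably also need batching or a rebuild with CREATE TABLE AS SELECT.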

This post has been answered by BluShadow on Feb 27 2015


Locked on Mar 27 2015

Added on Feb 26 2015
