Hi,
I've got a big table (350 million records) with a "full name" column.
That column has some typos, so I have to 'normalize' the data (only for that column), using UTL_MATCH.JARO_WINKLER_SIMILARITY.
I've done some tests with a reduced table, and this works for showing the similar names:
SELECT a.name, b.name
  FROM typotable a, typotable b
 WHERE utl_match.jaro_winkler_similarity(a.name, b.name) BETWEEN 85 AND 99
   AND a.rowid > b.rowid;
But:
1) The test table was small. Running this query directly against the 350-million-row table would take ages, because the self-join compares every pair of rows. What can be done about that?
2) This just shows the similar names. How could I update the table while looking for similarities, choosing any one of the similar names as the single value for all of them?
Thanks