Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

How to create index - Fuzzy Matching and Stemming

909317Mar 7 2012 — edited Mar 9 2012
Hi,

I have a problem with creating and index for fuzzy matching and stemming with my expect behavious

I Have table customer and column sortname. This table ussally contains values in format: "Surname, FirstName"
"Lenox, Carl"
"Svensson, Max"
"Hamlberg, Ulf"

etc...

1. Previously I have index created like this:

CREATE INDEX IX_CUST_TMP_CTXSORTNAME ON CUSTOMER (SORTNAME) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC(ON COMMIT)');

If I run:
select * from CUSTOMER where contains(sortname,'Hamlberg') > 0; -- "Hamlberg, Ulf" is returned - OK
select * from CUSTOMER where contains(sortname,'Ulf') > 0; -- "Hamlberg, Ulf" is returned - OK

2. I drop the index and create new one for fuzzy matching and stemming

--NOTE: something is commented out, but I try to run pretty much all the possible varations

exec ctx_ddl.create_preference('MYYY_SWEDISH_LEXER', 'BASIC_LEXER');
--exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'printjoins', '_-');
--exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'skipjoins', ',');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'index_themes', 'NO');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'index_text', 'YES');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'index_stems', 'SWEDISH');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'alternate_spelling', 'SWEDISH');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'base_letter', 'YES');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'base_letter_type', 'GENERIC');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'override_base_letter', 'TRUE');
--exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'whitespace', ',');

exec ctx_ddl.create_preference('MYYY_SWEDISH_FUZZY_PREF', 'BASIC_WORDLIST');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','FUZZY_MATCH','AUTO');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','FUZZY_SCORE','60');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','FUZZY_NUMRESULTS','100');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','STEMMER','AUTO');

create index IX_CUST_TMP_CTXSORTNAME on CUSTOMER (sortname) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('LEXER MYYY_SWEDISH_LEXER Wordlist MYYY_SWEDISH_FUZZY_PREF SYNC(ON COMMIT)');

If I run:
select * from CUSTOMER where contains(sortname,'Hamlberg') > 0; -- "Hamlberg, Ulf" is returned - OK
select * from CUSTOMER where contains(sortname,'Ulf') > 0; -- "Hamlberg, Ulf" is returned - NOT OK - Why it is not returned ??

3. Then I want to use fuzzy for searching names:

a. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Hamlberg}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
b. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Ulf}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
c. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Hamlberg Ulf}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
d. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Hamlberg, Ulf}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"

Is it possible to do that somehow ?

Thx for answers.

Edited by: 906314 on 7.3.2012 4:01
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Apr 6 2012
Added on Mar 7 2012
14 comments
1,104 views