How to create index - Fuzzy Matching and Stemming
909317Mar 7 2012 — edited Mar 9 2012Hi,
I have a problem with creating and index for fuzzy matching and stemming with my expect behavious
I Have table customer and column sortname. This table ussally contains values in format: "Surname, FirstName"
"Lenox, Carl"
"Svensson, Max"
"Hamlberg, Ulf"
etc...
1. Previously I have index created like this:
CREATE INDEX IX_CUST_TMP_CTXSORTNAME ON CUSTOMER (SORTNAME) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC(ON COMMIT)');
If I run:
select * from CUSTOMER where contains(sortname,'Hamlberg') > 0; -- "Hamlberg, Ulf" is returned - OK
select * from CUSTOMER where contains(sortname,'Ulf') > 0; -- "Hamlberg, Ulf" is returned - OK
2. I drop the index and create new one for fuzzy matching and stemming
--NOTE: something is commented out, but I try to run pretty much all the possible varations
exec ctx_ddl.create_preference('MYYY_SWEDISH_LEXER', 'BASIC_LEXER');
--exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'printjoins', '_-');
--exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'skipjoins', ',');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'index_themes', 'NO');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'index_text', 'YES');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'index_stems', 'SWEDISH');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'alternate_spelling', 'SWEDISH');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'base_letter', 'YES');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'base_letter_type', 'GENERIC');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'override_base_letter', 'TRUE');
--exec ctx_ddl.set_attribute('MYYY_SWEDISH_LEXER', 'whitespace', ',');
exec ctx_ddl.create_preference('MYYY_SWEDISH_FUZZY_PREF', 'BASIC_WORDLIST');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','FUZZY_MATCH','AUTO');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','FUZZY_SCORE','60');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','FUZZY_NUMRESULTS','100');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
exec ctx_ddl.set_attribute('MYYY_SWEDISH_FUZZY_PREF','STEMMER','AUTO');
create index IX_CUST_TMP_CTXSORTNAME on CUSTOMER (sortname) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('LEXER MYYY_SWEDISH_LEXER Wordlist MYYY_SWEDISH_FUZZY_PREF SYNC(ON COMMIT)');
If I run:
select * from CUSTOMER where contains(sortname,'Hamlberg') > 0; -- "Hamlberg, Ulf" is returned - OK
select * from CUSTOMER where contains(sortname,'Ulf') > 0; -- "Hamlberg, Ulf" is returned - NOT OK - Why it is not returned ??
3. Then I want to use fuzzy for searching names:
a. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Hamlberg}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
b. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Ulf}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
c. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Hamlberg Ulf}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
d. SELECT score(1), sortname FROM CUSTOMER where CONTAINS(sortname, 'fuzzy({Hamlberg, Ulf}, 50, 10, weight)',1) > 0; - it should return "Hamlberg, Ulf"
Is it possible to do that somehow ?
Thx for answers.
Edited by: 906314 on 7.3.2012 4:01