Fuzzy search - more accurate score??
909317Mar 14 2012 — edited Jun 6 2012Hello again :)
I have a problem with score returned from fuzzy search. Short example:
Table Customer contain column sortname. Sortname is in format "Surname, Firstname". Table customer contain a lot of members with same Surname for example: "Karlsson"
Then I run fuzzy search for name "Karlsson Hanna" and I want to have more accurate results on top.
But the problem is that "Karlsson, Hanna" has score 7 and also "Karlsson, Anna" or "Carlsson, Anna" has score 7.
Why? Why the "Karlsson, Hanna" does not have more??
Example:
--create table
CREATE TABLE TEMP_CUSTOMER
(
SORTNAME VARCHAR2(200 BYTE)
);
--import
Import.sql is here - LINK REMOVED BY MODERATOR
--create index
exec ctx_ddl.drop_preference('TEST1_LEXER');
exec ctx_ddl.create_preference('TEST1_LEXER', 'BASIC_LEXER');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'index_themes', 'NO');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'index_text', 'YES');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'index_stems', 'NONE');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'alternate_spelling', 'SWEDISH');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'base_letter', 'YES');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'base_letter_type', 'GENERIC');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'override_base_letter', 'TRUE');
exec ctx_ddl.drop_preference('TEST1_SWEDISH_FUZZY_PREF');
exec ctx_ddl.create_preference('TEST1_SWEDISH_FUZZY_PREF', 'BASIC_WORDLIST');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','FUZZY_MATCH','AUTO');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','FUZZY_SCORE','60');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','FUZZY_NUMRESULTS','100');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','STEMMER','AUTO');
drop index IX_CUST_TMP_CTXSORTNAME;
create index IX_CUST_TMP_CTXSORTNAME on TEMP_CUSTOMER (sortname) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('LEXER TEST1_LEXER Wordlist TEST1_SWEDISH_FUZZY_PREF SYNC(ON COMMIT)');
--search
SELECT sortname FROM TEMP_CUSTOMER where CONTAINS(sortname, 'Karlsson Hanna') > 0;
SELECT score(1), sortname FROM TEMP_CUSTOMER where CONTAINS(sortname, 'fuzzy({Karlsson Hanna}, 50, 100, weight)',1) > 0 order by score(1) desc;
SELECT score(1), sortname FROM TEMP_CUSTOMER where CONTAINS(sortname, 'fuzzy({Karlsson Hanna}, 70, 100, weight)',1) > 0 order by score(1) desc;
Is it possible to do something with this?
I also try to change the lexer preferences, change the printjoins, skipjoins, whitespaces, punctations, but nothing works.
exec ctx_ddl.set_attribute('TEST1_LEXER', 'endjoins', ',');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'printjoins', ',');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'skipjoins', ',');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'punctuations', ',');
EDIT: remove exec ctx_ddl.set_attribute('TEST1_LEXER', 'endjoins', ','); from example
Edited by: Mariooo on 14.3.2012 3:55
Edited by: Mariooo on 14.3.2012 3:59
Edited by: Roger Ford on Mar 14, 2012 9:46 AM