Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Fuzzy search - more accurate score??

909317Mar 14 2012 — edited Jun 6 2012
Hello again :)

I have a problem with score returned from fuzzy search. Short example:

Table Customer contain column sortname. Sortname is in format "Surname, Firstname". Table customer contain a lot of members with same Surname for example: "Karlsson"

Then I run fuzzy search for name "Karlsson Hanna" and I want to have more accurate results on top.

But the problem is that "Karlsson, Hanna" has score 7 and also "Karlsson, Anna" or "Carlsson, Anna" has score 7.
Why? Why the "Karlsson, Hanna" does not have more??

Example:

--create table
CREATE TABLE TEMP_CUSTOMER
(
SORTNAME VARCHAR2(200 BYTE)
);

--import
Import.sql is here - LINK REMOVED BY MODERATOR

--create index
exec ctx_ddl.drop_preference('TEST1_LEXER');
exec ctx_ddl.create_preference('TEST1_LEXER', 'BASIC_LEXER');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'index_themes', 'NO');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'index_text', 'YES');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'index_stems', 'NONE');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'alternate_spelling', 'SWEDISH');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'base_letter', 'YES');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'base_letter_type', 'GENERIC');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'override_base_letter', 'TRUE');

exec ctx_ddl.drop_preference('TEST1_SWEDISH_FUZZY_PREF');
exec ctx_ddl.create_preference('TEST1_SWEDISH_FUZZY_PREF', 'BASIC_WORDLIST');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','FUZZY_MATCH','AUTO');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','FUZZY_SCORE','60');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','FUZZY_NUMRESULTS','100');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
exec ctx_ddl.set_attribute('TEST1_SWEDISH_FUZZY_PREF','STEMMER','AUTO');

drop index IX_CUST_TMP_CTXSORTNAME;
create index IX_CUST_TMP_CTXSORTNAME on TEMP_CUSTOMER (sortname) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('LEXER TEST1_LEXER Wordlist TEST1_SWEDISH_FUZZY_PREF SYNC(ON COMMIT)');

--search
SELECT sortname FROM TEMP_CUSTOMER where CONTAINS(sortname, 'Karlsson Hanna') > 0;
SELECT score(1), sortname FROM TEMP_CUSTOMER where CONTAINS(sortname, 'fuzzy({Karlsson Hanna}, 50, 100, weight)',1) > 0 order by score(1) desc;
SELECT score(1), sortname FROM TEMP_CUSTOMER where CONTAINS(sortname, 'fuzzy({Karlsson Hanna}, 70, 100, weight)',1) > 0 order by score(1) desc;

Is it possible to do something with this?
I also try to change the lexer preferences, change the printjoins, skipjoins, whitespaces, punctations, but nothing works.

exec ctx_ddl.set_attribute('TEST1_LEXER', 'endjoins', ',');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'printjoins', ',');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'skipjoins', ',');
exec ctx_ddl.set_attribute('TEST1_LEXER', 'punctuations', ',');

EDIT: remove exec ctx_ddl.set_attribute('TEST1_LEXER', 'endjoins', ','); from example

Edited by: Mariooo on 14.3.2012 3:55

Edited by: Mariooo on 14.3.2012 3:59

Edited by: Roger Ford on Mar 14, 2012 9:46 AM
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jul 4 2012
Added on Mar 14 2012
9 comments
4,583 views