Fuzzy searching using multiple words
531177Feb 1 2007 — edited Jan 24 2008Hi All
I have a query that looks like this.
SELECT DISTINCT stname,stoid
FROM TEMPTBL
WHERE CONTAINS (stname, '?(SYN(1ST) AND SYN(ST) ) ' ,1) > 0
Just to focus on the problem though I'll take out SYN. So the query looks like
this.
SELECT DISTINCT stname
FROM TEMPTBL
WHERE CONTAINS (stname, '?((1ST) AND (ST) ) ' ,1) > 0
This query is dynamically created from a webpage. What happens is the user can
enter anything and a function in java breaks the input into words. These words are then placed into the query above.
My problem when I run the query above (without SYN)
I get the following results. The problem with this is it seems like it is giving me everything
that has an ST in it. Which I would think it would give me only the records bolded.
RPAD(STNAME,20)
--------------------
ST
FIST ST
1ST ST
BIST ST
11ST ST
Now if you run the same query except change the input a little. to 11st st.
You get the output that I expect.
SELECT DISTINCT stname,stoid
FROM TEMPTBL
WHERE CONTAINS (stname, '?((11ST) AND (ST) ) ' ,1) > 0
RPAD(STNAME,20)
--------------------
1ST ST
11ST ST
I saw on oracles website
http://downloadeast.oracle.com/docs/cd/B19306_01/text.102/b14218/cqoper.htm#i997330
that for Fuzzy "The word needs to be at least 3 characters for the fuzzy operator to process it." I know 1st is 3 characters but I was wondering if this was really true.
Because it doesn't seem to work correctly when I have 3 (1st) but when I have 4(11st) it seems to work.
Some of the code:
---------------------------
create table temptbl(
stoid varchar(15),
stname varchar(200)
)
INSERT INTO STOID TEMPTBL VALUES( '1','ST')
INSERT INTO STOID TEMPTBL VALUES( '2','1ST ST')
INSERT INTO STOID TEMPTBL VALUES( '3','FIST ST')
INSERT INTO STOID TEMPTBL VALUES( '5','ND')
INSERT INTO STOID TEMPTBL VALUES( '4','2ND ND')
INSERT INTO STOID TEMPTBL VALUES( '7','3ND ND')
INSERT INTO STOID TEMPTBL VALUES( '8','HELLOND ND')
INSERT INTO STOID TEMPTBL VALUES( '9','BIST ST')
INSERT INTO STOID TEMPTBL VALUES( '10','BIST BIST')
INSERT INTO STOID TEMPTBL VALUES( '8','34ND ND')
INSERT INTO STOID TEMPTBL VALUES( '8','11ST ST')
drop index temptbl_IDX
CREATE INDEX temptbl_IDX on TEMPTBL(stname) INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('Wordlist PREF_SOUNDEX LEXER MAF_LEXER STOPLIST CTXSYS.EMPTY_STOPLIST')
begin
ctx_ddl.create_preference('PREF_SOUNDEX', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('PREF_SOUNDEX','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('PREF_SOUNDEX','PREFIX_MIN_LENGTH',3);
ctx_ddl.set_attribute('PREF_SOUNDEX','PREFIX_MAX_LENGTH',15);
ctx_ddl.set_attribute('PREF_SOUNDEX','SUBSTRING_INDEX','TRUE');
ctx_ddl.create_preference('MAF_LEXER','BASIC_LEXER');
ctx_ddl.set_attribute('MAF_LEXER','printjoins','-#&');
ctx_ddl.set_attribute('MAF_LEXER','skipjoins','(.),');
end;
/
Any answers are greatly appreciated?
Thanks
Jeff