Hello to everybody in this community,
im new to this and I got a question which belongs to Oracle Text 10g.
My Setup:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
8 Cores with each 2,5 GHz
64 GB RAM
What I'd like to do:
I'd like to compare a large amount of row sets with each other in a way that human caused mistakes (eg spelling, typing mistakes) will be tolerated.
So my TEXT CONTEXT setup is as follows:
MULTI_COLUMN_DATASTORE with each Column to compare.
begin
ctx_ddl.create_preference('my_datastore', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('my_datastore', 'columns', 'column1, ...');
end;
BASIC_LEXER - with GERMAN settings:
begin
ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
ctx_ddl.set_attribute('my_lexer', 'index_themes', 'NO');
ctx_ddl.set_attribute('my_lexer', 'index_text', 'YES');
ctx_ddl.set_attribute('my_lexer', 'alternate_spelling', 'GERMAN');
ctx_ddl.set_attribute('my_lexer', 'composite', 'GERMAN');
ctx_ddl.set_attribute('my_lexer', 'index_stems', 'GERMAN');
ctx_ddl.set_attribute('my_lexer', 'new_german_spelling', 'YES');
end;
BASIC_WORDLIST - with GERMAN settings:
begin
ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('my_wordlist','FUZZY_MATCH','GERMAN');
ctx_ddl.set_attribute('my_wordlist','FUZZY_SCORE','60'); --defaults
ctx_ddl.set_attribute('my_wordlist','FUZZY_NUMRESULTS','100'); --defaults
--ctx_ddl.set_attribute('my_wordlist','SUBSTRING_INDEX','TRUE'); --uncommented due to long creation time of index
ctx_ddl.set_attribute('my_wordlist','STEMMER','GERMAN');
end;
And a BASIC_SECTION_GROUP with a field_section for each column.
begin
ctx_ddl.create_section_group(
group_name => 'my_section_group',
group_type => 'BASIC_SECTION_GROUP'
);
ctx_ddl.add_field_section(
group_name => 'my_section_group',
section_name => 'column1',
tag => 'column1'
);
...
end;
I create the index with
create index idx_myfulltextindex on fulltexttest(column1)
indextype is ctxsys.context
parameters ('datastore my_datastore
section group my_section_group
lexer my_lexer
wordlist my_wordlist
stoplist ctxsys.empty_stoplist')
Everything works functionally fine.
In my test scenario i got a table with around 100.000 Rows which has a primary key which is not in the CONTEXT index.
The Problem:
I do a query like:
SELECT SCORE(1), a.*
FROM fulltexttest a
WHERE CONTAINS(a.column1, 'FUZZY(({TEST}),,,W) WITHIN COUMN1', 1)
AND a.primkey BETWEEN 1000 AND 4000
This will do a fulltext search in a set of 3000 rows. The response time here is nearly immediate. Maybe a second.
If I do the same in a cursor for many times (>1000) with different search terms, it is takes a long time ofcourse. In the average it does 1 query per second.
I thought this could not be that slow and i tested the same with:
SELECT SCORE(1), a.*
FROM fulltexttest a
WHERE CONTAINS(a.column1, '({TEST}) WITHIN COUMN1', 1)
AND a.primkey BETWEEN 1000 AND 4000
NOTE there is no Fuzzy search anymore...
With this it is up to 20 times faster.
The cpu of the server reaches about 15% load while processing the fuzzy query.
So:
If I do a fuzzy search, it seems not to access the index. I thought I was telling the index to compute the results of 100 expansions in advance.
Am I doing it wrong? Or is it not possible to build an Index especially for fuzzy search ?
Are there any suggestions to increase the performance? Note that I read the guide (7 Tuning Oracle Text) already. None of the hints caused remedy.
I would appreciate if anyone is able to help me in this case... Or just give a hint.
Thank you,
Dominik