Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

text 10g fuzzy search performance

dmnkDec 1 2014 — edited Dec 4 2014

Hello to everybody in this community,

im new to this and I got a question which belongs to Oracle Text 10g.

My Setup:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit

8 Cores with each 2,5 GHz

64 GB RAM

What I'd like to do:

I'd like to compare a large amount of row sets with each other in a way that human caused mistakes (eg spelling, typing mistakes) will be tolerated.

So my TEXT CONTEXT setup is as follows:

MULTI_COLUMN_DATASTORE with each Column to compare.

begin

  ctx_ddl.create_preference('my_datastore', 'MULTI_COLUMN_DATASTORE');

  ctx_ddl.set_attribute('my_datastore', 'columns', 'column1, ...');

end;

BASIC_LEXER - with GERMAN settings:

begin

   ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');

   ctx_ddl.set_attribute('my_lexer', 'index_themes', 'NO');

   ctx_ddl.set_attribute('my_lexer', 'index_text', 'YES');

   ctx_ddl.set_attribute('my_lexer', 'alternate_spelling', 'GERMAN');

   ctx_ddl.set_attribute('my_lexer', 'composite', 'GERMAN');

   ctx_ddl.set_attribute('my_lexer', 'index_stems', 'GERMAN');

   ctx_ddl.set_attribute('my_lexer', 'new_german_spelling', 'YES');

end;

BASIC_WORDLIST - with GERMAN settings:

begin

   ctx_ddl.create_preference('my_wordlist', 'BASIC_WORDLIST');

   ctx_ddl.set_attribute('my_wordlist','FUZZY_MATCH','GERMAN');

   ctx_ddl.set_attribute('my_wordlist','FUZZY_SCORE','60'); --defaults

   ctx_ddl.set_attribute('my_wordlist','FUZZY_NUMRESULTS','100'); --defaults

   --ctx_ddl.set_attribute('my_wordlist','SUBSTRING_INDEX','TRUE'); --uncommented due to long creation time of index

   ctx_ddl.set_attribute('my_wordlist','STEMMER','GERMAN');

end;

And a BASIC_SECTION_GROUP with a field_section for each column.

begin

  ctx_ddl.create_section_group(

    group_name => 'my_section_group',

    group_type => 'BASIC_SECTION_GROUP'

  );

  ctx_ddl.add_field_section(

    group_name   => 'my_section_group',

    section_name => 'column1',

    tag          => 'column1'

  );

...

end;

I create the index with

create index idx_myfulltextindex on fulltexttest(column1)

indextype is ctxsys.context

parameters ('datastore my_datastore

             section group my_section_group

             lexer my_lexer

             wordlist my_wordlist

             stoplist ctxsys.empty_stoplist')

Everything works functionally fine.

In my test scenario i got a table with around 100.000 Rows which has a primary key which is not in the CONTEXT index.

The Problem:

I do a query like:

SELECT SCORE(1), a.*

FROM fulltexttest a

WHERE CONTAINS(a.column1, 'FUZZY(({TEST}),,,W) WITHIN COUMN1', 1)

  AND a.primkey BETWEEN 1000 AND 4000

This will do a fulltext search in a set of 3000 rows. The response time here is nearly immediate. Maybe a second.

If I do the same in a cursor for many times (>1000) with different search terms, it is takes a long time ofcourse. In the average it does 1 query per second.

I thought this could not be that slow and i tested the same with:

SELECT SCORE(1), a.*

FROM fulltexttest a

WHERE CONTAINS(a.column1, '({TEST}) WITHIN COUMN1', 1)

  AND a.primkey BETWEEN 1000 AND 4000

NOTE there is no Fuzzy search anymore...

With this it is up to 20 times faster.

The cpu of the server reaches about 15% load while processing the fuzzy query.

So:

If I do a fuzzy search, it seems not to access the index. I thought I was telling the index to compute the results of 100 expansions in advance.

Am I doing it wrong? Or is it not possible to build an Index especially for fuzzy search ?

Are there any suggestions to increase the performance? Note that I read the guide (7 Tuning Oracle Text) already. None of the hints caused remedy.

I would appreciate if anyone is able to help me in this case... Or just give a hint.

Thank you,

Dominik

This post has been answered by Barbara Boehmer on Dec 3 2014
Jump to Answer
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jan 1 2015
Added on Dec 1 2014
11 comments
2,923 views