Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

SUBSTRING_INDEX and PREFIX_INDEX appear to produce identical results

FredCSep 11 2008 — edited Sep 11 2008

My database is 11.1 EE.

I am having trouble understanding how the performance of wildcard searches is improved using the PREFIX_INDEX and SUBSTRING_INDEX attributes of the BASIC_WORDLIST preference in Oracle Text. Basically, I understand PREFIX_INDEX, but SUBSTRING_INDEX does not seem to be working-- or else I just don't understand something.

Here is the relevant documentation:
http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/ind.htm#CCAPP9091

If I create the index using PREFIX_INDEX=TRUE, PREFIX_MIN_LENGTH=3, and PREFIX_MAX_LENGTH=5, then given a token ALBATROSS, Oracle creates truncated tokens ALB, ALBA, and ALBAT. Thus the following left-truncated predicate will be faster because it can find the truncated tokens in the token table:

where contains( s,'%TROSS')

My understanding is that if I want to use a right-truncated predicate, such as

where contains( s,'ALBAT%')

...then I have to set the SUBSTRING_INDEX attribute when I build the index. I would then expect to see ALBATROS, ALBATRO, and ALBATR in the token table.

However, I do not see them. In fact, the token table is identical, whether I set the SUBSTRING_INDEX attribute or not.

Here is a case you can use to recreate my findings. Possibly, I am misunderstanding the concept of substring and prefix index altogether?

Test 1: create the index with no support for wildcard searches:

create table t (s clob);
insert into t values ('albatross');
commit;

create index t_x on t(s)
  indextype is ctxsys.context
/

SQL> select token_text, token_count from dr$t_x$i;
 
TOKEN_TEXT             TOKEN_COUNT
---------------------- -----------
ALBATROSS                        1

Test 2: recreate the index with just the PREFIX_INDEX attribute:

begin
  ctx_ddl.create_preference('t_x_wdl','BASIC_WORDLIST');
  ctx_ddl.set_attribute('t_x_wdl','PREFIX_INDEX','TRUE');
  ctx_ddl.set_attribute('t_x_wdl','PREFIX_MIN_LENGTH','3');
  ctx_ddl.set_attribute('t_x_wdl','PREFIX_MAX_LENGTH','5');
end;
/

drop index t_x;
create index t_x on t(s)
  indextype is ctxsys.context
  parameters( 'WORDLIST t_x_wdl')
/

-- alter index t_x rebuild parameters ('replace wordlist t_x_wdl');
-- (The result is the same whether I drop and recreate the index or rebuild it.)

SQL> select token_text, token_count from dr$t_x$i;
 
TOKEN_TEXT             TOKEN_COUNT
---------------------- -----------
ALB                              1
ALBA                             1
ALBAT                            1
ALBATROSS                        1

Test 3: recreate the index with both PREFIX_INDEX and SUBSTRING_INDEX:

begin
  ctx_ddl.drop_preference('t_x_wdl');
  ctx_ddl.create_preference('t_x_wdl','BASIC_WORDLIST');
  ctx_ddl.set_attribute('t_x_wdl','PREFIX_INDEX','TRUE');
  ctx_ddl.set_attribute('t_x_wdl','PREFIX_MIN_LENGTH','3');
  ctx_ddl.set_attribute('t_x_wdl','PREFIX_MAX_LENGTH','5');
  ctx_ddl.set_attribute('t_x_wdl','SUBSTRING_INDEX', 'YES');
end;
/

drop index t_x;
create index t_x on t(s)
  indextype is ctxsys.context
  parameters( 'WORDLIST t_x_wdl')
/

SQL> select token_text, token_count from dr$t_x$i;
 
TOKEN_TEXT             TOKEN_COUNT
---------------------- -----------
ALB                              1
ALBA                             1
ALBAT                            1
ALBATROSS                        1

I expect to see more tokens in the token table, but the token table is identical.

This post has been answered by Roger Ford-Oracle on Sep 11 2008

Jump to Answer

Locked Post

New comments cannot be posted to this locked post.

Locked on Oct 9 2008

Added on Sep 11 2008

#database-key-features, #text

2 comments

5,748 views