My database is 11.1 EE.
I am having trouble understanding how the performance of wildcard searches is improved using the PREFIX_INDEX and SUBSTRING_INDEX attributes of the BASIC_WORDLIST preference in Oracle Text. Basically, I understand PREFIX_INDEX, but SUBSTRING_INDEX does not seem to be working-- or else I just don't understand something.
Here is the relevant documentation:
http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/ind.htm#CCAPP9091
If I create the index using PREFIX_INDEX=TRUE, PREFIX_MIN_LENGTH=3, and PREFIX_MAX_LENGTH=5, then given a token ALBATROSS, Oracle creates truncated tokens ALB, ALBA, and ALBAT. Thus the following left-truncated predicate will be faster because it can find the truncated tokens in the token table:
where contains( s,'%TROSS')
My understanding is that if I want to use a right-truncated predicate, such as
where contains( s,'ALBAT%')
...then I have to set the SUBSTRING_INDEX attribute when I build the index. I would then expect to see ALBATROS, ALBATRO, and ALBATR in the token table.
However, I do not see them. In fact, the token table is identical, whether I set the SUBSTRING_INDEX attribute or not.
Here is a case you can use to recreate my findings. Possibly, I am misunderstanding the concept of substring and prefix index altogether?
Test 1: create the index with no support for wildcard searches:
create table t (s clob);
insert into t values ('albatross');
commit;
create index t_x on t(s)
indextype is ctxsys.context
/
SQL> select token_text, token_count from dr$t_x$i;
TOKEN_TEXT TOKEN_COUNT
---------------------- -----------
ALBATROSS 1
Test 2: recreate the index with just the PREFIX_INDEX attribute:
begin
ctx_ddl.create_preference('t_x_wdl','BASIC_WORDLIST');
ctx_ddl.set_attribute('t_x_wdl','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('t_x_wdl','PREFIX_MIN_LENGTH','3');
ctx_ddl.set_attribute('t_x_wdl','PREFIX_MAX_LENGTH','5');
end;
/
drop index t_x;
create index t_x on t(s)
indextype is ctxsys.context
parameters( 'WORDLIST t_x_wdl')
/
-- alter index t_x rebuild parameters ('replace wordlist t_x_wdl');
-- (The result is the same whether I drop and recreate the index or rebuild it.)
SQL> select token_text, token_count from dr$t_x$i;
TOKEN_TEXT TOKEN_COUNT
---------------------- -----------
ALB 1
ALBA 1
ALBAT 1
ALBATROSS 1
Test 3: recreate the index with both PREFIX_INDEX and SUBSTRING_INDEX:
begin
ctx_ddl.drop_preference('t_x_wdl');
ctx_ddl.create_preference('t_x_wdl','BASIC_WORDLIST');
ctx_ddl.set_attribute('t_x_wdl','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('t_x_wdl','PREFIX_MIN_LENGTH','3');
ctx_ddl.set_attribute('t_x_wdl','PREFIX_MAX_LENGTH','5');
ctx_ddl.set_attribute('t_x_wdl','SUBSTRING_INDEX', 'YES');
end;
/
drop index t_x;
create index t_x on t(s)
indextype is ctxsys.context
parameters( 'WORDLIST t_x_wdl')
/
SQL> select token_text, token_count from dr$t_x$i;
TOKEN_TEXT TOKEN_COUNT
---------------------- -----------
ALB 1
ALBA 1
ALBAT 1
ALBATROSS 1
I expect to see more tokens in the token table, but the token table is identical.