One of our apps uses a Text CONTEXT index. Some of the terms looked for are numbers, specifically numbers with decimal points. I see this behaviour below and was wondering if someone can explain the logic behind why it works this way:
SQL> select * from numbers;
ID TEXT
----- ----------------
1 10.56
2 10.99
3 11.99
4 12.99
5 13.99
6 14.99
7 15.99
8 16.99
9 17.99
SQL> exec ctxsys.ctx_ddl.create_preference('my_lexer','WORLD_LEXER'); -- we have to use this lexer...no option
SQL> create index numbersi on numbers(text) indextype is ctxsys.context parameters ('lexer my_lexer');
SQL> select token_type, token_text from DR$NUMBERSI$I;
TOKEN_TYPE TOKEN_TEXT
---------- ----------------------------------------------------------------
0 10.56
0 10.99
0 11.99
0 12.99
0 13.99
0 14.99
0 15.99
0 16.99
0 17.99
-- A user wants to find rows where someone paid $10.xx for stuff. Notice the "." in the search term, and no white space before the "%".
SQL> select * from numbers where contains(text,'10.%')>0;
no rows selected
-- get rid of the decimal point...
SQL> select * from numbers where contains(text,'10%')>0;
ID TEXT
---------- ----------------------------------------------------------------
1 10.56
2 10.99
-- EXPLAIN shows this splitting of the search terms in the 2 cases above:
SQL> exec ctx_query.explain(index_name => 'numbersi', text_query => '10.%', explain_table => 'test_explain');
PL/SQL procedure successfully completed.
SQL> select id, parent_id, operation, object_name from test_explain;
ID PARENT_ID OPERATION OBJECT_NAME
---------- ---------- ------------------------------ ------------------------------
1 0 PHRASE
2 1 WORD 10
3 1 WORD %
SQL> exec ctx_query.explain(index_name => 'numbersi', text_query => '10%', explain_table => 'test_explain');
PL/SQL procedure successfully completed.
SQL> select id, parent_id, operation, object_name from test_explain;
ID PARENT_ID OPERATION OBJECT_NAME
---------- ---------- ------------------------------ ------------------------------
1 0 EQUIVALENCE 10%
2 1 WORD 10.56
3 1 WORD 10.99
My theory is that a trailing decimal point, aka period, is treated as an end of sentence character, which is punctuation, and it gets replaced by a blank; this makes a search expression'10.%' look like '10 %', which is the phrase with word "10" followed by any other word. Since there are no following words in my TEXT column, nothing matches this phrase. I see the same behaviour if there is an explicit blank character in the expression:
SQL> exec ctx_query.explain(index_name => 'numbersi', text_query => '10. %', explain_table => 'test_explain'); -- <<<< notice the space AFTER the "."
PL/SQL procedure successfully completed.
SQL> select id, parent_id, operation, object_name from test_explain;
ID PARENT_ID OPERATION OBJECT_NAME
---------- ---------- ------------------------------ ------------------------------
1 0 PHRASE
2 1 WORD 10
3 1 WORD %
Why is that period removed in such a manner that the expression gets split into 2 terms when there is no whitespace between the "." and the "%"?