Skip to Main Content

Query relaxation with wildcard and SYN function

dmak2709Apr 21 2020 — edited Apr 21 2020

Hi,

I found some trouble with setting up the query relaxation with SYN and wildcards in Oracle Text.

First let me show how the table and index was created:

BEGIN

   /*

   BEGIN

        ctx_ddl.drop_preference('INVENTAR_DEV_LEXER');

        ctx_ddl.drop_preference('INV_DEV_WORDLIST');

   END;

   */

  

   /**

    LEXER

   */

   ctx_ddl.create_preference('INVENTAR_DEV_LEXER','BASIC_LEXER');

  

   ctx_ddl.set_attribute('INVENTAR_DEV_LEXER', 'numgroup',',');

   ctx_ddl.set_attribute('INVENTAR_DEV_LEXER', 'numjoin','.');

   ctx_ddl.set_attribute('INVENTAR_DEV_LEXER', 'printjoins','.-_%:;/,()?!*+');

   /**

    FUZZY

   */

   ctx_ddl.create_preference('INV_DEV_WORDLIST', 'BASIC_WORDLIST');

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','FUZZY_MATCH','GENERIC');

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','FUZZY_SCORE','70');

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','FUZZY_NUMRESULTS','10');

    

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','SUBSTRING_INDEX','FALSE');

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','STEMMER','NULL');

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','PREFIX_INDEX','TRUE');

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','PREFIX_MIN_LENGTH',3);

   ctx_ddl.set_attribute('INV_DEV_WORDLIST','PREFIX_MAX_LENGTH',7);

   /**

    THESAURUS

   */

   ctx_thes.create_thesaurus('inventar_thes', FALSE);

   CTX_THES.CREATE_RELATION('inventar_thes','el','SYN','electric');

END;

/**

TABLE

*/

CREATE TABLE inventar_dev (id VARCHAR2(16), name VARCHAR2(255));

/**

INDEX

*/

CREATE INDEX inventar_dev_idx ON inventar_dev(name)

        INDEXTYPE IS CTXSYS.CONTEXT

        PARAMETERS (

            'DATASTORE CTXSYS.DEFAULT_DATASTORE FILTER CTXSYS.NULL_FILTER LEXER INVENTAR_DEV_LEXER WORDLIST  INV_DEV_WORDLIST STOPLIST CTXSYS.EMPTY_STOPLIST'

        );

And know when I run the following query, it works fine (I left only one <seq>).

SELECT /*+ FIRST_ROWS(150) */

     i.name

FROM inventar_dev i

                WHERE contains(i.name,

            '<query>

                <textquery grammar="context">

                    <progression>

                        <seq>SYN({el}, inventar_thes)</seq>

                    </progression>

                </textquery>

                <score datatype="FLOAT" algorithm="DEFAULT"/>

                <order>

                    <orderkey> Score DESC </orderkey>

                </order>

            </query>', 1) > 0;

But if I add one word in the <seq> as follow:

SELECT /*+ FIRST_ROWS(150) */

OPIS

FROM david.inventar_dev i

                WHERE contains(i.opis,

            '<query>

                <textquery grammar="context">

                    <progression>

                        <seq>SYN({el}, inventar_thes) {kos}</seq>

                    </progression>

                </textquery>

                <score datatype="FLOAT" algorithm="DEFAULT"/>

                <order>

                    <orderkey> Score DESC </orderkey>

                </order>

            </query>', 1) > 0;

I throws an exception:

ORA-29902: error in executing ODCIIndexStart() routine

ORA-20000: Oracle Text error:

DRG-50920: part of phrase not itself a phrase or equivalence

29902. 00000 -  "error in executing ODCIIndexStart() routine"

*Cause:    The execution of ODCIIndexStart routine caused an error.

*Action:   Examine the error messages produced by the indextype code and

           take appropriate action.

Also % doesn't seems to work:

SELECT /*+ FIRST_ROWS(150) */

OPIS

FROM david.inventar_dev i

                WHERE contains(i.opis,

            '<query>

                <textquery grammar="context">

                    <progression>

                        <seq>{el}%</seq>

                    </progression>

                </textquery>

                <score datatype="FLOAT" algorithm="DEFAULT"/>

                <order>

                    <orderkey> Score DESC </orderkey>

                </order>

            </query>', 1) > 0;

Update

I realized that SYN({something}, thes) doesn't work when there are multiple words devided by spaces. So you need to add an operator between those words.

The query works with the SYN if you remove the foolowing line from the text:

<seq><rewrite>transform((TOKENS, "FUZZY(SYN({", "}, inv_thes), 70, 10, weight)", " "))</rewrite></seq> 

But I'm still not sure what could be the reason.

In fact I have also removed the TRANSFORM and REWRITE function and wrote my custom query parsing function.

Comments
Post Details
Added on Apr 21 2020
0 comments
67 views