Thread: Building index from multiple tables for text search


Permlink Replies: 12 - Pages: 1 - Last Post: Jun 8, 2006 6:13 PM Last Post By: Barbara Boehmer
user491360

Posts: 5
Registered: 02/28/06
Building index from multiple tables for text search
Posted: Feb 28, 2006 7:15 AM
Click to report abuse...   Click to reply to this thread Reply
Hallo,

I had a look at how to build a index table to optimize search with Oracle text.
At the moment we are using a JOIN of tables to search for a text in several fields located in different tables.

My question is:

Is it possible to create that index table from several tables?
If yes, can you point me out to any links or give me an example.

So far, all the examples I read were about 1 table only.

Thanks

Elisabeth
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Building index from multiple tables for text search
Posted: Feb 28, 2006 7:54 AM   in response to: user491360 in response to: user491360
Click to report abuse...   Click to reply to this thread Reply
Here is an example using two tables. You can expand it to use as many related tables as you like, by just adding additional nested loops to the procedure.

scott@ORA92> CREATE TABLE customers
2 (customer_id NUMBER,
3 first_name VARCHAR2(15),
4 last_name VARCHAR2(15),
5 dummy VARCHAR2(1),
6 CONSTRAINT customers_pk PRIMARY KEY (customer_id))
7 /

Table created.

scott@ORA92> CREATE TABLE addresses
2 (customer_id NUMBER,
3 street VARCHAR2(15),
4 city VARCHAR2(15),
5 state VARCHAR2(2),
6 CONSTRAINT addresses_fk FOREIGN KEY (customer_id)
7 REFERENCES customers (customer_id))
8 /

Table created.

scott@ORA92> CONNECT CTXSYS/ctxsys_password
Connected.
scott@ORA92> @ LOGIN
scott@ORA92> SET ECHO OFF

GLOBAL_NAME

ctxsys@ORA92

ctxsys@ORA92> CREATE OR REPLACE PROCEDURE concat_cols
2 (p_rowid IN ROWID,
3 p_clob IN OUT CLOB)
4 AS
5 v_clob CLOB;
6 BEGIN
7 FOR c1 IN
8 (SELECT customer_id, first_name || ' ' || last_name AS data
9 FROM scott.customers
10 WHERE ROWID = p_rowid)
11 LOOP
12 v_clob := v_clob || c1.data;
13 FOR c2 IN
14 (SELECT ' ' || street || ' ' || city || ' ' || state AS data
15 FROM scott.addresses a
16 WHERE a.customer_id = c1.customer_id)
17 LOOP
18 v_clob := v_clob || c2.data;
19 END LOOP;
20 END LOOP;
21 p_clob := v_clob;
22 END concat_cols;
23 /

Procedure created.

ctxsys@ORA92> SHOW ERRORS
No errors.
ctxsys@ORA92> GRANT EXECUTE ON concat_cols TO scott
2 /

Grant succeeded.

ctxsys@ORA92> CONNECT scott/tiger
Connected.
ctxsys@ORA92> @ LOGIN
ctxsys@ORA92> SET ECHO OFF

GLOBAL_NAME

scott@ORA92

scott@ORA92> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('concat_cols_datastore', 'USER_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('concat_cols_datastore', 'PROCEDURE', 'concat_cols');
4 END;
5 /

PL/SQL procedure successfully completed.

scott@ORA92> INSERT ALL
2 INTO customers VALUES (1, 'Bob', 'Smith', NULL)
3 INTO customers VALUES (2, 'Bob', 'Jones', NULL)
4 SELECT * FROM DUAL
5 /

2 rows created.

scott@ORA92> INSERT ALL
2 INTO addresses VALUES (1, 'Noplace', 'Nowhere', 'CA')
3 INTO addresses VALUES (2, 'Smith St.', 'Somewhere', 'CA')
4 SELECT * FROM DUAL
5 /

2 rows created.

scott@ORA92> CREATE INDEX customer_text_idx ON customers (dummy)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('datastore concat_cols_datastore')
4 /

Index created.

scott@ORA92> SELECT c.first_name, c.last_name, a.street, a.city
2 FROM customers c, addresses a
3 WHERE c.customer_id = a.customer_id
4 AND CONTAINS (C.dummy, 'Smith') > 0
5 /

FIRST_NAME LAST_NAME STREET CITY

---------------
---------------
Bob Smith Noplace Nowhere
Bob Jones Smith St. Somewhere

scott@ORA92>
user491360

Posts: 5
Registered: 02/28/06
Re: Building index from multiple tables for text search
Posted: Mar 1, 2006 9:40 AM   in response to: Barbara Boehmer in response to: Barbara Boehmer
Click to report abuse...   Click to reply to this thread Reply
Thanks a lot for your example.

I ve been doing some research and I found that about CTXCAT:
"Use this index type to index small text fragments such as item names, prices and descriptions that are stored across columns"

I have to do a search of keywords in different columns in different tables.

According to what is said above, would CTXCAT be the best index type to use in my case?

As far as I understood, CONTEXT would allow a search within documents (doc, html,xml...) but not on the containt of a columns cell in the database.Am I wrong?

Thanks for the clarification

Cheers

Elisabeth

Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Building index from multiple tables for text search
Posted: Mar 1, 2006 10:20 AM   in response to: user491360 in response to: user491360
Click to report abuse...   Click to reply to this thread Reply
CTXCAT is good if the columns are in the same table. The demonstration that I already provided, using a CONTEXT index, searched across columns in two different tables in the database. There were no documents involved.
mroessler

Posts: 55
Registered: 07/26/00
Re: Building index from multiple tables for text search
Posted: Apr 19, 2006 8:59 PM   in response to: Barbara Boehmer in response to: Barbara Boehmer
Click to report abuse...   Click to reply to this thread Reply
Can this type of single index be created if one of the columns is a blob? Say, for example, the CITY column in the ADDRESSES table were a blob, while the datatypes of the other columns remain unchanged?
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Building index from multiple tables for text search
Posted: Apr 20, 2006 3:37 PM   in response to: mroessler in response to: mroessler
Click to report abuse...   Click to reply to this thread Reply
You cannot concatenate a blob column, so you would not be able to use a procedure like the one that I demonstrated previously, but you can use a materialized view and a multi_column_datastore, as demonstrated below.

scott@10gXE> CREATE TABLE customers
2 (customer_id NUMBER,
3 first_name VARCHAR2(15),
4 last_name VARCHAR2(15),
5 CONSTRAINT customers_pk PRIMARY KEY (customer_id))
6 /

Table created.

scott@10gXE> CREATE TABLE addresses
2 (customer_id NUMBER,
3 street VARCHAR2(15),
4 city BLOB,
5 state VARCHAR2(2),
6 CONSTRAINT addresses_fk FOREIGN KEY (customer_id)
7 REFERENCES customers (customer_id))
8 /

Table created.

scott@10gXE> INSERT ALL
2 INTO customers VALUES (1, 'Bob', 'Smith')
3 INTO customers VALUES (2, 'Bob', 'Jones')
4 INTO customers VALUES (3, 'Bob', 'williams')
5 SELECT * FROM DUAL
6 /

3 rows created.

scott@10gXE> INSERT ALL
2 INTO addresses VALUES (1, 'Noplace', UTL_RAW.CAST_TO_RAW ('REDWOOD SHORES'), 'CA')
3 INTO addresses VALUES (2, 'Smith St.', UTL_RAW.CAST_TO_RAW ('RIVERSIDE'), 'CA')
4 INTO addresses VALUES (3, 'Redwood St.', UTL_RAW.CAST_TO_RAW ('RIVERSIDE'), 'CA')
5 SELECT * FROM DUAL
6 /

3 rows created.

scott@10gXE> CREATE MATERIALIZED VIEW test_view AS
2 SELECT c.first_name, c.last_name,
3 a.street, a.city, a.state,
4 CAST (NULL AS VARCHAR2(1)) AS dummy
5 FROM customers c, addresses a
6 WHERE c.customer_id = a.customer_id
7 /

Materialized view created.

scott@10gXE> CONNECT CTXSYS/ctxsys_password
Connected.

GLOBAL_NAME

ctxsys@10gXE

ctxsys@10gXE> @ LOGIN
ctxsys@10gXE> SET ECHO OFF

GLOBAL_NAME

ctxsys@10gXE

ctxsys@10gXE> EXEC CTX_DDL.DROP_PREFERENCE ('my_multi');

PL/SQL procedure successfully completed.

ctxsys@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('my_multi', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE
4 ('my_multi', 'COLUMNS',
5 'first_name, last_name, street, city, state');
6 END;
7 /

PL/SQL procedure successfully completed.

ctxsys@10gXE> CONNECT scott/tiger
Connected.

GLOBAL_NAME

scott@10gXE

scott@10gXE> @ LOGIN
scott@10gXE> SET ECHO OFF

GLOBAL_NAME

scott@10gXE

scott@10gXE> CREATE INDEX customer_text_idx ON test_view (dummy)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('DATASTORE CTXSYS.my_multi')
4 /

Index created.

scott@10gXE> SELECT token_text FROM dr$customer_text_idx$i
2 /

TOKEN_TEXT

BOB
CA
CITY
FIRST
JONES
LAST
NAME
NOPLACE
REDWOOD
RIVERSIDE
SHORES
SMITH
ST
STATE
STREET
WILLIAMS

16 rows selected.

scott@10gXE> SELECT first_name, last_name, street
2 FROM test_view
3 WHERE CONTAINS (dummy, 'Smith') > 0
4 /

FIRST_NAME LAST_NAME STREET

---------------
Bob Smith Noplace
Bob Jones Smith St.

scott@10gXE> SELECT first_name, last_name, street
2 FROM test_view
3 WHERE CONTAINS (dummy, 'Redwood') > 0
4 /

FIRST_NAME LAST_NAME STREET

---------------
Bob Smith Noplace
Bob williams Redwood St.

scott@10gXE>
mroessler

Posts: 55
Registered: 07/26/00
Re: Building index from multiple tables for text search
Posted: Apr 21, 2006 7:14 PM   in response to: Barbara Boehmer in response to: Barbara Boehmer
Click to report abuse...   Click to reply to this thread Reply
Thank you Barbara, these examples are really excellent.

Michael
504181

Posts: 8
Registered: 04/18/06
Re: Building index from multiple tables for text search
Posted: Apr 24, 2006 1:25 PM   in response to: user491360 in response to: user491360
Click to report abuse...   Click to reply to this thread Reply
Barbara, thanks for your examples. It really makes understanding easier.

Based on the example below we are able to search for 'Smith' and ' Redwood' search queries. I did notice that when creating the view we have added a Dummy column and also the fact that the entire view is being indexed on the dummy column..any idea why the dummy column??

Also if I use the following query

SQL> SELECT first_name, last_name, street
FROM test_view
WHERE CONTAINS (dummy, 'where') > 0

the results returned are 0 because 'where' is not an exact match for the city serach.. any pointers how to fix the above problem?

Thanks for your help..

Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Building index from multiple tables for text search
Posted: Apr 24, 2006 3:47 PM   in response to: 504181 in response to: 504181
Click to report abuse...   Click to reply to this thread Reply
When you create the multi_column_datastore, you specify all of the columns there that you want to be tokenized, indexed, and searchable. When you create the index, you can create it on any single column, then you must use that same single column in your contains queries. Out of habit, I named it dummy. You might want to give it a more meaningful name, like customers_and_addresses.

I don't understand what you mean about searching for "where". Do you mean that you are searching for "where" and expecting to find "somewhere", "everywhere", "nowhere", and "anywhere". If so, then use a wildcard search. If you mean that you are searching for "where" and expecting to return a row containing "this where that", then it may not return any rows if you are using a stoplist that contains the word "where". If you do not specify a stoplist, then Oracle uses the default stoplist, which includes words like "and" and "the" and "of" and "or" that you would not typically want to be indexed or searched on. But, if you want to be able to search for such a word, then you can remove it from the stoplist or specify an empty stoplist or your own stoplist.
504181

Posts: 8
Registered: 04/18/06
Re: Building index from multiple tables for text search
Posted: Apr 24, 2006 4:40 PM   in response to: Barbara Boehmer in response to: Barbara Boehmer
Click to report abuse...   Click to reply to this thread Reply
As mentioned above, when you create a multi_column_datastore, all the columns that are part of that datastore are tokenized, indexed, and searchable. In such a case does it really matter which column is mentioned at the time of index creation ??
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Building index from multiple tables for text search
Posted: Apr 24, 2006 5:38 PM   in response to: 504181 in response to: 504181
Click to report abuse...   Click to reply to this thread Reply
No, it doesn't matter. You could use any of the pre-existing columns. With an index on just one table, it is common practice to create a dummy column and update that column for index synchronization, but that doesn't apply with a view of joined tables.
Cod'ead

Posts: 158
Registered: 01/30/06
Re: Building index from multiple tables for text search
Posted: Jun 8, 2006 3:50 AM   in response to: Barbara Boehmer in response to: Barbara Boehmer
Click to report abuse...   Click to reply to this thread Reply
Hi,

I have followed the steps above to create an index on muliple columns in many tables. The index works fine, however when i make a change in the data and sync the index the changes are not picked up. Only when i drop the index and create it again using

CTX_DDL.CREATE_PREFERENCE ('concat_cols_datastore', 'USER_DATASTORE');
CTX_DDL.SET_ATTRIBUTE ('concat_cols_datastore', 'PROCEDURE', 'concat_cols');

are the changes picked up. Could you provide me with the code to refresh an index. Also when you sync an index is the full index synced or just the changes made.

Thankyou

Kevin mann
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Building index from multiple tables for text search
Posted: Jun 8, 2006 6:13 PM   in response to: Cod'ead in response to: Cod'ead
Click to report abuse...   Click to reply to this thread Reply
The following is an extension of the original example. It uses ctx_ddl.sync_index to synchronize the index and shows the changes in one of the index tables and shows that the query finds the newly synchronized data. It also shows how this only happens when the column that the index is on is updated. In this example, the index is on the dummy column. The first update does not update the dummy column, so the ctx_ddl.sync_index command does not synchronize the new data. The second update does update the related dummy column, so ctx_ddl.sync_index does synchronize the new data. Lastly, I showed what happens to one of the index tables when you rebuild the index. Notice the reduction in rows in the index table after the rebuild process. You could also rebuild online or drop and recreate the index.

SCOTT@10gXE> DROP TABLE addresses
2 /

Table dropped.

SCOTT@10gXE> DROP TABLE customers
2 /

Table dropped.

SCOTT@10gXE> CREATE TABLE customers
2 (customer_id NUMBER,
3 first_name VARCHAR2(15),
4 last_name VARCHAR2(15),
5 dummy VARCHAR2(1),
6 CONSTRAINT customers_pk PRIMARY KEY (customer_id))
7 /

Table created.

SCOTT@10gXE> CREATE TABLE addresses
2 (customer_id NUMBER,
3 street VARCHAR2(15),
4 city VARCHAR2(15),
5 state VARCHAR2(2),
6 CONSTRAINT addresses_fk FOREIGN KEY (customer_id)
7 REFERENCES customers (customer_id))
8 /

Table created.

SCOTT@10gXE> GRANT SELECT ON customers TO ctxsys
2 /

Grant succeeded.

SCOTT@10gXE> GRANT SELECT ON addresses TO ctxsys
2 /

Grant succeeded.

SCOTT@10gXE> CONNECT CTXSYS/ctxsys_password
Connected.
CTXSYS@10gXE>
CTXSYS@10gXE> CREATE OR REPLACE PROCEDURE concat_cols
2 (p_rowid IN ROWID,
3 p_clob IN OUT CLOB)
4 AS
5 v_clob CLOB;
6 BEGIN
7 FOR c1 IN
8 (SELECT customer_id, first_name || ' ' || last_name AS data
9 FROM scott.customers
10 WHERE ROWID = p_rowid)
11 LOOP
12 v_clob := v_clob || c1.data;
13 FOR c2 IN
14 (SELECT ' ' || street || ' ' || city || ' ' || state AS data
15 FROM scott.addresses a
16 WHERE a.customer_id = c1.customer_id)
17 LOOP
18 v_clob := v_clob || c2.data;
19 END LOOP;
20 END LOOP;
21 p_clob := v_clob;
22 END concat_cols;
23 /

Procedure created.

CTXSYS@10gXE> SHOW ERRORS
No errors.
CTXSYS@10gXE> GRANT EXECUTE ON concat_cols TO scott
2 /

Grant succeeded.

CTXSYS@10gXE> CONNECT scott/tiger
Connected.
SCOTT@10gXE>
SCOTT@10gXE> EXEC CTX_DDL.DROP_PREFERENCE ('concat_cols_datastore')

PL/SQL procedure successfully completed.

SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('concat_cols_datastore', 'USER_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('concat_cols_datastore', 'PROCEDURE', 'ctxsys.concat_cols');
4 END;
5 /

PL/SQL procedure successfully completed.

SCOTT@10gXE> CREATE INDEX customer_text_idx ON customers (dummy)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('datastore concat_cols_datastore')
4 /

Index created.

SCOTT@10gXE> INSERT INTO customers VALUES (1, 'Bob', 'Smith', NULL)
2 /

1 row created.

SCOTT@10gXE> INSERT INTO addresses VALUES (1, 'Noplace', 'Nowhere', 'CA')
2 /

1 row created.

SCOTT@10gXE> INSERT INTO customers VALUES (2, 'Bob', 'Jones', NULL)
2 /

1 row created.

SCOTT@10gXE> INSERT INTO addresses VALUES (2, 'Smith St.', 'Somewhere', 'CA')
2 /

1 row created.

SCOTT@10gXE> EXEC CTX_DDL.SYNC_INDEX ('customer_text_idx')

PL/SQL procedure successfully completed.

SCOTT@10gXE> COLUMN token_text FORMAT A30
SCOTT@10gXE> SELECT token_text, token_type, token_first, token_last, token_count
2 FROM dr$customer_text_idx$i
3 /

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT

----------
----------
BOB 0 1 2 2
CA 0 1 2 2
JONES 0 2 2 1
NOPLACE 0 1 1 1
NOWHERE 0 1 1 1
SMITH 0 1 2 2
SOMEWHERE 0 2 2 1
ST 0 2 2 1

8 rows selected.

SCOTT@10gXE> SELECT c.first_name, c.last_name, a.street, a.city
2 FROM customers c, addresses a
3 WHERE c.customer_id = a.customer_id
4 AND CONTAINS (C.dummy, 'Smith') > 0
5 /

FIRST_NAME LAST_NAME STREET CITY

---------------
---------------
Bob Smith Noplace Nowhere
Bob Jones Smith St. Somewhere

SCOTT@10gXE> -- dummy is not updated, so the index is not synchronized:
SCOTT@10gXE> UPDATE addresses
2 SET city = 'Anywhere'
3 WHERE city = 'Somewhere'
4 /

1 row updated.

SCOTT@10gXE> EXEC CTX_DDL.SYNC_INDEX ('customer_text_idx')

PL/SQL procedure successfully completed.

SCOTT@10gXE> SELECT token_text, token_type, token_first, token_last, token_count
2 FROM dr$customer_text_idx$i
3 /

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT

----------
----------
BOB 0 1 2 2
CA 0 1 2 2
JONES 0 2 2 1
NOPLACE 0 1 1 1
NOWHERE 0 1 1 1
SMITH 0 1 2 2
SOMEWHERE 0 2 2 1
ST 0 2 2 1

8 rows selected.

SCOTT@10gXE> SELECT c.first_name, c.last_name, a.street, a.city
2 FROM customers c, addresses a
3 WHERE c.customer_id = a.customer_id
4 AND CONTAINS (C.dummy, 'Anywhere') > 0
5 /

no rows selected

SCOTT@10gXE> -- once dummy is updated, the index is synchronized:
SCOTT@10gXE> UPDATE customers
2 SET dummy = NULL
3 WHERE customer_id = 2
4 /

1 row updated.

SCOTT@10gXE> EXEC CTX_DDL.SYNC_INDEX ('customer_text_idx')

PL/SQL procedure successfully completed.

SCOTT@10gXE> SELECT token_text, token_type, token_first, token_last, token_count
2 FROM dr$customer_text_idx$i
3 /

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT

----------
----------
BOB 0 1 2 2
CA 0 1 2 2
JONES 0 2 2 1
NOPLACE 0 1 1 1
NOWHERE 0 1 1 1
SMITH 0 1 2 2
SOMEWHERE 0 2 2 1
ST 0 2 2 1
ANYWHERE 0 3 3 1
BOB 0 3 3 1
CA 0 3 3 1
JONES 0 3 3 1
SMITH 0 3 3 1
ST 0 3 3 1

14 rows selected.

SCOTT@10gXE> SELECT c.first_name, c.last_name, a.street, a.city
2 FROM customers c, addresses a
3 WHERE c.customer_id = a.customer_id
4 AND CONTAINS (C.dummy, 'Anywhere') > 0
5 /

FIRST_NAME LAST_NAME STREET CITY

---------------
---------------
Bob Jones Smith St. Anywhere

SCOTT@10gXE> -- notice the changes if the index is rebuilt or dropped and recreated,
SCOTT@10gXE> ALTER INDEX customer_text_idx REBUILD
2 /

Index altered.

SCOTT@10gXE> SELECT token_text, token_type, token_first, token_last, token_count
2 FROM dr$customer_text_idx$i
3 /

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT

----------
----------
ANYWHERE 0 2 2 1
BOB 0 1 2 2
CA 0 1 2 2
JONES 0 2 2 1
NOPLACE 0 1 1 1
NOWHERE 0 1 1 1
SMITH 0 1 2 2
ST 0 2 2 1

8 rows selected.

SCOTT@10gXE> SELECT c.first_name, c.last_name, a.street, a.city
2 FROM customers c, addresses a
3 WHERE c.customer_id = a.customer_id
4 AND CONTAINS (C.dummy, 'Anywhere') > 0
5 /

FIRST_NAME LAST_NAME STREET CITY

---------------
---------------
Bob Jones Smith St. Anywhere

SCOTT@10gXE>
Legend
Guru Guru : 2500 - 1000000 pts
Expert Expert : 1000 - 2499 pts
Pro Pro : 500 - 999 pts
Journeyman Journeyman : 200 - 499 pts
Newbie Newbie : 0 - 199 pts
Oracle ACE Director
Oracle ACE Member
Oracle Employee ACE
Helpful Answer (5 pts)
Correct Answer (10 pts)

Point your RSS reader here for a feed of the latest messages in all forums