Hello everybody,
Sorry to bother you with my questions again. I hope someone could help and provide some feedback. I'm trying to get my hands on Oracle Text. To make it easy for you to understand, I'd like first to put in place a couple of tables and some sample data.
drop table docs;
drop table doc_names;
drop table doc_refs;
drop procedure doc_user_ds_proc;
drop index doc_ftidx;
exec ctx_ddl.drop_preference('doc_uds');
exec ctx_ddl.drop_preference('global_lexer');
exec ctx_ddl.drop_preference('english_lexer');
exec ctx_ddl.drop_preference('french_lexer');
exec ctx_ddl.drop_preference('custom_filter');
CREATE TABLE docs
(
id number,
status varchar2(100),
cat varchar2(100)
);
CREATE TABLE doc_names
(
id number,
doc_id number,
lang varchar2(3),
name varchar2(100)
);
CREATE TABLE doc_refs
(
id number,
doc_id NUMBER,
ref varchar2(100)
);
insert into docs(id, status, cat) values(1, 'active', 'pdf');
insert into docs(id, status, cat) values(2, 'inactive', 'word');
insert into doc_names(id, doc_id, lang, name) values(1, 1, 'eng', 'name of doc1 in english');
insert into doc_names(id, doc_id, lang, name) values(2, 1, 'jpn', 'name of doc1 in japanese');
insert into doc_names(id, doc_id, lang, name) values(3, 1, 'fre', 'name of doc1 in french');
insert into doc_names(id, doc_id, lang, name) values(4, 2, 'eng', 'name of doc2 in english');
insert into doc_refs(id, doc_id, ref) values(1, 1, 'doc-1-ref1');
insert into doc_refs(id, doc_id, ref) values(2, 1, 'doc-1-ref2');
insert into doc_refs(id, doc_id, ref) values(2, 2, 'doc-2-ref1');
commit;
I have a table with some documents that have an id, a status and a category. I also have some tables that are child tables(names and references). I would like to create an index allowing me to search on a set of fields from one on these three tables.
So, I've created a procedure to create an xml from the tables; something like this:
CREATE OR REPLACE PROCEDURE doc_user_ds_proc
(
rid in rowid,
tlob in out nocopy clob
)
IS
BEGIN
SELECT XMLELEMENT("docs",
XMLAGG(
XMLELEMENT("doc",
XMLATTRIBUTES(d.id AS "id"),
XMLFOREST(
d.id AS "id",
d.status AS "status",
d.cat AS "category",
(SELECT XMLAGG(
XMLFOREST(
ref as "ref"
)
)
FROM doc_refs r
WHERE r.doc_id = d.id
) AS "refs",
(SELECT XMLAGG(
XMLFOREST(
n.lang as "lang",
name as "name"
)
) AS "na"
FROM doc_names n
WHERE n.doc_id = d.id
) AS "names"
)
)
)
).getClobVal()
INTO tlob
FROM docs d;
END;
/
Then I create my user datastore with the procedure:
begin
ctx_ddl.create_preference('doc_uds', 'user_datastore');
ctx_ddl.set_attribute('doc_uds', 'procedure', 'doc_user_ds_proc');
ctx_ddl.set_attribute('doc_uds', 'output_type', 'CLOB');
end;
/
And some preferences...
BEGIN
ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','theme_language','english');
ctx_ddl.create_preference('french_lexer','basic_lexer');
ctx_ddl.set_attribute('french_lexer','index_themes','yes');
ctx_ddl.set_attribute('french_lexer','theme_language','french');
ctx_ddl.create_preference('global_lexer', 'multi_lexer');
ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'french', 'french_lexer','fre');
ctx_ddl.create_preference('custom_filter', 'NULL_FILTER');
END;
/
Then I create my context index:
CREATE INDEX doc_ftidx ON docs(status)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS (
'DATASTORE doc_uds
FILTER custom_filter');
This is all fine. If I query the table DR$DOC_FTIDX$I, I see keywords in there. And, in the $K table, I see two rowids. Now, I was wondering...
1- As explained, I want to create an index on multiple tables and multiple columns. So, to me, it feels wrong creating an index on status (or any text field in my docs table). So, is this a common approach?
2- Imagine I want to only index active documents? Would it be fine if I add in my procedure doc_user_ds_proc a WHERE clause (where status = 'active')? What happens for documents that change status (from active to inactive)? How index will know to remove some keywords?
3- How can I specify the languages for the names? If I put in my index
LEXER global_lexer language column lang
I get an exception like DRG-10582: column LANG does not exist in table DOCS. So, the language should be in the "base" table? Is there any way to specify it that it's coming from another table?
4- Is it possible to have snippets with this approach? When I try the following query, it produces an error... Is there a way to debug the query? And understand why it fails?
SELECT id, status, SCORE(99),
ctx_doc.snippet(
index_name => 'doc_ftidx',
textkey => rowid,
text_query => 'french',
starttag => '<b>',
endtag => '</b>'
)
FROM docs
WHERE contains(status, 'fuzzy(french) INPATH (/)', 99) > 0
ORDER BY SCORE(99) DESC;
5- Another question would be... Imagine I want to have a results by category. I saw we can have facets and it seems that it does the trick. But, before I dig this, I was wondering if I could do that with context indexes or I need a different type of index?
I know it's a lot of questions! I've been reading documentation (https://docs.oracle.com/en/database/oracle/oracle-database/21/ccapp/understanding-oracle-text-application-development.html#GUID-CF13C01A-F5E6-4EF5-839B-C09CF0024D5E) but the examples in documentation are not providing me answers. Any advice on how to proceed? All suggestion is welcome.
I'm using Oracle 19c and this is all new to me... So, please, be forgiving if my questions may look stupid.