Oracle Text 11 - Auto Lexer Issue, Language Detection, Alternate Spelling

880175, Aug 5 2011 (edited Aug 10 2011)
Hello,
I'm trying to set up an auto lexer to index documents in English, French, German, Italian and maybe Spanish. However, I have run into several problems, some of which may be due to my misuse of Oracle Text, as I'm still learning. I would greatly welcome advice and help from the experts here.

-- Version
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for Solaris: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

--Create a Table
create table mytest(id number primary key, docs clob, lang VARCHAR2(30));

--Populate the Table with Data
INSERT INTO mytest VALUES(1, 'Je vais mourir', 'french');
INSERT INTO mytest VALUES(2, 'Le chien est mort', 'french');
INSERT INTO mytest VALUES(3, 'Le chien était mort', 'french');
INSERT INTO mytest VALUES(4, 'Il est content', 'french');
INSERT INTO mytest VALUES(5, 'Nous sommes heureux', 'french');
INSERT INTO mytest VALUES(6, 'Il fait beau aujourd''hui', 'french');
INSERT INTO mytest VALUES(7, 'Rotes Auto', 'german');
INSERT INTO mytest VALUES(8, 'Roter Zug', 'german');
INSERT INTO mytest VALUES(9, 'Grün, Blau, Rot', 'german');
INSERT INTO mytest VALUES(10, 'Ich bin zufrieden', 'german');
INSERT INTO mytest VALUES(11, 'des seins', 'french');
INSERT INTO mytest VALUES(12, 'Hauptbahnhof', 'german');
INSERT INTO mytest VALUES(13, 'Lokomotivführer', 'german');
commit;

-- Create Index
begin
ctx_ddl.create_preference('single_lexer','auto_lexer');
ctx_ddl.set_attribute('single_lexer','mixed_case','no');
ctx_ddl.set_attribute('single_lexer','base_letter','no');
ctx_ddl.set_attribute('single_lexer','base_letter_type','SPECIFIC');
ctx_ddl.set_attribute('single_lexer', 'index_stems', 'YES');
ctx_ddl.set_attribute('single_lexer', 'german_decompound', 'YES');
ctx_ddl.set_attribute('single_lexer', 'alternate_spelling', 'GERMAN');
end;
/

drop index myindex;
create index myindex on mytest(docs)
indextype is ctxsys.context
parameters ('LANGUAGE COLUMN lang LEXER single_lexer');
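
-- Inspect Indexed Tokens
The tokens stored for each document can be checked directly in the index's token table. This is a minimal sketch, assuming the default Oracle Text naming scheme (DR$<index name>$I) and the index name myindex used above. Plain text tokens have TOKEN_TYPE 0; other values mark derived tokens such as stems.

```sql
-- List every token stored for myindex with its type and document count.
-- Assumes the token table follows the default name DR$MYINDEX$I.
SELECT token_text, token_type, token_count
FROM dr$myindex$i
ORDER BY token_text, token_type;
```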

-- Problems
1) Id#7 is indexed under the Spanish word 'rotar' instead of the German word 'rot', as Id#8 and Id#9 are. I believed that specifying a language column would force the lexer to use the correct stemming dictionary. Is it a bug? Or did I miss something?

2) Query not giving expected result
SELECT SCORE(1), id, lang, docs
FROM mytest
WHERE CONTAINS(docs, '<query><textquery lang="german">Gruen</textquery></query>', 1) > 0;
--> No result
This query should output 1 result (namely Id#9), according to the alternate_spelling=GERMAN parameter. Is it a bug? Or did I miss something?

3) The German compound words in Id#12 (Hauptbahnhof) and Id#13 (Lokomotivführer) should be decomposed into "Haupt", "Bahnhof", "Lokomotiv", and "Führer" according to the german_decompound=YES parameter. In fact, only Id#13 (Lokomotivführer) is decomposed correctly. Previously I had no problem decomposing both words when using the composite=GERMAN parameter of the BASIC_LEXER. Is it a bug? Or did I miss something?
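
For comparison with 3), a BASIC_LEXER configuration with composite=GERMAN can be sketched as follows. The preference name german_lexer is a placeholder chosen for illustration, and only the composite attribute is taken from the description above; any other settings of the earlier setup are not reproduced here.

```sql
-- Hypothetical preference name; BASIC_LEXER with German compound-word indexing.
begin
  ctx_ddl.create_preference('german_lexer', 'basic_lexer');
  ctx_ddl.set_attribute('german_lexer', 'composite', 'GERMAN');
end;
/

-- Only one CONTEXT index can exist on the column, so the auto lexer index is dropped first.
drop index myindex;
create index myindex on mytest(docs)
indextype is ctxsys.context
parameters ('LEXER german_lexer');
```

Rebuilding the index this way applies the German lexer to every row, including the French ones, so it is only useful for checking how the two compound words are tokenized.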

Thank you very much for any hints or answers that would bring me closer to a solution.

Frederic
Locked on Sep 7 2011
Added on Aug 5 2011