Is it possible to get tags correctly with snippet when indexing HTML documents?

thibault daucourt, Jun 27 2017 (edited Sep 4 2017)

Good day sir,

I’m building a search application using Oracle Text. The application searches HTML documents.

I want to obtain an extract containing the queried word and the words close to it. The objective is to give the users some context.

The documents use custom fonts to display custom characters. Ideally, I want to find a way to keep the tagging in the extract.

To achieve this, I wanted to use CTX_DOC.SNIPPET, which seems to serve this purpose in Oracle Text.
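
For illustration, the kind of call I mean looks roughly like the sketch below. It is not my exact code: the search term 'oracle' is only a placeholder, and I identify documents by ROWID so that no particular key column has to be assumed. The index name indexTB is the one created further down.

```sql
-- Sketch only: 'oracle' is a placeholder search term.
SET SERVEROUTPUT ON

DECLARE
  v_snippet VARCHAR2(4000);
BEGIN
  -- Tell the CTX_DOC procedures that textkey values are ROWIDs (session-level setting).
  CTX_DOC.SET_KEY_TYPE('ROWID');

  FOR r IN (SELECT ROWID AS rid
              FROM articles
             WHERE CONTAINS(article, 'oracle') > 0)
  LOOP
    -- Build the keyword-in-context extract for each matching document.
    v_snippet := CTX_DOC.SNIPPET(
                   index_name => 'indexTB',
                   textkey    => ROWIDTOCHAR(r.rid),
                   text_query => 'oracle',
                   starttag   => '<b>',
                   endtag     => '</b>');
    DBMS_OUTPUT.PUT_LINE(ROWIDTOCHAR(r.rid) || ': ' || v_snippet);
  END LOOP;
END;
/
```

The starttag and endtag values only control how the matched word is highlighted; they do not affect the tags that come from the document itself.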

After several tests with my documents, snippet does not seem to be up to the task. It often returns almost nothing but tags and the query word. Worse, the tags are incomplete, or are not opened and closed correctly.

Sometimes there are no tags.

The following image shows one of the results:

pastedImage_0.png

As it stands, this can’t be used in the application.

Later, I discovered that I can force snippet to ignore the tags by using a section group in my index.

So my question is the following: is there a way to get both the words and the tags correctly with snippet?

By the way, I am working with Oracle XE 11g.

I use the following code:

/* Lexer: accent-insensitive, case-insensitive, text only (no themes) */
EXEC ctx_ddl.drop_preference('lexerTB');
EXEC ctx_ddl.create_preference('lexerTB', 'BASIC_LEXER');
EXEC ctx_ddl.set_attribute('lexerTB', 'BASE_LETTER', 'YES');
EXEC ctx_ddl.set_attribute('lexerTB', 'MIXED_CASE', 'NO');
EXEC ctx_ddl.set_attribute('lexerTB', 'INDEX_THEMES', 'NO');
EXEC ctx_ddl.set_attribute('lexerTB', 'INDEX_TEXT', 'YES');

/* Base index */
create index indexTB on articles (article)
indextype is ctxsys.context
parameters ('DATASTORE ctxsys.default_datastore
             LEXER lexerTB
             FILTER ctxsys.auto_filter');

/* Section group: treat <span> as a zone section */
EXEC ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
EXEC ctx_ddl.add_zone_section('htmgroup', 'span', 'span');

/* Second index: same lexer, no filter, with the HTML section group */
create index indexTB2 on articles (article)
indextype is ctxsys.context
parameters ('DATASTORE ctxsys.default_datastore
             LEXER lexerTB
             FILTER ctxsys.null_filter
             SECTION GROUP htmgroup');
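
As a side note, here is a small sketch of what the zone section makes possible on the query side, assuming indexTB2 is the index in place on the column. The term 'oracle' is again only a placeholder. The WITHIN operator restricts the match to text inside the section named span:

```sql
-- Sketch only: 'oracle' is a placeholder search term.
-- Returns the rows where the term occurs inside a <span> element,
-- using the zone section 'span' declared in htmgroup.
SELECT ROWID
  FROM articles
 WHERE CONTAINS(article, 'oracle WITHIN span') > 0;
```

This only affects which documents match; it does not change what snippet returns.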
This post has been answered by Roger Ford-Oracle on Jun 27 2017