Thread: Using Oracle Text with Apex


Permlink Replies: 23 - Pages: 2 [ Previous | 1 2 ] - Last Post: Mar 14, 2008 4:08 AM Last Post By: Sohil Bhavsar
VANJ

Posts: 6,048
Registered: 03/18/04
Re: Using Oracle Text with Apex
Posted: May 25, 2007 7:51 AM   in response to: dccase in response to: dccase
Click to report abuse...   Click to reply to this thread Reply
roll your own SNIPPET procedure using the offsets returned by CTX_DOC.HIGHLIGHT

Hm, maybe I am being dense, but I don't see how that can be done. Mind giving me another nudge in the right direction? Thanks!

[Ideally, I would display those snippets right on the main search results page itself, search engine style]
dccase

Posts: 1,051
Registered: 11/01/98
Re: Using Oracle Text with Apex
Posted: May 25, 2007 8:12 AM   in response to: VANJ in response to: VANJ
Click to report abuse...   Click to reply to this thread Reply
I haven't tried this, so you're on your own...

The output of CTX_DOC.HIGHLIGHT is a list of offsets (starting and ending positions) of the search results in your document.

You could probably loop through those offsets and grab a chunk of text starting X characters before and Y characters after each match.

SUBSTR can work on CLOB's in 9i.
Dimitri Gielis

Posts: 1,859
Registered: 04/21/07
Re: Using Oracle Text with Apex
Posted: May 27, 2007 2:03 AM   in response to: dccase in response to: dccase
Click to report abuse...   Click to reply to this thread Reply
This thread was spotted to be included in the weekly APEX forum wrap-up.
(in the most interesting topics part)

Thanks,
Dimitri
-- APEX Forum wrap-up --
@ http://dgielis.blogspot.com/search/label/apex%20forum
VANJ

Posts: 6,048
Registered: 03/18/04
Re: Using Oracle Text with Apex
Posted: May 29, 2007 6:07 AM   in response to: dccase in response to: dccase
Click to report abuse...   Click to reply to this thread Reply
Just for completeness (and since Oracle Text introduced the ctx_doc.snippet function in 10g), here is a DIY version of the function for use in pre-10g versions.

Basically, as Doug suggested, it just uses the builtin ctx_doc.highlight API and loops over the result and returns some text before and after the matches, with some optional parameters thrown in for good measure. The search string is highlighted, the "context" before and after it is not.

Works like a charm. Hope this helps.

    FUNCTION snippet (
p_clob IN CLOB,
p_index_name IN VARCHAR2,
p_textkey IN VARCHAR2,
p_text_query IN VARCHAR2,
p_start_context IN INTEGER DEFAULT 25,
p_end_context IN INTEGER DEFAULT 25,
p_max_matches IN INTEGER DEFAULT 5,
p_starttag IN VARCHAR2 DEFAULT '',
p_endtag IN VARCHAR2 DEFAULT '
',
p_delimiter IN VARCHAR2 DEFAULT '
'
)
RETURN VARCHAR2
IS
l_restab ctx_doc.highlight_tab;
l_snippet VARCHAR2(32767);
l_start INTEGER;
l_length INTEGER;
l_clob_size INTEGER;
l_matched VARCHAR2(32767);
BEGIN
ctx_doc.set_key_type('PRIMARY_KEY');

ctx_doc.highlight (
index_name => p_index_name,
textkey => p_textkey,
text_query => p_text_query,
restab => l_restab
);

l_clob_size := dbms_lob.getlength(p_clob);
FOR i IN 1..least(l_restab.count,p_max_matches)
LOOP
l_start := l_restab(i).offset - p_start_context;
IF (l_start <= 0)
THEN
l_start := 1;
END IF;

l_length := l_restab(i).length + p_end_context;
IF (l_length + l_restab(i).offset >= l_clob_size)
THEN
l_length := l_clob_size - l_restab(i).offset + 1;
END IF;

l_matched := p_starttag || dbms_lob.substr(p_clob,l_restab(i).length,l_restab(i).offset) || p_endtag;
l_snippet := l_snippet || p_delimiter || dbms_lob.substr(p_clob,l_restab(i).offset-l_start-1,l_start) || l_matched || dbms_lob.substr(p_clob,l_length,l_restab(i).offset+l_restab(i).length);
IF (length(l_snippet) > 4000)
THEN
RETURN l_snippet;
END IF;
END LOOP;

RETURN l_snippet;
END snippet;

Hope this helps
abe50

Posts: 64
Registered: 09/18/06
Re: Using Oracle Text with Apex
Posted: Aug 11, 2007 5:12 PM   in response to: VANJ in response to: VANJ
Click to report abuse...   Click to reply to this thread Reply
Vikas,

When you have time, could you create some kind of example on your example application page. If not, could you tell us the steps on how to implement it.

Thanks,
-Abe
VANJ

Posts: 6,048
Registered: 03/18/04
Re: Using Oracle Text with Apex
Posted: Aug 13, 2007 7:55 AM   in response to: abe50 in response to: abe50
Click to report abuse...   Click to reply to this thread Reply
I cannot create an example on apex.oracle.com because it doesn't provide access to the server filesystem and/or create directory objects.

Here are the steps

1.

Create the table to store the documents. Load the documents into the table.

CREATE TABLE documents
(
REC_ID INTEGER PRIMARY KEY,
FILENAME VARCHAR2(100) UNIQUE,
doc_clob CLOB DEFAULT empty_clob()
);
insert into documents(rec_id,filename) values (1,'file1.txt');
insert into documents(rec_id,filename) values (2,'file2.txt');
insert into documents(rec_id,filename) values (3,'file3.txt');
BEGIN
FOR rec IN (SELECT filename,doc_clob FROM documents ORDER BY rec_id)
LOOP
lob_util.load_file('my_dir',rec.filename,rec.doc_clob);
END LOOP;
END;
/

2. The LOB_UTIL package contains a bunch of utility functions. The source is

CREATE OR REPLACE PACKAGE lob_util
AS
PROCEDURE load_file(
p_directory IN VARCHAR2,
p_filename IN VARCHAR2,
p_lob IN OUT CLOB
);

FUNCTION parse_search_string (p_search_str IN VARCHAR2)
RETURN VARCHAR2;

PROCEDURE markup (
p_index_name IN VARCHAR2,
p_textkey IN VARCHAR2,
p_text_query IN VARCHAR2,
p_restab IN OUT CLOB,
p_starttag IN VARCHAR2 DEFAULT '[span class="hl"]',
p_endtag IN VARCHAR2 DEFAULT '[/span]'
);

FUNCTION snippet (
p_clob IN CLOB,
p_index_name IN VARCHAR2,
p_textkey IN VARCHAR2,
p_text_query IN VARCHAR2,
p_start_context IN INTEGER DEFAULT 25,
p_end_context IN INTEGER DEFAULT 25,
p_max_matches IN INTEGER DEFAULT 5,
p_starttag IN VARCHAR2 DEFAULT '[span class="hl"]',
p_endtag IN VARCHAR2 DEFAULT '[/span]',
p_delimiter IN VARCHAR2 DEFAULT '<br/]'
)
RETURN VARCHAR2;
END lob_util;
/
CREATE OR REPLACE PACKAGE BODY lob_util
AS
PROCEDURE load_file(
p_directory IN VARCHAR2,
p_filename IN VARCHAR2,
p_lob IN OUT CLOB
)
IS
l_bfile BFILE;
l_dest_offset INTEGER := 1;
l_src_offset INTEGER := 1;
l_lang INTEGER := 0;
l_csid INTEGER := 0;
l_warning INTEGER := 0;
BEGIN
l_bfile := bfilename(upper(p_directory),p_filename);
dbms_lob.fileopen(l_bfile);
dbms_lob.loadclobfromfile(p_lob,l_bfile,dbms_lob.getlength(l_bfile),l_dest_offset,l_src_offset,l_csid,l_lang,l_warning);
dbms_lob.fileclose(l_bfile);
END load_file;

FUNCTION parse_search_string( p_search_str IN VARCHAR2 )
RETURN VARCHAR2
is
l_temp_value varchar2(32767);
l_temp_value2 varchar2(32767);
l_return_value varchar2(32767) := NULL;
l_start_token number := 1;
l_in_token boolean := FALSE;
l_num_tokens number := 0;
l_quotes number;
l_phrases dbms_sql.varchar2s;
n number;
begin
if nvl(length(p_search_str),0) = 0 then
return NULL;
end if;

l_quotes := length(p_search_str) - length(replace(p_search_str,'"', ''));
if ( l_quotes > 0 and mod(l_quotes,2) = 0 )
then
l_temp_value2 := lower(p_search_str);
for i in 1 .. l_quotes/2
loop
n := instr( l_temp_value2, '"' );
l_temp_value := l_temp_value || substr( l_temp_value2, 1, n-1 );
l_temp_value2 := substr( l_temp_value2, n+1 );
n := instr( l_temp_value2, '"' );
l_phrases(i) := substr( l_temp_value2, 1, n-1 );
l_temp_value2 := substr( l_temp_value2, n+1 );
end loop;
l_temp_value := l_temp_value || l_temp_value2;
else
l_temp_value := lower(p_search_str);
end if;

--
l_temp_value := trim(replace( l_temp_value, '{', ' '));
l_temp_value := trim(replace( l_temp_value, '}', ' '));
l_temp_value := trim(replace( l_temp_value, ':', ' '));
l_temp_value := trim(replace( l_temp_value, ';', ' '));
l_temp_value := trim(replace( l_temp_value, '"', ' '));
l_temp_value := trim(replace( l_temp_value, ':'',', ' '));
l_temp_value := trim(replace( l_temp_value, '(', ' '));
l_temp_value := trim(replace( l_temp_value, ')', ' '));
l_temp_value := trim(replace( l_temp_value, '!', ' '));
l_temp_value := trim(replace( l_temp_value, '&', ' '));
l_temp_value := trim(replace( l_temp_value, '+', ' '));
l_temp_value := trim(replace( l_temp_value, '\', ' '));
l_temp_value := trim(replace( l_temp_value, '-', ' '));
l_temp_value := trim(replace( l_temp_value, ',', ' '));
l_temp_value := trim(replace( l_temp_value, ' and ', ' ' ));
l_temp_value := trim(replace( l_temp_value, ' or ', ' ' ));
--
if length(l_temp_value) > 0 then
l_in_token := TRUE;
end if;
--
for i in 1..nvl(length(l_temp_value),0) loop
if substr(l_temp_value,i,1) = ' ' then
if l_in_token = TRUE then
l_return_value := l_return_value || '{' ||
substr(l_temp_value,l_start_token, i-l_start_token) || '}' || ' and ';
l_in_token := FALSE;
l_num_tokens := l_num_tokens + 1;
end if;
elsif l_in_token = FALSE then
l_in_token := TRUE;
l_start_token := i;
end if;
exit when l_num_tokens > 30;
end loop;
if l_in_token = TRUE then
l_return_value := l_return_value || '{' || substr(l_temp_value,l_start_token) || '} and ';
end if;

for i in 1 .. nvl(l_phrases.count,0)
loop
if ( l_phrases(i) is not null )
then
l_return_value := l_return_value || ' {' || l_phrases(i) || '} and ';
end if;
end loop;
return trim( substr( l_return_value, 1, length(l_return_value)-4 ) );
end parse_search_string;

PROCEDURE markup (
p_index_name IN VARCHAR2,
p_textkey IN VARCHAR2,
p_text_query IN VARCHAR2,
p_restab IN OUT CLOB,
p_starttag IN VARCHAR2 DEFAULT '[span class="hl"]',
p_endtag IN VARCHAR2 DEFAULT '[/span]'
)
IS
BEGIN
ctx_doc.set_key_type('PRIMARY_KEY');
ctx_doc.markup (
index_name => p_index_name,
textkey => p_textkey,
text_query => p_text_query,
restab => p_restab,
starttag => p_starttag,
endtag => p_endtag
);
END markup;

FUNCTION snippet (
p_clob IN CLOB,
p_index_name IN VARCHAR2,
p_textkey IN VARCHAR2,
p_text_query IN VARCHAR2,
p_start_context IN INTEGER DEFAULT 25,
p_end_context IN INTEGER DEFAULT 25,
p_max_matches IN INTEGER DEFAULT 5,
p_starttag IN VARCHAR2 DEFAULT '[span class="hl"]',
p_endtag IN VARCHAR2 DEFAULT '[/span]',
p_delimiter IN VARCHAR2 DEFAULT '<br/]'
)
RETURN VARCHAR2
IS
l_restab ctx_doc.highlight_tab;
l_snippet VARCHAR2(32767);
l_start INTEGER;
l_end INTEGER;
l_length INTEGER;
l_clob_size INTEGER;
l_matched VARCHAR2(32767);
l_clob CLOB;
BEGIN
ctx_doc.set_key_type('PRIMARY_KEY');

/*
ctx_doc.markup (
index_name => p_index_name,
textkey => p_textkey,
text_query => p_text_query,
restab => l_clob,
starttag => p_starttag,
endtag => p_endtag
);

l_clob_size := dbms_lob.getlength(l_clob);
FOR i IN 1..p_max_matches LOOP
l_start := dbms_lob.instr(l_clob,p_starttag,1,i);
IF (l_start = 0)
THEN
EXIT;
END IF;

l_start := l_start - p_start_context;
IF (l_start <= 0)
THEN
l_start := 1;
END IF;

l_end := dbms_lob.instr(l_clob,p_endtag,1,i) + length(p_endtag) + p_end_context;
IF (l_end > l_clob_size)
THEN
l_end := l_clob_size;
END IF;

l_snippet := l_snippet || p_delimiter || dbms_lob.substr(l_clob,l_end-l_start+1,l_start);
IF (length(l_snippet) > 4000)
THEN
RETURN ltrim(l_snippet,p_delimiter);
END IF;
END LOOP;

RETURN ltrim(l_snippet,p_delimiter);
*/

ctx_doc.highlight (
index_name => p_index_name,
textkey => p_textkey,
text_query => p_text_query,
restab => l_restab
);

l_clob_size := dbms_lob.getlength(p_clob);
FOR i IN 1..least(l_restab.count,p_max_matches)
LOOP
l_start := l_restab(i).offset - p_start_context;
IF (l_start <= 0)
THEN
l_start := 1;
END IF;

l_length := l_restab(i).length + p_end_context;
IF (l_length + l_restab(i).offset >= l_clob_size)
THEN
l_length := l_clob_size - l_restab(i).offset + 1;
END IF;

l_matched := p_starttag || dbms_lob.substr(p_clob,l_restab(i).length,l_restab(i).offset) || p_endtag;
l_snippet := l_snippet || p_delimiter || dbms_lob.substr(p_clob,l_restab(i).offset-l_start-1,l_start) || l_matched || dbms_lob.substr(p_clob,l_length,l_restab(i).offset+l_restab(i).length);
IF (length(l_snippet) > 4000)
THEN
RETURN l_snippet;
END IF;
END LOOP;

RETURN l_snippet;
END snippet;
BEGIN
NULL;
END lob_util;
/


3. The APEX report region query on the table is

select
rec_id,
score(1) score,
filename,
dbms_lob.getlength(doc_clob) filesize,
lob_util.snippet(doc_clob,'text_idx',rec_id,:P300_SEARCH_STRING,:P302_START_CONTEXT,:P302_END_CONTEXT) snippet
from documents
where 1=1
and contains(doc_clob,:P300_SEARCH_STRING,1) > 0
order by 2 desc,3 asc

text_idx is a simple Text index on the CLOB column created using

create index text_idx on documents(doc_clob) indextype is ctxsys.context;

Hope this helps.
abe50

Posts: 64
Registered: 09/18/06
Re: Using Oracle Text with Apex
Posted: Aug 13, 2007 9:15 AM   in response to: VANJ in response to: VANJ
Click to report abuse...   Click to reply to this thread Reply
Vikas,

Many Many Thanks. I can't wait to try it.

abe
Sohil Bhavsar

Posts: 12
Registered: 02/27/08
Re: Using Oracle Text with Apex
Posted: Mar 14, 2008 3:59 AM   in response to: dccase in response to: dccase
Click to report abuse...   Click to reply to this thread Reply
I have made a search application using oracle text's BFILE to search

from word documents. It will give doc filename as a result and then I will provide

facility to download it from server. It is working fine. But I want to display results

like GOOGLE i.e. by highlighting the search keyword in a paragraph.

Can it be possible using Oracle Text?

if possible plz give more details with example,
os:windows xp ,
oracle 10g.
Sohil Bhavsar

Posts: 12
Registered: 02/27/08
Re: Using Oracle Text
Posted: Mar 14, 2008 4:08 AM   in response to: dccase in response to: dccase
Click to report abuse...   Click to reply to this thread Reply
I have made a search application using oracle text's BFILE to search

from word documents. It will give doc filename as a result and then I will provide

facility to download it from server. It is working fine. But I want to display results

like GOOGLE i.e. by highlighting the search keyword in a paragraph.

Can it be possible using Oracle Text?

if possible plz give more details with example,
os:windows xp ,
oracle 10g.
Legend
Guru Guru : 2500 - 1000000 pts
Expert Expert : 1000 - 2499 pts
Pro Pro : 500 - 999 pts
Journeyman Journeyman : 200 - 499 pts
Newbie Newbie : 0 - 199 pts
Oracle ACE Director
Oracle ACE Member
Oracle Employee ACE
Helpful Answer (5 pts)
Correct Answer (10 pts)

Point your RSS reader here for a feed of the latest messages in all forums