Thread: Weighting the results of a text search


Permlink Replies: 5 - Pages: 1 - Last Post: Feb 4, 2008 10:01 AM Last Post By: janelson@hormel...
janelson@hormel...

Posts: 39
Registered: 09/24/02
Weighting the results of a text search
Posted: Jan 31, 2008 12:26 PM
Click to report abuse...   Click to reply to this thread Reply
I have an application that is using Oracle Text to search the database for similar customer names. What I am trying to do is give the developers a result-set with the most relavent matches listed first, but I cannot figure out how to score/weight the results.
Example:
Here is my query:
select * from operator z
where contains
(concat,'hilton hotel within {the_operator_name}',1 ) > 0

It returns the following records in order:
OPERATOR_ID OPERATOR_NAME

----------------------------------
1000084 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1001472 Hilton Hotel & Conference Ctr.
1004356 Hilton Hotel/Avendra Eugene
1005660 Hilton Hotel

I would like both "Hilton Hotel" results (the first and last in the list) to return before any of the other results, as these are a perfect match and are most likely to be picked by the user.

Thanks in advance.
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Weighting the results of a text search
Posted: Feb 2, 2008 9:05 PM   in response to: janelson@hormel... in response to: janelson@hormel...
Click to report abuse...   Click to reply to this thread Reply
You could apply utl_match.edit_distance to the result set and order by that.
janelson@hormel...

Posts: 39
Registered: 09/24/02
Re: Weighting the results of a text search
Posted: Feb 4, 2008 6:48 AM   in response to: Barbara Boehmer in response to: Barbara Boehmer
Click to report abuse...   Click to reply to this thread Reply
Barbara, thanks for your response. Do you have a link to any examples of this, or to some good documentation that explains this?

Thanks again
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Weighting the results of a text search
Posted: Feb 4, 2008 9:34 AM   in response to: janelson@hormel... in response to: janelson@hormel...
Click to report abuse...   Click to reply to this thread Reply
Here is a link that provides a brief explanation:

http://www.psoug.org/reference/utl_match.html

If you search this forum, there are various examples. Here is one that I put together that should be similar to your situation.


SCOTT@orcl_11g> CREATE TABLE operator
2 (operator_id NUMBER,
3 operator_name VARCHAR2 (60),
4 concat VARCHAR2 (1))
5 /

Table created.

SCOTT@orcl_11g> SET DEFINE OFF SCAN OFF
SCOTT@orcl_11g> INSERT ALL
2 INTO operator VALUES (1000084, 'Hilton Hotel', NULL)
3 INTO operator VALUES (1001397, 'HILTON HOTEL AIRPORT', NULL)
4 INTO operator VALUES (1001472, 'Hilton Hotel & Conference Ctr.', NULL)
5 INTO operator VALUES (1004356, 'Hilton Hotel/Avendra Eugene', NULL)
6 INTO operator VALUES (1005660, 'Hilton Hotel', NULL)
7 INTO operator VALUES (9999999, 'some other operator name', NULL)
8 SELECT * FROM DUAL
9 /

6 rows created.

SCOTT@orcl_11g> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('your_datastore', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('your_datastore', 'COLUMNS', 'operator_id, operator_name');
4 CTX_DDL.CREATE_SECTION_GROUP ('your_sec_group', 'BASIC_SECTION_GROUP');
5 CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'the_operator_id', 'operator_id', TRUE);
6 CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'the_operator_name', 'operator_name', TRUE);
7 END;
8 /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX your_index ON operator (concat)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4 ('DATASTORE your_datastore
5 SECTION GROUP your_sec_group')
6 /

Index created.

SCOTT@orcl_11g> COLUMN concat NOPRINT
SCOTT@orcl_11g> COLUMN operator_name FORMAT A30
SCOTT@orcl_11g> -- original query:
SCOTT@orcl_11g> select * from operator z
2 where contains
3 (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 /

OPERATOR_ID OPERATOR_NAME

------------------------------
1000084 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1001472 Hilton Hotel & Conference Ctr.
1004356 Hilton Hotel/Avendra Eugene
1005660 Hilton Hotel

SCOTT@orcl_11g> -- ordering by utl_match.edit_distance:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name)) edit_dist
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist
5 /

OPERATOR_ID OPERATOR_NAME EDIT_DIST

------------------------------
1000084 Hilton Hotel 0
1005660 Hilton Hotel 0
1001397 HILTON HOTEL AIRPORT 8
1004356 Hilton Hotel/Avendra Eugene 15
1001472 Hilton Hotel & Conference Ctr. 18

SCOTT@orcl_11g> -- ordering by utl_match.edit_distance_similarity:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) edit_dist_sim
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist_sim DESC
5 /

OPERATOR_ID OPERATOR_NAME EDIT_DIST_SIM

------------------------------
1000084 Hilton Hotel 100
1005660 Hilton Hotel 100
1001397 HILTON HOTEL AIRPORT 60
1004356 Hilton Hotel/Avendra Eugene 45
1001472 Hilton Hotel & Conference Ctr. 40

SCOTT@orcl_11g> -- ordering without selecting the column:
SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name))
4 /

OPERATOR_ID OPERATOR_NAME

------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.

SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) DESC
4 /

OPERATOR_ID OPERATOR_NAME

------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.

SCOTT@orcl_11g>
Barbara Boehmer

Posts: 2,840
Registered: 04/28/00
Re: Weighting the results of a text search
Posted: Feb 4, 2008 9:44 AM   in response to: janelson@hormel... in response to: janelson@hormel...
Click to report abuse...   Click to reply to this thread Reply
I can't tell from what you provided, what type of datastore or section group you have. So, here is another example that uses a different type of datastore and different type of section group, but it really doesn't matter what you are using as the ordering by utl_match.edit_distance is the same.

SCOTT@orcl_11g> CREATE TABLE operator
2 (operator_id NUMBER,
3 operator_name VARCHAR2 (60),
4 concat VARCHAR2 (1))
5 /

Table created.

SCOTT@orcl_11g> SET DEFINE OFF SCAN OFF
SCOTT@orcl_11g> INSERT ALL
2 INTO operator VALUES (1000084, 'Hilton Hotel', NULL)
3 INTO operator VALUES (1001397, 'HILTON HOTEL AIRPORT', NULL)
4 INTO operator VALUES (1001472, 'Hilton Hotel & Conference Ctr.', NULL)
5 INTO operator VALUES (1004356, 'Hilton Hotel/Avendra Eugene', NULL)
6 INTO operator VALUES (1005660, 'Hilton Hotel', NULL)
7 INTO operator VALUES (9999999, 'some other operator name', NULL)
8 SELECT * FROM DUAL
9 /

6 rows created.

SCOTT@orcl_11g> CREATE OR REPLACE PROCEDURE concat_proc
2 (p_rowid IN ROWID,
3 p_clob IN OUT NOCOPY CLOB)
4 AS
5 BEGIN
6 FOR r IN (SELECT * FROM operator WHERE ROWID = p_rowid) LOOP
7 DBMS_LOB.WRITEAPPEND (p_clob, 17, '<the_operator_id>');
8 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r.operator_id), r.operator_id);
9 DBMS_LOB.WRITEAPPEND (p_clob, 18, '</the_operator_id>');
10 DBMS_LOB.WRITEAPPEND (p_clob, 19, '<the_operator_name>');
11 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r.operator_name), r.operator_name);
12 DBMS_LOB.WRITEAPPEND (p_clob, 20, '</the_operator_name>');
13 END LOOP;
14 END concat_proc;
15 /

Procedure created.

SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> EXEC CTX_DDL.CREATE_PREFERENCE ('your_datastore', 'USER_DATASTORE')

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> EXEC CTX_DDL.SET_ATTRIBUTE ('your_datastore', 'PROCEDURE', 'concat_proc')

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX your_index ON operator (concat)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4 ('DATASTORE your_datastore
5 SECTION GROUP CTXSYS.AUTO_SECTION_GROUP')
6 /

Index created.

SCOTT@orcl_11g> COLUMN concat NOPRINT
SCOTT@orcl_11g> COLUMN operator_name FORMAT A30
SCOTT@orcl_11g> -- original query:
SCOTT@orcl_11g> select * from operator z
2 where contains
3 (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 /

OPERATOR_ID OPERATOR_NAME

------------------------------
1000084 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1001472 Hilton Hotel & Conference Ctr.
1004356 Hilton Hotel/Avendra Eugene
1005660 Hilton Hotel

SCOTT@orcl_11g> -- ordering by utl_match.edit_distance:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name)) edit_dist
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist
5 /

OPERATOR_ID OPERATOR_NAME EDIT_DIST

------------------------------
1000084 Hilton Hotel 0
1005660 Hilton Hotel 0
1001397 HILTON HOTEL AIRPORT 8
1004356 Hilton Hotel/Avendra Eugene 15
1001472 Hilton Hotel & Conference Ctr. 18

SCOTT@orcl_11g> -- ordering by utl_match.edit_distance_similarity:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) edit_dist_sim
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist_sim DESC
5 /

OPERATOR_ID OPERATOR_NAME EDIT_DIST_SIM

------------------------------
1000084 Hilton Hotel 100
1005660 Hilton Hotel 100
1001397 HILTON HOTEL AIRPORT 60
1004356 Hilton Hotel/Avendra Eugene 45
1001472 Hilton Hotel & Conference Ctr. 40

SCOTT@orcl_11g> -- ordering without selecting the column:
SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name))
4 /

OPERATOR_ID OPERATOR_NAME

------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.

SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) DESC
4 /

OPERATOR_ID OPERATOR_NAME

------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.

SCOTT@orcl_11g>
janelson@hormel...

Posts: 39
Registered: 09/24/02
Re: Weighting the results of a text search
Posted: Feb 4, 2008 10:01 AM   in response to: janelson@hormel... in response to: janelson@hormel...
Click to report abuse...   Click to reply to this thread Reply
Barbara, that is exactly what I needed! Thank you so much.
Legend
Guru Guru : 2500 - 1000000 pts
Expert Expert : 1000 - 2499 pts
Pro Pro : 500 - 999 pts
Journeyman Journeyman : 200 - 499 pts
Newbie Newbie : 0 - 199 pts
Oracle ACE Director
Oracle ACE Member
Oracle Employee ACE
Helpful Answer (5 pts)
Correct Answer (10 pts)

Point your RSS reader here for a feed of the latest messages in all forums