|
Replies:
5
-
Pages:
1
-
Last Post:
Feb 4, 2008 10:01 AM
Last Post By: janelson@hormel...
|
|
|
Posts:
39
Registered:
09/24/02
|
|
|
|
Weighting the results of a text search
Posted:
Jan 31, 2008 12:26 PM
|
|
|
|
I have an application that is using Oracle Text to search the database for similar customer names. What I am trying to do is give the developers a result-set with the most relavent matches listed first, but I cannot figure out how to score/weight the results.
Example:
Here is my query:
select * from operator z
where contains
(concat,'hilton hotel within {the_operator_name}',1 ) > 0
It returns the following records in order:
OPERATOR_ID OPERATOR_NAME
----------------------------------
1000084 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1001472 Hilton Hotel & Conference Ctr.
1004356 Hilton Hotel/Avendra Eugene
1005660 Hilton Hotel
I would like both "Hilton Hotel" results (the first and last in the list) to return before any of the other results, as these are a perfect match and are most likely to be picked by the user.
Thanks in advance.
|
|
|
Posts:
2,829
Registered:
04/28/00
|
|
|
|
Re: Weighting the results of a text search
Posted:
Feb 2, 2008 9:05 PM
in response to: janelson@hormel...
|
|
|
|
You could apply utl_match.edit_distance to the result set and order by that.
|
|
|
Posts:
39
Registered:
09/24/02
|
|
|
|
Re: Weighting the results of a text search
Posted:
Feb 4, 2008 6:48 AM
in response to: Barbara Boehmer
|
|
|
|
Barbara, thanks for your response. Do you have a link to any examples of this, or to some good documentation that explains this?
Thanks again
|
|
|
Posts:
2,829
Registered:
04/28/00
|
|
|
|
Re: Weighting the results of a text search
Posted:
Feb 4, 2008 9:34 AM
in response to: janelson@hormel...
|
|
|
Here is a link that provides a brief explanation:
http://www.psoug.org/reference/utl_match.html
If you search this forum, there are various examples. Here is one that I put together that should be similar to your situation.
SCOTT@orcl_11g> CREATE TABLE operator
2 (operator_id NUMBER,
3 operator_name VARCHAR2 (60),
4 concat VARCHAR2 (1))
5 /
Table created.
SCOTT@orcl_11g> SET DEFINE OFF SCAN OFF
SCOTT@orcl_11g> INSERT ALL
2 INTO operator VALUES (1000084, 'Hilton Hotel', NULL)
3 INTO operator VALUES (1001397, 'HILTON HOTEL AIRPORT', NULL)
4 INTO operator VALUES (1001472, 'Hilton Hotel & Conference Ctr.', NULL)
5 INTO operator VALUES (1004356, 'Hilton Hotel/Avendra Eugene', NULL)
6 INTO operator VALUES (1005660, 'Hilton Hotel', NULL)
7 INTO operator VALUES (9999999, 'some other operator name', NULL)
8 SELECT * FROM DUAL
9 /
6 rows created.
SCOTT@orcl_11g> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('your_datastore', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('your_datastore', 'COLUMNS', 'operator_id, operator_name');
4 CTX_DDL.CREATE_SECTION_GROUP ('your_sec_group', 'BASIC_SECTION_GROUP');
5 CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'the_operator_id', 'operator_id', TRUE);
6 CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'the_operator_name', 'operator_name', TRUE);
7 END;
8 /
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> CREATE INDEX your_index ON operator (concat)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4 ('DATASTORE your_datastore
5 SECTION GROUP your_sec_group')
6 /
Index created.
SCOTT@orcl_11g> COLUMN concat NOPRINT
SCOTT@orcl_11g> COLUMN operator_name FORMAT A30
SCOTT@orcl_11g> -- original query:
SCOTT@orcl_11g> select * from operator z
2 where contains
3 (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 /
OPERATOR_ID OPERATOR_NAME
------------------------------
1000084 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1001472 Hilton Hotel & Conference Ctr.
1004356 Hilton Hotel/Avendra Eugene
1005660 Hilton Hotel
SCOTT@orcl_11g> -- ordering by utl_match.edit_distance:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name)) edit_dist
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist
5 /
OPERATOR_ID OPERATOR_NAME EDIT_DIST
------------------------------
1000084 Hilton Hotel 0
1005660 Hilton Hotel 0
1001397 HILTON HOTEL AIRPORT 8
1004356 Hilton Hotel/Avendra Eugene 15
1001472 Hilton Hotel & Conference Ctr. 18
SCOTT@orcl_11g> -- ordering by utl_match.edit_distance_similarity:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) edit_dist_sim
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist_sim DESC
5 /
OPERATOR_ID OPERATOR_NAME EDIT_DIST_SIM
------------------------------
1000084 Hilton Hotel 100
1005660 Hilton Hotel 100
1001397 HILTON HOTEL AIRPORT 60
1004356 Hilton Hotel/Avendra Eugene 45
1001472 Hilton Hotel & Conference Ctr. 40
SCOTT@orcl_11g> -- ordering without selecting the column:
SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name))
4 /
OPERATOR_ID OPERATOR_NAME
------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.
SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) DESC
4 /
OPERATOR_ID OPERATOR_NAME
------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.
SCOTT@orcl_11g>
|
|
|
Posts:
2,829
Registered:
04/28/00
|
|
|
|
Re: Weighting the results of a text search
Posted:
Feb 4, 2008 9:44 AM
in response to: janelson@hormel...
|
|
|
I can't tell from what you provided, what type of datastore or section group you have. So, here is another example that uses a different type of datastore and different type of section group, but it really doesn't matter what you are using as the ordering by utl_match.edit_distance is the same.
SCOTT@orcl_11g> CREATE TABLE operator
2 (operator_id NUMBER,
3 operator_name VARCHAR2 (60),
4 concat VARCHAR2 (1))
5 /
Table created.
SCOTT@orcl_11g> SET DEFINE OFF SCAN OFF
SCOTT@orcl_11g> INSERT ALL
2 INTO operator VALUES (1000084, 'Hilton Hotel', NULL)
3 INTO operator VALUES (1001397, 'HILTON HOTEL AIRPORT', NULL)
4 INTO operator VALUES (1001472, 'Hilton Hotel & Conference Ctr.', NULL)
5 INTO operator VALUES (1004356, 'Hilton Hotel/Avendra Eugene', NULL)
6 INTO operator VALUES (1005660, 'Hilton Hotel', NULL)
7 INTO operator VALUES (9999999, 'some other operator name', NULL)
8 SELECT * FROM DUAL
9 /
6 rows created.
SCOTT@orcl_11g> CREATE OR REPLACE PROCEDURE concat_proc
2 (p_rowid IN ROWID,
3 p_clob IN OUT NOCOPY CLOB)
4 AS
5 BEGIN
6 FOR r IN (SELECT * FROM operator WHERE ROWID = p_rowid) LOOP
7 DBMS_LOB.WRITEAPPEND (p_clob, 17, '<the_operator_id>');
8 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r.operator_id), r.operator_id);
9 DBMS_LOB.WRITEAPPEND (p_clob, 18, '</the_operator_id>');
10 DBMS_LOB.WRITEAPPEND (p_clob, 19, '<the_operator_name>');
11 DBMS_LOB.WRITEAPPEND (p_clob, LENGTH (r.operator_name), r.operator_name);
12 DBMS_LOB.WRITEAPPEND (p_clob, 20, '</the_operator_name>');
13 END LOOP;
14 END concat_proc;
15 /
Procedure created.
SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> EXEC CTX_DDL.CREATE_PREFERENCE ('your_datastore', 'USER_DATASTORE')
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> EXEC CTX_DDL.SET_ATTRIBUTE ('your_datastore', 'PROCEDURE', 'concat_proc')
PL/SQL procedure successfully completed.
SCOTT@orcl_11g> CREATE INDEX your_index ON operator (concat)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS
4 ('DATASTORE your_datastore
5 SECTION GROUP CTXSYS.AUTO_SECTION_GROUP')
6 /
Index created.
SCOTT@orcl_11g> COLUMN concat NOPRINT
SCOTT@orcl_11g> COLUMN operator_name FORMAT A30
SCOTT@orcl_11g> -- original query:
SCOTT@orcl_11g> select * from operator z
2 where contains
3 (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 /
OPERATOR_ID OPERATOR_NAME
------------------------------
1000084 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1001472 Hilton Hotel & Conference Ctr.
1004356 Hilton Hotel/Avendra Eugene
1005660 Hilton Hotel
SCOTT@orcl_11g> -- ordering by utl_match.edit_distance:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name)) edit_dist
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist
5 /
OPERATOR_ID OPERATOR_NAME EDIT_DIST
------------------------------
1000084 Hilton Hotel 0
1005660 Hilton Hotel 0
1001397 HILTON HOTEL AIRPORT 8
1004356 Hilton Hotel/Avendra Eugene 15
1001472 Hilton Hotel & Conference Ctr. 18
SCOTT@orcl_11g> -- ordering by utl_match.edit_distance_similarity:
SCOTT@orcl_11g> select z.*, UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) edit_dist_sim
2 from operator z
3 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
4 ORDER BY edit_dist_sim DESC
5 /
OPERATOR_ID OPERATOR_NAME EDIT_DIST_SIM
------------------------------
1000084 Hilton Hotel 100
1005660 Hilton Hotel 100
1001397 HILTON HOTEL AIRPORT 60
1004356 Hilton Hotel/Avendra Eugene 45
1001472 Hilton Hotel & Conference Ctr. 40
SCOTT@orcl_11g> -- ordering without selecting the column:
SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name))
4 /
OPERATOR_ID OPERATOR_NAME
------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.
SCOTT@orcl_11g> select * from operator z
2 where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
3 ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) DESC
4 /
OPERATOR_ID OPERATOR_NAME
------------------------------
1000084 Hilton Hotel
1005660 Hilton Hotel
1001397 HILTON HOTEL AIRPORT
1004356 Hilton Hotel/Avendra Eugene
1001472 Hilton Hotel & Conference Ctr.
SCOTT@orcl_11g>
|
|
|
Posts:
39
Registered:
09/24/02
|
|
|
|
Re: Weighting the results of a text search
Posted:
Feb 4, 2008 10:01 AM
in response to: janelson@hormel...
|
|
|
|
Barbara, that is exactly what I needed! Thank you so much.
|
|
|
|
Legend
|
|
Guru : 2500
- 1000000
pts
|
|
Expert : 1000
- 2499
pts
|
|
Pro : 500
- 999
pts
|
|
Journeyman : 200
- 499
pts
|
|
Newbie : 0
- 199
pts
|
|
Oracle ACE Director
|
|
Oracle ACE Member
|
|
Oracle Employee ACE
|
|
Helpful Answer
(5 pts)
|
|
Correct Answer
(10 pts)
|
|