Search logic strategies
Using Oracle 9.2.0.8 on AIX, Oracle text. .Net front-end
I have a very demanding client, requiring accurate (high relevance) results. The want to offer a simple search interface to end users (ala Google), but leverage more advanced logic in the back end. We deal with French and English content only... mostly indexing CLOBs and varchar2s.
The dilemma I am faced with, is the end user isn't familiar with Oracle advanced search syntax, but still want specific, relevant results... The advanced search screen offers 3 text boxes, "All words", Exact Phrase" and "Any words". There is also a "part of word" checkbox, when enabled, tells stored proc to do a wildcard match on each keyword.
Based on which text box is chosen, the stored proc injects the proper boolean operators, and wraps each term with a FUZZY operator (70 score, weighted)...
The current search implementation, according to them, sucks.
I have been tasked with improving the search results... so far I have implemented the following:
- base_letter conversion
- stemming (index stems)
- fuzzy match enabled
- skipjoins for hyphens (to handle words like post-secondary)
- index_themes, prove_themes (about queries)
- index substrings (for wildcard matching)
I have also created a user_datastore to combine 3 columns (one varchar2, and two clobs) which were previously indexed separately, and ordered by scores on the varchar2 and one clob column.
I have also substituted the FUZZY operator searches for "ABOUT" queries using the built-in knowledge base.
With the above changes, they find results have improved, but I think more can be done.
The stored proc does strip out any special characters (used by Oracle Text to drive logic) found in the query terms, and injects it's own operators and boolean logic based on which text box was used to submit the query.
Is there a better way to do this? It seems to be a delicate balance between keeping the search interface simple, while still leveraging advanced search features on the back-end.
Some scenarios I have been playing wiith:
- wrap each query term with an ABOUT, OR'ing or AND'ing each term, depending if user entered terms in "any words" or "all words". Problem here is that ABOUT ignores special characters... even a hyphen that is part of a word, even though I escape it using "\" or "{}"... base don query feedback, words like "post-secondary" will be searched as:
CRT Search for ALL Words,1,0,ACCUMULATE,,,1
CRT Search for ALL Words,2,1,ABOUT,,posting,1
CRT Search for ALL Words,3,1,ABOUT,,secondariness,2
How does Oracle handle search? Asktom?
Thanks for the tips and advice
Stephane