Java Programming

Getting more accurate suggestions for Lucene's SpellChecker

807591, Mar 12 2008 (edited Mar 12 2008)
I apologize if this is not the correct forum to post a topic on Lucene, but I need some expert advice. I currently have Lucene integrated as our search API, and I'm looking for more accurate suggestions for the "did you mean" feature using the SpellChecker class. For instance, consider the following snippet:
String[] allKeywords = keyword.trim().split( "\\s+" ); // Split on any run of whitespace.
StringBuilder suggestedString = new StringBuilder();

for ( int i = 0; i < allKeywords.length; i++ ) {
   // Ask for 1 suggestion per word; the last argument is the morePopular flag.
   String[] suggested = spellChecker.suggestSimilar( allKeywords[ i ], 1, indexReader, "DESCRIPTION", true );
   if ( suggested.length == 0 ) {
      suggestedString.append( allKeywords[ i ] ).append( " " ); // No suggestion returned, so keep the word as typed.
   } else {
      suggestedString.append( suggested[ 0 ] ).append( " " ); // Add the suggested word instead.
   }
}
results.add( suggestedString.toString().trim() ); // Append the rebuilt phrase to the results list.
This asks for a similar word for each keyword in the search phrase (the keyword variable) and then builds a new suggested phrase from the results. If I set the last boolean argument of suggestSimilar (the morePopular flag) to false, a search for "car seet" gives me "cap seat". If I set it to true, I get "can seat", because the word "can" is more popular than "car" in our index. I thought that asking for a suggestion on the entire phrase, instead of on each individual word, might correct the problem, so I tried the following:
String[] suggested = spellChecker.suggestSimilar( keyword, 1, indexReader, "DESCRIPTION", true );
// Guard against an empty array as well as null before reading the first element.
results.add( ( suggested == null || suggested.length == 0 ) ? null : suggested[ 0 ] );
Unfortunately, a search on "car seet" now returns a suggestion of "carpet". Am I missing something here? The search phrase "car seat" is extremely popular on our Web site, and therefore our index, so I can't understand why I can't get it to recommend that. Also, when the word "car" is spelled correctly, and it sees that it exists in our index, then why does it even make a suggestion? Any help is greatly appreciated.
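
To make the last question concrete, here is a minimal sketch of the per-word loop with an explicit dictionary check before asking for a suggestion, so that a word already known to the spell index is left alone. The WordSuggester interface, the stub() helper, and the class name PhraseSuggester are hypothetical and exist only so the example runs without an index. In real code the two calls would presumably be spellChecker.exist( word ) and the five-argument suggestSimilar shown above, both of which can throw IOException. I have not verified that this produces better suggestions on a real index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PhraseSuggester {

    /** Hypothetical stand-in for the two SpellChecker calls this sketch relies on. */
    interface WordSuggester {
        boolean exist(String word);
        String[] suggestSimilar(String word, int numSug);
    }

    /** Rebuilds the phrase word by word, only replacing words the dictionary does not know. */
    static String suggestPhrase(String phrase, WordSuggester suggester) {
        List<String> words = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // An empty input phrase yields one empty token.
            }
            if (suggester.exist(word)) {
                words.add(word); // Known word: keep it and skip the suggestion call.
                continue;
            }
            String[] suggested = suggester.suggestSimilar(word, 1);
            // Fall back to the original word when nothing is suggested.
            words.add(suggested.length == 0 ? word : suggested[0]);
        }
        return String.join(" ", words);
    }

    /** In-memory stub (an assumption for illustration, not Lucene behavior). */
    static WordSuggester stub() {
        final Set<String> dictionary =
                new HashSet<>(Arrays.asList("car", "can", "cap", "seat", "carpet"));
        final Map<String, String> corrections = new HashMap<>();
        corrections.put("seet", "seat");
        return new WordSuggester() {
            @Override
            public boolean exist(String word) {
                return dictionary.contains(word);
            }

            @Override
            public String[] suggestSimilar(String word, int numSug) {
                String fix = corrections.get(word);
                return fix == null ? new String[0] : new String[] { fix };
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(suggestPhrase("car seet", stub())); // prints "car seat"
    }
}
```

With the stub, "car" is found in the dictionary and kept, and only "seet" is sent for a suggestion. Whether the same check fixes the "can seat" result on a real index depends on how the spell index was built.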

If there is a better place to post this please let me know and I will do so.