I apologize if this is not the correct forum for a Lucene question, but I need some expert advice. We use Lucene as our search API, and I'm trying to get more accurate suggestions for the "did you mean" feature from the SpellChecker class. For instance, consider the following snippet:
String[] allKeywords = keyword.split( " " );
StringBuffer suggestedString = new StringBuffer();
for ( int i = 0; i < allKeywords.length; i++ ) {
    String[] suggested = spellChecker.suggestSimilar( allKeywords[ i ], 1, indexReader, "DESCRIPTION", true ); // Get 1 suggestion per word.
    if ( suggested.length == 0 ) {
        suggestedString.append( allKeywords[ i ] ).append( " " ); // The word was spelled correctly.
    } else {
        suggestedString.append( suggested[ 0 ] ).append( " " ); // Add the suggested word instead.
    }
}
results.add( results.size(), suggestedString.toString().trim() );
This suggests a similar word for each keyword in the search phrase (the keyword variable) and then builds a new suggested phrase from the results. If I set the boolean parameter on suggestSimilar to false, a search for "car seet" gives me a recommendation of "cap seat". If I set it to true, I get "can seat", because the word "can" is more popular than "car". Thinking that asking for a suggestion on the entire phrase, rather than on each individual word, might correct the problem, I did the following:
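String[] suggested = spellChecker.suggestSimilar( keyword, 1, indexReader, "DESCRIPTION", true );
// Guard against an empty array as well as null, so suggested[0] cannot throw.
results.add( results.size(), ( suggested == null || suggested.length == 0 ? null : suggested[ 0 ] ) );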
Unfortunately, a search on "car seet" now returns a suggestion of "carpet". Am I missing something here? The search phrase "car seat" is extremely popular on our website, and therefore in our index, so I don't understand why I can't get the spell checker to recommend it. Also, when the word "car" is spelled correctly and exists in our index, why is a suggestion made for it at all? Any help is greatly appreciated.
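To make the behaviour I'm hoping for concrete, here is a minimal, self-contained sketch that does not use Lucene at all. It leaves a word untouched when it is already a known term and only looks for a replacement otherwise. Everything in it is an assumption made for illustration: the class name PhraseSuggestSketch, the vocabulary map (a stand-in for the terms and frequencies in the DESCRIPTION field), the made-up frequencies, and the MAX_DISTANCE cutoff. In the real code the "is this word known" check would presumably be something like SpellChecker.exist(word), and the ranking would come from suggestSimilar rather than from a plain edit distance.

```java
import java.util.HashMap;
import java.util.Map;

public class PhraseSuggestSketch {

    // Hypothetical cutoff: ignore candidates that are too far from the input.
    static final int MAX_DISTANCE = 2;

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Closest vocabulary term within MAX_DISTANCE; ties go to the more frequent term.
    // Returns the input unchanged when nothing is close enough.
    static String suggestWord(String word, Map<String, Integer> vocabulary) {
        String best = word;
        int bestDistance = Integer.MAX_VALUE;
        int bestFrequency = -1;
        for (Map.Entry<String, Integer> entry : vocabulary.entrySet()) {
            int distance = editDistance(word, entry.getKey());
            if (distance > MAX_DISTANCE) {
                continue;
            }
            if (distance < bestDistance
                    || (distance == bestDistance && entry.getValue() > bestFrequency)) {
                best = entry.getKey();
                bestDistance = distance;
                bestFrequency = entry.getValue();
            }
        }
        return best;
    }

    // Rebuilds the phrase word by word, keeping any word that is already a known term.
    static String suggestPhrase(String phrase, Map<String, Integer> vocabulary) {
        StringBuilder out = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            String replacement = vocabulary.containsKey(word) ? word : suggestWord(word, vocabulary);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(replacement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up term frequencies standing in for the index.
        Map<String, Integer> vocabulary = new HashMap<>();
        vocabulary.put("car", 500);
        vocabulary.put("can", 900);
        vocabulary.put("cap", 100);
        vocabulary.put("seat", 700);
        vocabulary.put("carpet", 300);
        System.out.println(suggestPhrase("car seet", vocabulary));
    }
}
```

With this toy vocabulary, "car" is kept because it is already a known term, even though "can" is more frequent, and only "seet" is looked up. That is the result I would like to get out of SpellChecker for the per-word approach.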
If there is a better place to post this please let me know and I will do so.