I have code that is supposed to highlight search terms found within a given string.
I searched the string "Java, C++, C, etc." for the term "C++" and got really strange results.
The snippet is below.
// Now the highlighting portion
/** The temporary start tag, just some uncommon Unicode character. */
final String START_TAG = Character.toString( (char)0x0141 );
/** The temporary end tag. */
final String END_TAG = Character.toString( (char)0x0142 );
String highlighted = result.toString();
for (int m = 0; m < foundPhrases.size(); m++) {
    String phrase = foundPhrases.get(m);
    String pattern = "[^" + START_TAG + "](" + phrase + ")";
    Pattern firstMatchedPattern = Pattern.compile( pattern, Pattern.CASE_INSENSITIVE );
    Matcher matcher = firstMatchedPattern.matcher( highlighted );
    while( matcher.find( ) ) {
        String foundPhrase = matcher.group( 1 );
        StringBuffer hilitebuffer = new StringBuffer( START_TAG );
        hilitebuffer.append( foundPhrase ).append( END_TAG );
        System.out.println("hilitebuffer = " + hilitebuffer.toString());
        //highlighted = matcher.replaceAll(hilitebuffer.toString());
        highlighted = highlighted.replaceAll( foundPhrase, Matcher.quoteReplacement(hilitebuffer.toString( ))); // also cannot handle +
        System.out.println("highlighted= " + highlighted);
    }
    highlighted = highlighted.replaceAll( START_TAG, CoreServices.highlightTagStart );
    highlighted = highlighted.replaceAll( END_TAG, CoreServices.highlightTagEnd );
}
System.out.println( "highlightMatchingPhrases.return: " + highlighted );
return highlighted;
Using String's replaceAll to assign highlighted, I get:
hilitebuffer = ?C++?
highlighted= Java, ?C++?++, ?C++?, etc.
highlightMatchingPhrases.return: Java, <b style="color:black;background-color:#ffff66">C++</b>++, <b style="color:black;background-color:#ffff66">C++</b>, etc.
Note the extra ++ after the first match, and that the standalone C has become C++.
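Trying to isolate this first result, I put together the minimal sketch below. It assumes foundPhrase is the literal text "C++" (which is what the hilitebuffer output suggests) and uses "[C++]" as a stand-in for the tagged replacement; the class and method names are made up for the example. As far as I can tell, String.replaceAll parses its first argument as a regex, where C++ means "one or more C, possessive", so every bare C gets replaced. Wrapping the phrase in Pattern.quote seems to avoid that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only to isolate the String.replaceAll behaviour.
class ReplaceAllSketch {

    // The phrase is handed to replaceAll as-is, so it is parsed as a regex.
    static String unquoted(String text, String phrase, String replacement) {
        return text.replaceAll(phrase, Matcher.quoteReplacement(replacement));
    }

    // The phrase is wrapped in Pattern.quote, so it is matched literally.
    static String quoted(String text, String phrase, String replacement) {
        return text.replaceAll(Pattern.quote(phrase), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "Java, C++, C, etc.";
        // "[C++]" stands in for START_TAG + phrase + END_TAG.
        System.out.println(unquoted(text, "C++", "[C++]")); // Java, [C++]++, [C++], etc.
        System.out.println(quoted(text, "C++", "[C++]"));   // Java, [C++], C, etc.
    }
}
```

The first line of output has the same shape as my strange result; the second is what I actually want. For a purely literal, case-sensitive replacement, String.replace (no regex involved) should behave like the quoted version.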
Using Matcher's replaceAll, I get:
hilitebuffer = ?C++?
highlighted= Java,?C++?, C, etc.
highlightMatchingPhrases.return: Java,<b style="color:black;background-color:#ffff66">C++</b>, C, etc.
The problem here is very subtle: the space preceding the text C++ has been replaced as well. So, if the text string were "rollerblading" and the search term were "blading", the resulting highlighted text would be "rolleblading", missing the r. Not good.
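My guess is that the character matched by [^START_TAG] is part of the overall match, so Matcher's replaceAll replaces it along with the phrase. One thing I am considering is a negative lookbehind, which checks the preceding character without consuming it. The sketch below is only that, a sketch: the class and method names are hypothetical, it quotes the phrase with Pattern.quote (an assumption about what the phrases contain), and it leaves out the final swap to the CoreServices tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tag matches without consuming the preceding character.
class LookbehindSketch {
    static final String START_TAG = Character.toString((char) 0x0141);
    static final String END_TAG = Character.toString((char) 0x0142);

    static String highlight(String text, String phrase) {
        // (?<!X) succeeds only if the match is NOT directly preceded by X;
        // unlike [^X], it is zero-width and also succeeds at the start of the text.
        String regex = "(?<!" + Pattern.quote(START_TAG) + ")" + Pattern.quote(phrase);
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        // $0 is the whole match, so the original casing of the text is kept.
        String replacement = Matcher.quoteReplacement(START_TAG) + "$0"
                + Matcher.quoteReplacement(END_TAG);
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(highlight("Java, C++, C, etc.", "C++"));
        System.out.println(highlight("rollerblading", "blading"));
    }
}
```

With this version the space before C++ and the r in "rollerblading" both survive, and a phrase at the very start of the string can still match, which the [^...] form cannot do because it requires a preceding character.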
Reproducible every time.
Ideas?
Thanks!
Ginni