Oracle text search - special characters issue
716687Aug 11 2009 — edited May 28 2013Hi.
I'm facing a real annoying problem with text search query, and everything I've tried failed...
I have a table with a varchar column indexed by text index. The column contains special characters like '&', ',' and mainly- '-'. Since I want to disregard these special characters for searches I have created a basic lexer of type skipjoins for the column index. So now, the phrase 'aaa-bbb something'. for example, can be searched without '-', like this: 'aaabbb'. But I want to make it possible for this phrase to be searched with and without '-'. So, that when the user enters 'aaabbb' he will get the same results as when he enters 'aaa-bbb'.
In other words, This condition:
WHERE CONTAINS(column, '<query> <textquery grammar="context"> <progression><seq>'
||'aaabbb'
||'</seq></progression> </textquery> </query> ' ,1)> 2
Will return the same results as this condition:
WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
||'aaa-bbb'
||'</seq></progression> </textquery> </query> ' ,1)> 2
Since text query treats the '-' sign as a minus sign and searches for 'aaa' which doesn't contain 'bbb', the only way I found to fix this was to wrap the search text with {}. like this:
WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
||'{aaa-bbb}'
||'</seq></progression> </textquery> </query> ' ,1)> 2
This all went very well, until I wanted to create a relaxation query. like this:
WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
||'{aaab}'
||'</seq><seq>'
||'{aaab}'
||'%</seq></progression> </textquery> </query> ' ,1)> 2
In this case, I would expect the first part of the query to return no results (since it's not the whole word) but the second part, using '%' should have returned the record of 'aaa-bbb'. It doesn't. It will only return my result if I remove the '{}' for the second part. I can't do that, because the exact same search, when containing '-', will not return the expected results when I remove the braces (the sign is treated as minus sign):
WHERE CONTAINS(r.POI_NAME, '<query> <textquery grammar="context"> <progression><seq>'
||'{aaab}'
||'</seq><seq>'
||'aaa-b'
||'%</seq></progression> </textquery> </query> ' ,1)> 2
So I now have no solution. My question is- How can I create a query that will disregard the minus sign and treat it as a regular sign, but would still handle percentage sign as a special sign. So that I could run a query like the last example and will get the results of searching the phrase 'aaa-b%'?
In short, and to simplify my question, I'm looking for a way to escape all characters (not only the minus sign) except for a specific character. Kind of like 'unescaping' a specific character (the '%' sign) within braces {}. Or, another way would be to remove the space that is added to the phrase inside the braces at the end of the word, preventing me from adding "%" at the end of the word, outside the braces.
Thanks you,
Nili