Mixed Language Identification in oracle
SK KMar 23 2012 — edited Mar 26 2012Hi All,
Oracle DB version: 10.2.0.1.0
I have customer table which stores customer name (column name is custname) in either English language or French language. If the table has 100 rows, then 50 rows have customer name in french language and 50 customer name in English language. But there is no possibility that customer name stored in both (English and french ) languages.
Only one column, "custname" can have either french or English name. All other columns stored in English language.
there is no other columns in table that can have french language ( All other columns are stored in English language only)
When I say column value is stored in French, I actually mean it is stored in French language script.
Problem started when I had to separate french language and English language records into separate tables. i.e. French name records should occupy in table CUSM_FR and English name records should occupy table CUSM_EN. There is no way to distinguish whether given record is in french or English language based on any other columns in table. Only language script makes sense that given record is French language.
We have column 'lang' which is suppose to store language code, but it stores only NULL values.
So I have decided to identify ASCII value of the name. If the ASCII value falls between 65 to 90 or 97 to 122 then it is English name, otherwise it is French name.
I have written below query:
SELECT *
FROM customers
WHERE (ascii(SUBSTR(custname, 1, 1)) BETWEEN 65 AND 90)
OR (ascii(SUBSTR(custname, 1, 1)) BETWEEN 97 AND 122)
This will gives rise to records with English names. If I exclude the above range of ASCII values, I will get Non- English characters, i.e French language in my case.
My question is, Is there any other way of identifying records (Any other logic) which gives good performance than the above logic.
Thanks in advance..
Regards,
SK