## 1-Do you expect Oracle's UTF-8 to remain as CESU-8?
Do not confuse UTF-8 with UTF8. UTF-8 is a term defined by Unicode. UTF8 is the name of a character set in Oracle.
Oracle's UTF8 will remain Unicode's CESU-8 with the Unicode 3.0 repertoire of characters. There are no plans to change this.
Oracle's AL32UTF8 is Unicode's UTF-8 and will be enhanced if the character repertoire of Unicode is enhanced (Oracle10gR2 uses the Unicode 4.0 repertoire).
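To make the difference concrete, here is a minimal Python sketch of how the two encodings diverge for a supplementary character (one outside the Basic Multilingual Plane). The helper name `cesu8_encode` is hypothetical and written only for illustration; it is not an Oracle or Python API. It assumes the CESU-8 rule that each UTF-16 code unit, including each half of a surrogate pair, is encoded as its own sequence of one to three bytes.

```python
def cesu8_encode(text: str) -> bytes:
    """Illustrative CESU-8 encoder (hypothetical helper, not a library API)."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets a lone surrogate code unit through the
        # UTF-8 encoder, which is exactly what CESU-8 requires.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


sample = "\U00010400"  # a supplementary character (U+10400)

print(sample.encode("utf-8").hex(" "))   # f0 90 90 80        (UTF-8, as in AL32UTF8)
print(cesu8_encode(sample).hex(" "))     # ed a0 81 ed b0 80  (CESU-8, as in UTF8)
```

For text that contains only BMP characters the two functions return identical bytes, which is why the distinction matters only when surrogate pairs are present.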
## 2-Since we must support some 12 different languages and we want to do so
## in a single database UTF-8 is our only option, however, we must disseminate
## our content to various exchanges and so must we label our data as CESU-8
## or can we allow it to be auto-detected?
If you use AL32UTF8 as the database character set (recommended for all environments that use Oracle9i or newer software only), then you should mark the data as 'utf-8' (if we talk about MIME tags).
If you use UTF8 as the database character set (recommended only if 8.0 or 8i clients or databases exist in the environment), you should use either 'utf-8' or 'cesu-8'. If your database contains no surrogate pairs, which is usually the case, use 'utf-8'. If you have surrogates, then theoretically you should use 'cesu-8'. But, as your receiving applications may not recognize this MIME tag (it is not widely known), you may have to use 'utf-8' instead.
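The labeling rule above can be sketched as a small Python function. This is only an illustration of the strict choice: the names `has_supplementary` and `mime_charset` are hypothetical, and the function takes the already-decoded text rather than reading anything from a database. In practice you may still have to send 'utf-8' if the receiving application does not recognize 'cesu-8'.

```python
def has_supplementary(text: str) -> bool:
    """True if the text contains any character outside the BMP,
    i.e. one that is stored as a surrogate pair in a UTF8 database."""
    return any(ord(ch) > 0xFFFF for ch in text)


def mime_charset(db_charset: str, text: str) -> str:
    """Pick the strictly correct MIME charset tag (hypothetical helper)."""
    name = db_charset.upper()
    if name == "AL32UTF8":
        return "utf-8"
    if name == "UTF8":
        # Without surrogate pairs, CESU-8 and UTF-8 bytes are identical.
        return "cesu-8" if has_supplementary(text) else "utf-8"
    raise ValueError(f"not a Unicode database character set: {db_charset}")


print(mime_charset("AL32UTF8", "\U00010400"))  # utf-8
print(mime_charset("UTF8", "plain BMP text"))  # utf-8
print(mime_charset("UTF8", "\U00010400"))      # cesu-8
```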
## 3-We assume that Oracle uses UTF-8 as its database character set
## within its own internal databases as well as within the Oracle application
## suite. In those cases when content is disseminated is that content labeled
## CESU-8? What is Oracle's position?
As far as I know, we usually assume that there are no surrogates in the database and we use 'utf-8'. But strictly speaking, if the database is UTF8 and not AL32UTF8, 'cesu-8' would be the correct tag. Unfortunately, many applications may be unable to recognize it.
Best regards,
Sergiusz