Problem with encoding of XML document
843834, May 7 2005 (edited Dec 31 2007)
While parsing an XML document with a SAX parser, I found that the encoding of the document, received as an input stream, is "ISO-8859-1". After parsing, certain fields have to be stored in a MySQL table whose character set is "utf8". What I found is that certain characters in the original XML document are stored as question marks (?) in the database.
1. I am using MySQL 4.1.7 with the system variable character_set_database set to "utf8", so all my tables have charset "utf8".
2. I am parsing XML files as an input stream using the SAX parser API (org.apache.xerces.parsers.SAXParser) with encoding "iso-8859-1". After parsing, certain fields have to be stored in the MySQL database.
3. Some XML files contain an "iso-8859-1" character with character code 146, which looks like an apostrophe but is actually ’. The problem is that words like can’t are shown as can?t by the database.
4. I noticed that parsing goes well and the character code is 146 during parsing. But when I retrieve the value from the database using JDBC, the character code is 63.
5. I am using a JDBC prepared statement to insert the parsed XML into the database. It seems that some problem occurs while inserting, but I don't know what it is.
6. I tried to convert ISO-8859-1 to UTF-8 before storing into the database, using
utfString = new String(isoString.getBytes("ISO-8859-1"), "UTF-8");
But when I retrieve it from the database, it still shows the character code as 63.
7. I also tried to retrieve it using
description = new String(rs.getBytes(1), "UTF-8");
But description still contains a character with code 63 instead of 146, and can’t is still shown as can?t.
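To narrow down where 146 turns into 63, the byte-level behaviour described in items 3, 4 and 6 can be reproduced without a database. The sketch below is only an illustration: the class and method names are made up for this post. It shows that byte 146 decoded as ISO-8859-1 becomes the control character U+0092 (the curly apostrophe sits at that byte in windows-1252, not in strict ISO-8859-1), that the conversion from item 6 replaces it with U+FFFD rather than producing UTF-8, and that Java writes 63 ('?') whenever a character has no mapping in the target charset. Which step in the actual setup performs that substitution cannot be told from this sketch alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the original code.
final class Byte146Demo {

    // Decode the single byte 146 (0x92) using the given charset.
    static char decode146(Charset cs) {
        return new String(new byte[] {(byte) 0x92}, cs).charAt(0);
    }

    // The conversion from item 6: encode as ISO-8859-1, then decode as UTF-8.
    // A Java String has no "ISO" or "UTF-8" state, so this only mangles
    // any character above 127.
    static String item6Conversion(String isoString) {
        byte[] latin1 = isoString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // First byte produced when one character is encoded into a charset.
    // Unmappable characters are replaced by '?' (63) by String.getBytes.
    static int encodedFirstByte(char c, Charset cs) {
        return String.valueOf(c).getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // 146: what the parser reports when the file is read as ISO-8859-1
        System.out.println((int) decode146(StandardCharsets.ISO_8859_1));
        // 8217 (U+2019, right single quotation mark) when read as windows-1252
        System.out.println((int) decode146(Charset.forName("windows-1252")));
        // 65533 (U+FFFD): byte 0x92 alone is not valid UTF-8
        System.out.println((int) item6Conversion("can\u0092t").charAt(3));
        // 63: U+0092 has no mapping in US-ASCII, so '?' is written instead
        System.out.println(encodedFirstByte('\u0092', StandardCharsets.US_ASCII));
    }
}
```

If the first two lines match what the parser sees, reading the files as windows-1252 instead of "iso-8859-1" might be worth trying, since the character would then arrive as a real apostrophe-like character (U+2019) rather than a control code.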
Please help me find out where the problem is: in parsing, or in storing and retrieving from the database. Sorry for any spelling mistakes.
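One place that may be worth checking is the encoding the JDBC connection itself uses, since a utf8 table does not by itself decide how the driver encodes strings on the wire. The following is a minimal sketch, assuming MySQL Connector/J as the driver. The host, schema, credentials, and the table and column names (items, description) are placeholders and not taken from the setup above. useUnicode and characterEncoding are Connector/J connection properties; whether setting them resolves this particular case is an assumption, not something confirmed here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; table, column and parameter names are placeholders.
final class Utf8InsertSketch {

    // Build a Connector/J URL that asks the driver to talk UTF-8 to the server.
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Open a connection with the URL above (needs the MySQL driver on the classpath).
    static Connection open(String host, String database, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, database), user, password);
    }

    // Insert the parsed text as a Java String; no manual byte conversion is done,
    // the driver encodes it according to the connection settings.
    static void insertDescription(Connection conn, String description) throws SQLException {
        try (PreparedStatement ps =
                conn.prepareStatement("INSERT INTO items (description) VALUES (?)")) {
            ps.setString(1, description);
            ps.executeUpdate();
        }
    }
}
```

With a connection opened this way, reading the value back with rs.getString(1) rather than rs.getBytes(1) lets the driver do the decoding as well.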