Accented characters (ã, ä) turn into special characters when writing UTF-8 in Java
923468, Apr 18 2012 (edited Apr 18 2012)
Accented characters (ã, ä) in the source XML turn into replacement characters (�) when Java code writes the target XML with UTF-8 encoding.
1. The source XML file is encoded in UTF-8.
2. When the file is read and written back out as UTF-8, some characters (ã, ä) are not preserved; they are written as '�'.
3. Both the reading and the writing are done in Java.
4. The same Java code works correctly when the source and target XML files are on a Windows server.
5. The Java version is 1.5.0.
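Item 1 may be worth verifying rather than assuming: a file can declare encoding="UTF-8" while its bytes are actually in another encoding, for example ISO-8859-1 if it was created or transferred on a different system. The following sketch (not part of the original code; `Utf8Check` and `isValidUtf8` are hypothetical names) checks a file with a strict decoder that reports malformed input instead of silently substituting '�'. It uses `java.nio.file` and `StandardCharsets`, so it assumes Java 7 or later rather than the 1.5.0 mentioned above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if every byte sequence in 'data' is well-formed UTF-8.
    // A REPORT-mode decoder throws on bad input instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java Utf8Check /path/to/source.xml
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

Running this against the source file on both servers would show whether the UNIX copy really contains UTF-8.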
The source XML looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE source_item SYSTEM "http://www.extranet.xyz.com/dtdi/promis/promis_313.dtd">
<source_item>
<record sequence_number="5386" creation_date="Fri Mar 30 11:48:09 2012" />
</PUB>
<PUB pubstyle="product" IDT="12345">
<PUBLDES>Gertrude Käsebier,Sebastião Salgado, Brazil</PUBLDES>
</PUB></source_item>
When this file is read and written to another XML file (also UTF-8), the words Käsebier and Sebastião appear as 'K�sebier,Sebasti�o'.
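This exact pattern is consistent with (though it does not prove) the source bytes being ISO-8859-1 rather than UTF-8. In UTF-8, ä is the two bytes C3 A4; in ISO-8859-1 it is the single byte E4, which a UTF-8 decoder cannot interpret and replaces with U+FFFD ('�'), while the following ASCII letter survives. The sketch below simulates that mismatch; `MojibakeDemo` and its methods are illustrative names, and `StandardCharsets` assumes Java 7+.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Formats bytes as space-separated uppercase hex, e.g. "4B E4"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates a file saved as ISO-8859-1 but read back as UTF-8.
    // new String(byte[], Charset) replaces malformed sequences with U+FFFD.
    static String latin1ReadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "K\u00e4sebier"; // "Käsebier", escaped so the source file's encoding does not matter
        System.out.println("UTF-8 bytes:      " + hex(word.getBytes(StandardCharsets.UTF_8)));      // 4B C3 A4 73 65 62 69 65 72
        System.out.println("ISO-8859-1 bytes: " + hex(word.getBytes(StandardCharsets.ISO_8859_1))); // 4B E4 73 65 62 69 65 72
        System.out.println("Latin-1 read as UTF-8: " + latin1ReadAsUtf8(word));
    }
}
```

The same two byte signatures can be looked for in the actual source file with a hex viewer such as `od -c` or `xxd` on the UNIX server.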
The Java code used to read and write the files is:
// Decode the source file's bytes as UTF-8
BufferedReader source = new BufferedReader(new InputStreamReader(new FileInputStream(sourceFile), "UTF-8"));
// Encode the characters written to the target file as UTF-8
BufferedWriter target = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(targetFile), "UTF-8"));
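If the source file turns out not to be UTF-8, one option is to decode it with the charset it was really written in and encode the output as UTF-8, which also makes the encoding="UTF-8" declaration true for the target. The sketch below uses the same reader/writer construction as the two lines above, with the input charset as a parameter. `Transcode` and `copyAsUtf8` are hypothetical names; try-with-resources and `StandardCharsets` require Java 7+; and ISO-8859-1 is only a guess that has to be confirmed against the actual file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {

    // Decodes 'src' with 'inCharset' and writes the same characters to 'dst' as UTF-8.
    // inCharset must describe the bytes actually on disk; an XML declaration
    // saying encoding="UTF-8" does not change those bytes.
    static void copyAsUtf8(File src, Charset inCharset, File dst) throws IOException {
        try (Reader in = new BufferedReader(new InputStreamReader(new FileInputStream(src), inCharset));
             Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(dst), StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            // Copy characters verbatim, so line endings are left untouched
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } // both streams are flushed and closed here, even if an exception is thrown
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java Transcode source.xml ISO-8859-1 target.xml
        copyAsUtf8(new File(args[0]), Charset.forName(args[1]), new File(args[2]));
    }
}
```

If the real code copies line by line with `readLine()` and `newLine()` instead, note that `newLine()` writes the platform line separator (`\r\n` on Windows, `\n` on UNIX), which is a separate Windows/UNIX difference from the encoding one.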
This code works fine when the source and target XML files are on Windows, but when they are on the UNIX server it produces the issue described above.
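Because both lines above name UTF-8 explicitly, those two streams should decode and encode identically on Windows and UNIX. A Windows/UNIX difference therefore more plausibly comes from some step that relies on the JVM's default charset instead, for example whatever produced the source file, a `FileReader`/`FileWriter`, or `String.getBytes()` without an argument. That default is commonly Cp1252 on Western-locale Windows and follows the locale (`LANG`/`LC_ALL`) on UNIX; Java 18 and later default to UTF-8. This diagnostic sketch (hypothetical class name) prints the default and uses only APIs available in Java 1.5.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports what the JVM uses when no charset is passed explicitly
    // (FileReader, FileWriter, String.getBytes(), new String(byte[]), ...).
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Running it on both servers shows whether the defaults differ. Starting the JVM with `-Dfile.encoding=UTF-8` is sometimes used to align them, but passing a charset explicitly at every byte/character boundary is the more robust choice.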
Any input on resolving this would be appreciated. Thanks!