Problems with transforming special characters
843834Jul 28 2002 — edited Jul 30 2002Hi,
I develop a small educational application ( http://sourceforge.net/projects/pauker/ ). I work with JDK-1.4.0 on Mandrake Linux 8.2. At first I used serialized objects to save the lessons to a file. This worked well until I wanted to change some public members of the involved classes. That's why I switched over to the new and shiny XML. Now I have a different problem!
Pauker saves its lessons in gziped XML files. Users from all over the world can create lessons containing very different characters. There are European characters like ������� and asian characters. Loading this lessons on a system with a different encoding works fine. Saving such a lesson on a system with a different encoding can destroy the lesson.
Example:
On a german system a user creates a lesson with the letter � on a card side and saves it. A different user working on an english system loads this lesson. The character "�" is displayed correctly. The english user saves the lesson. The character "�" will be replaced by a question mark in the xml file. Next time the english user loads the lesson she will not see "�" but "?" on the display.
Here is a little example program that does the transformation in exactly the way Pauker does. Please test it out.
-----------------------------------
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
public class XMLTest {
public XMLTest() {
try {
// create document
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.newDocument();
// fill document
Element element = document.createElement("Element");
document.appendChild(element);
element.appendChild(document.createTextNode("�������"));
// transform to XML
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
//transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
DOMSource source = new DOMSource(document);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
StreamResult result = new StreamResult(outputStream);
transformer.transform(source, result);
System.out.println("original: �������");
System.out.println(outputStream.toString());
} catch (Exception e) {
e.printStackTrace();
}
System.out.println();
}
public static void main(String[] args) {
new XMLTest();
}
}
-----------------------------------
So what do I have to do to fix this problem? I have a lot of new features waiting in the CVS but this bug is still open and discussed. I dont want to release a new version with such a gaping hole in it...
Thanks a lot!
If you prefer to reply peronally, please use Ronny.Standtke at gmx.de