Skip to Main Content

Java EE (Java Enterprise Edition) General Discussion

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Problems with transforming special characters

843834Jul 28 2002 — edited Jul 30 2002
Hi,

I develop a small educational application ( http://sourceforge.net/projects/pauker/ ). I work with JDK-1.4.0 on Mandrake Linux 8.2. At first I used serialized objects to save the lessons to a file. This worked well until I wanted to change some public members of the involved classes. That's why I switched over to the new and shiny XML. Now I have a different problem!
Pauker saves its lessons in gziped XML files. Users from all over the world can create lessons containing very different characters. There are European characters like ������� and asian characters. Loading this lessons on a system with a different encoding works fine. Saving such a lesson on a system with a different encoding can destroy the lesson.

Example:
On a german system a user creates a lesson with the letter � on a card side and saves it. A different user working on an english system loads this lesson. The character "�" is displayed correctly. The english user saves the lesson. The character "�" will be replaced by a question mark in the xml file. Next time the english user loads the lesson she will not see "�" but "?" on the display.

Here is a little example program that does the transformation in exactly the way Pauker does. Please test it out.

-----------------------------------
import java.io.*;

import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

import org.w3c.dom.*;

public class XMLTest {

public XMLTest() {
try {
// create document
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.newDocument();

// fill document
Element element = document.createElement("Element");
document.appendChild(element);
element.appendChild(document.createTextNode("�������"));

// transform to XML
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
//transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

DOMSource source = new DOMSource(document);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

StreamResult result = new StreamResult(outputStream);
transformer.transform(source, result);

System.out.println("original: �������");
System.out.println(outputStream.toString());

} catch (Exception e) {
e.printStackTrace();
}
System.out.println();
}

public static void main(String[] args) {
new XMLTest();
}
}

-----------------------------------

So what do I have to do to fix this problem? I have a lot of new features waiting in the CVS but this bug is still open and discussed. I dont want to release a new version with such a gaping hole in it...

Thanks a lot!

If you prefer to reply peronally, please use Ronny.Standtke at gmx.de
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Aug 27 2002
Added on Jul 28 2002
4 comments
657 views