Java EE (Java Enterprise Edition) General Discussion

Parsing XML string into DOM with Unicode entities fails on Linux only

843834, Jan 31 2008 (edited Feb 1 2008)
Hi all

I have a web controller receiving XML as an HTTP request parameter. The XML looks something like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<delta><invoice id="2112"><htmlfooter>&lt;P&gt;Aucune vaccination n&#8217;est exigée. en cas d'interdiction d'entrée</htmlfooter></invoice></delta>

This string is parsed into a DOM using
DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)))
On my local Windows development machine, this runs fine. On our Linux server, however, the Unicode entity &#8217; becomes a '?'. As the XML contains various Latin-1 characters which are parsed correctly, I guess the XML encoding itself is OK.
I have a debug statement that first writes out the plain string, which looks good. A second debug statement writes the getTextContent() of the XML nodes, and there n&#8217;est becomes n?est.
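
To separate a parsing problem from an output problem, the parsed text can be inspected in a form that does not depend on the console or log encoding. The following is a minimal sketch, not the original controller code: the class name EntityCheck and the helper escapeNonAscii are hypothetical, and the XML is a shortened version of the sample above. It parses the string exactly as described and prints every non-ASCII character as a numeric escape.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityCheck {

    // Parses an XML string the same way as in the post (character stream via StringReader).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: renders each non-ASCII char as a four-digit hex escape,
    // so the result looks the same whatever charset the console or log uses.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; the character reference is decimal 8217 (hex 2019).
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<delta><invoice id=\"2112\"><htmlfooter>n&#8217;est</htmlfooter></invoice></delta>";
        String text = parse(xml).getDocumentElement().getTextContent();
        System.out.println(escapeNonAscii(text));
    }
}
```

If this prints n\u2019est on both machines, the DOM holds the right character and the '?' is introduced later, when the text is written out. One possible explanation, which is an assumption and not something the post confirms, is a difference in the platform default charset: windows-1252 contains U+2019, while ISO-8859-1 and US-ASCII do not, and Java substitutes '?' for characters the output charset cannot encode.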

Any ideas where to look?
Thanks for your help
Simon
Locked on Feb 29 2008
Added on Jan 31 2008