Skip to Main Content

Java EE (Java Enterprise Edition) General Discussion

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

XML Parsing attributes with encoded ampersand causes wrong order

843834Mar 31 2008 — edited Apr 9 2008
Hi all,

I am writing in the forum first (because it could be that i am doing something wrong.... but i think it is a bug. Nonetheless, i thought i'd write my problem up here first.

I am using Java 6, and this has been reproduced on both windows and linux.
java version "1.6.0_03"

Problem:
read XML file into org.w3c.dom.Document.
XML File has some attributes which contain ampersand. These are escaped as (i think) is prescribed by the rule of XML. For example:
<?xml version="1.0" encoding="UTF-8"?>
	<lang>
		<text dna="8233" ro="chisturi de plex coroid (&gt;=1.5 mm)" it="Cisti del plesso corioideo(&gt;=1.5mm)" tr="Koro&#305;d pleksus kisti (&gt;=1.5 mm)" pt_br="Cisto do plexo cor&oacute;ide (&gt;=1,5 mm)" de="Choroidplexus Zyste (&gt;=1,5 mm)" el="&Kappa;&#973;&sigma;&tau;&epsilon;&iota;&sigmaf; &chi;&omicron;&rho;&omicron;&epsilon;&iota;&delta;&omicron;&#973;&sigmaf; &pi;&lambda;&#941;&gamma;&mu;&alpha;&tau;&omicron;&sigmaf; (&gt;= 1.5 mm)" zh_cn="&#33033;&#32476;&#33180;&#22218;&#32959;&#65288;&gt;= 1.5 mm&#65289;" pt="Quisto do plexo coroideu (&gt;=1,5 mm)" bg="&#1050;&#1080;&#1089;&#1090;&#1072; &#1085;&#1072; &#1093;&#1086;&#1088;&#1080;&#1086;&#1080;&#1076;&#1085;&#1080;&#1103; &#1087;&#1083;&#1077;&#1082;&#1089;&#1091;&#1089; (&gt;= 1.5 mm)" fr="Kystes du plexus choroide (&gt;= 1,5 mm)" en="Choroid plexus cysts (&gt;=1.5 mm)" ru="&#1082;&#1080;&#1089;&#1090;&#1099; &#1089;&#1086;&#1089;&#1091;&#1076;&#1080;&#1089;&#1090;&#1099;&#1093; &#1089;&#1087;&#1083;&#1077;&#1090;&#1077;&#1085;&#1080;&#1081; (&gt;=1.5 mm)" es="Quiste del plexo coroideo (&gt;=1.5 mm)" ja="&#33032;&#32097;&#33180;&#22178;&#32990;&#65288;&gt;=1.5mm&#65289;" nl="Plexus choroidus cyste (&gt;= 1,5 mm)" />
</lang>
As you might understand, we need to have the fixed text '>' for later processing. (not the greater than symbol '>' but the escaped version of it).
Therefore, I escape the ampersand (encode?) and leave the rest of the text as is. And so my > becomes >

All ok?

Symptom:

in fetching attributes, for example by the getAttribute("en") type call, the wrong attribute values are fetched.
Not only that, if i only read to Document instance, and write back to file, the attributes are shown mixed up.

eg:
dna: 8233, ro=chisturi de plex coroid (>=1.5 mm), en=&#1082;&#1080;&#1089;&#1090;&#1099; &#1089;&#1086;&#1089;&#1091;&#1076;&#1080;&#1089;&#1090;&#1099;&#1093; &#1089;&#1087;&#1083;&#1077;&#1090;&#1077;&#1085;&#1080;&#1081; (>=1, de=Choroidplexus Zyste (>=1,5 mm)
Here you can see that 'en' is shown holding what looks like greek, ... (what is ru as a country-code anyway?) where it should have obviously had the english text that originally was associated with the attribute 'en'

This seems very strange and unexpected to me. I would have thought that in escaping (encoding) the ampersand, i have fulfilled all requirements of me, and that should be that.
There is also no error that seems to occur.... we simply get the wrong order when fetching attributes.

Am I doing something wrong? Or is this a bug that should be submitted?

Kind Regards, and thanks to all responders/readers.

Sean

p.s. previously I had not been escaping the ampersand. This meant that I lost my ampersand in fetching attributes, AND the attribute order was ALSO WRONG!
In fact, the wrong order was what led me to read about how to correctly encode ampersand at all. I had been hoping that correctly encoding would fix the order problem, but it didn't.

Edited by: svaens on Mar 31, 2008 6:21 AM
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on May 7 2008
Added on Mar 31 2008
10 comments
2,666 views