Problem with parsing large XML files chunked over HTTP

user13784414Jan 5 2015 — edited Jan 19 2015

I'm trying to isolate a bug that was introduced when upgrading the JRE in use from Java 7u51 to 7u71 without changing any code. The problem appears to be very similar to: Bug ID: JDK-8027359 XML parser returns incorrect parsing results.

Further investigation showed that my problem first appears in the same release (7u71) in which the fix for that bug was applied. Unlike that bug, though, my XML is declared as version 1.0. It also appears to affect only large XML files, on the order of 10 MB.

The closest I've been able to narrow it down: the code uses JAXB to unmarshal a stream that the debugger reports as an org.apache.http.conn.EofSensorInputStream / org.apache.http.impl.io.ChunkedInputStream. The exception is not consistent, but it typically looks as if chunks are being overwritten or shuffled: letters appear in attributes whose values are actually numbers, or, as in the trace that follows, an attribute name "testAttribute" is partially overwritten by the end of a timestamp from a different section of the XML.

javax.xml.bind.UnmarshalException
- with linked exception:
[javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.]
  at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:421)
  at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:357)
  at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.
  at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
  at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:181)
  at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
  ... 6 more

Here's some code that seems to reproduce it if you can connect to an XML server that returns a large chunked XML file:

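  // Apache HttpClient 4.x, plain HTTP only.
  SchemeRegistry registry = new SchemeRegistry();
  registry.register(new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
  HttpClient client = new DefaultHttpClient(new BasicClientConnectionManager(registry));

  String url = "http://someUrlReturningAlargeChunkedXML";
  HttpGet method = new HttpGet(url);
  HttpResponse response = client.execute(method);
  // For a chunked response the debugger shows this as EofSensorInputStream / ChunkedInputStream.
  InputStream inputStream = response.getEntity().getContent();

  // Setup not shown in the original snippet; default instances are assumed here.
  XMLInputFactory factory = XMLInputFactory.newInstance();
  Unmarshaller unmarshaller = JAXBContext.newInstance(JaxBObjectOfResponse.class).createUnmarshaller();

  // The parse error is thrown from inside this unmarshal call.
  XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
  JAXBElement<JaxBObjectOfResponse> wot = unmarshaller.unmarshal(responseReader, JaxBObjectOfResponse.class);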

If I connect to the same service using URL.openStream(), there is no error. If I read the bytes directly and write them to a file, there is no error. The error occurs only when I unmarshal the stream, the document is large, and I'm running Java 7u71 or later. I can reproduce it consistently against the JSP webapp I'm using, but the same code showed no error with a Wikipedia dump XML file.
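
Since copying the raw bytes out of the stream works without error, one possible workaround is to buffer the whole response in memory and only then hand it to the parser, so the parser never reads from the chunked stream directly. The following is a minimal sketch under that assumption; the class and method names (BufferedXmlSketch, readFully, bufferedReader) are made up for illustration, and it is not confirmed to avoid the problem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original code.
class BufferedXmlSketch {

    /** Drains the stream into a byte array. The caller still owns (and closes) the stream. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Creates an XMLStreamReader over a fully buffered copy of the stream. */
    static XMLStreamReader bufferedReader(XMLInputFactory factory, InputStream in)
            throws IOException, XMLStreamException {
        return factory.createXMLStreamReader(new ByteArrayInputStream(readFully(in)));
    }
}
```

With the reproduction code, this would mean passing BufferedXmlSketch.bufferedReader(factory, inputStream) to unmarshaller.unmarshal(...) instead of a reader created directly on inputStream. The cost is holding the whole document in heap, roughly 10 MB per response here.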

How can I unmarshal in a different way to avoid this problem? Or, how can I better isolate the bug so it can be posted to the appropriate bug system?
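
One way to isolate this without HttpClient or a server might be to replay a saved copy of the XML through a stream that returns only a few bytes per read, imitating the short, irregular reads a chunked stream can produce. This is a sketch only: ShortReadInputStream and countElements are hypothetical names, and whether short reads are actually the trigger is an assumption that the post does not establish.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical test harness, not part of the original code.
class ShortReadInputStream extends FilterInputStream {
    private final int maxPerRead;

    ShortReadInputStream(InputStream in, int maxPerRead) {
        super(in);
        if (maxPerRead < 1) {
            throw new IllegalArgumentException("maxPerRead must be at least 1");
        }
        this.maxPerRead = maxPerRead;
    }

    /** Never returns more than maxPerRead bytes, however large the caller's buffer is. */
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, maxPerRead));
    }

    /** Parses the whole document with StAX and returns the number of start tags seen. */
    static int countElements(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

Running countElements over a FileInputStream of the saved response, once directly and once wrapped in ShortReadInputStream with various maxPerRead values, on both 7u51 and 7u71, would show whether the parse fails without any networking involved. If it does, the saved file plus this class would make a self-contained test case for a bug report.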

This post has been answered by user13784414 on Jan 19 2015