Read only part of a document with Stax

843834Jul 8 2008 — edited Mar 11 2009
Hi,

I have some huge documents (~5 GB) and I use StAX to read them.
My problem: I want to load only part of a document.

I know the byte offset at which the input stream should start, so I skip roughly half of the file.
Then I pull events in a loop with xmlReader.hasNext() and xmlReader.next(). After the first iteration, though, I get this exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[34,4]
Message: The markup in the document following the root element must be well-formed.


The original XML looks like this:
<root>
 <element id="1">
 </element>
 <element id="2">
 </element>
 <element id="3">
 </element>
</root>
And what I pass to the XMLStreamReader is
 <element id="2">
 </element>
 <element id="3">
 </element>
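So I know why I get it: the input stream contains only a fragment of the document. The parser treats the first <element> it sees as the root element, and a well-formed document cannot have a second element after the root.

When it reaches the element with id="3", it reports that the document is not well-formed,
which is technically correct, but not important for me.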
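
The behaviour described above can be reproduced without a 5 GB file. The following is only a minimal sketch, assuming the StAX implementation bundled with the JDK; the class name FragmentErrorDemo and the method isWellFormedDocument are made up for this illustration and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class FragmentErrorDemo {
    private FragmentErrorDemo() {}

    /**
     * Hypothetical helper: pulls every event from the input and reports
     * whether the parser accepted it as a complete, well-formed document.
     */
    public static boolean isWellFormedDocument(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            // Thrown when the parser meets a second top-level element.
            return false;
        }
    }

    public static void main(String[] args) {
        String fragment = "<element id=\"2\"></element>\n<element id=\"3\"></element>";
        String wrapped = "<root>" + fragment + "</root>";
        System.out.println("fragment accepted: " + isWellFormedDocument(fragment));
        System.out.println("wrapped accepted:  " + isWellFormedDocument(wrapped));
    }
}
```

The only difference between the two inputs is the enclosing root element, which is why the first element in the fragment parses fine and the error appears only at the second one.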

Are there any possible solutions? Is there a way to disable this check in the XMLStreamReader, or some other approach I am missing?

And no, I cannot copy part of a 5 GB file into something else just to wrap it. That's not the point; it would be too slow.
That's why I want to skip so much data in the first place: to make it quick.

The problem is annoying and a little bit silly.

One solution would be to write my own parser instead of using the XMLStreamReader, but that would be ugly and a duplication of effort.
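
One approach that might avoid both copying the file and writing a parser is to chain a few synthetic bytes in front of the skipped stream with java.io.SequenceInputStream, so the reader sees an opening root tag before the fragment. The sketch below is an illustration under several assumptions, not a definitive implementation: the file is UTF-8 (or ASCII-compatible), the skip offset lands exactly on an element boundary, the fragment needs no namespace or entity declarations from the original root, and the names FragmentReader and readIds are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class FragmentReader {
    private FragmentReader() {}

    /**
     * Hypothetical helper: reads the id attribute of every <element> in a
     * fragment by presenting it to StAX inside a synthetic <root>.
     */
    public static List<String> readIds(InputStream fragment) throws XMLStreamException {
        byte[] open = "<root>".getBytes(StandardCharsets.UTF_8);
        byte[] close = "</root>".getBytes(StandardCharsets.UTF_8);

        // No data is copied: the three streams are simply read one after another.
        InputStream wrapped = new SequenceInputStream(
                new ByteArrayInputStream(open),
                new SequenceInputStream(fragment, new ByteArrayInputStream(close)));

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(wrapped);
        List<String> ids = new ArrayList<>();
        int depth = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if ("element".equals(reader.getLocalName())) {
                        ids.add(reader.getAttributeValue(null, "id"));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                    if (depth == 0) {
                        // The root is closed, either by the file's own </root>
                        // or by the synthetic one; stop before anything after it.
                        break;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }
}
```

In the setup from this post, the argument would be the FileInputStream after the skip. Stopping as soon as the depth returns to zero is a deliberate choice: it lets the same code handle a fragment that runs to the end of the file (and so already contains the original </root>) as well as one that is cut off earlier.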


-------part of the code--------
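FileInputStream inputStream = new FileInputStream(filename);

// skip() may skip fewer bytes than requested, so loop until the offset is reached
long remaining = skipBytes;
while (remaining > 0) {
    long skipped = inputStream.skip(remaining);
    if (skipped <= 0) {
        break; // no progress, e.g. end of file
    }
    remaining -= skipped;
}
xmlReader = xmlif.createXMLStreamReader(filename, inputStream);

// pull events until the stream ends or the handler signals completion
while (xmlReader.hasNext() && !parsingComplete) {
    xmlReader.next();

    if (xmlReader.isStartElement()) {
        parseStartElement(xmlReader);
    }
}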
Thanks for the help and any opinions.

Andreas
Locked on Apr 8 2009
Added on Jul 8 2008