
Java EE (Java Enterprise Edition) General Discussion


SAX vs. Stax vs. XOM

arshadnoor, Nov 14 2010
I need an XML parsing framework for processing some very large documents. What is unusual about the documents is that, while the number of elements within them is small (perhaps fewer than a dozen), one of the elements can have content that is many gigabytes in size and constitutes 99.99% of the XML document. To add to the complexity, the large element's content is not only encrypted but also Base64-encoded, AND it is appended to a 16-byte byte array that is only Base64-encoded, not encrypted. This is the XML Encryption (XENC) standard, in case someone is wondering: http://www.w3.org/TR/xmlenc-core/. (While the standard does allow the content to live in an external file that is referenced from inside the XML document, it is desirable to keep everything within the single XML document, if possible.)

While small XENC documents (100 MB or less) work well with DOM or Stax, very large documents pose some problems. The stream of data must pass through three filters: a Base64InputStream and a CipherInputStream (to decode and decrypt the content, respectively), and a DigestInputStream (to verify its message digest). The processing must also ensure that the first 16 bytes of the stream (the Initialization Vector) are only Base64-decoded and are captured separately from the remaining content of the element, which is then decrypted.
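
To make the filter chain concrete, here is a minimal sketch of how the streams might be stacked. Several things in it are assumptions rather than details from my actual setup: it uses java.util.Base64 (Java 8+) in place of a Base64InputStream class, it assumes AES in CBC mode with ISO10126-style padding, it assumes the digest is SHA-256 computed over the decrypted bytes, and the class and method names (XencStreamPipeline, decrypt) are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper: names and algorithm choices are illustrative assumptions.
class XencStreamPipeline {

    /**
     * Reads Base64 text, captures the first 16 decoded bytes as the IV,
     * decrypts the rest, and writes the plaintext to plainOut.
     * Returns the SHA-256 digest of the plaintext (assumed digest placement).
     */
    static byte[] decrypt(InputStream base64Text, SecretKey key, OutputStream plainOut)
            throws IOException, GeneralSecurityException {
        // The MIME decoder skips line breaks that may appear in the element text.
        InputStream decoded = Base64.getMimeDecoder().wrap(base64Text);

        // The first 16 decoded bytes are the IV: Base64-decoded only, never decrypted.
        byte[] iv = new byte[16];
        new DataInputStream(decoded).readFully(iv);

        // Assumed algorithm: AES/CBC with ISO10126 padding.
        Cipher cipher = Cipher.getInstance("AES/CBC/ISO10126Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));

        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        try (DigestInputStream in =
                 new DigestInputStream(new CipherInputStream(decoded, cipher), sha)) {
            // A fixed-size buffer keeps memory use flat regardless of content size.
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                plainOut.write(buf, 0, n);
            }
        }
        return sha.digest();
    }
}
```

The caller would compare the returned digest against the expected value. If the digest is meant to cover the ciphertext instead, the DigestInputStream would sit below the CipherInputStream rather than above it.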

In researching the web and this forum, I've found that there are three (perhaps more?) ways to approach this: using SAX, Stax or XOM. May I request expert opinions on others' experiences when trying to balance the following goals?

- Keep memory requirements low
- Read very large files where one element's content makes up more than 99.99% of the file-size
- Extract a single element's content and pass it through multiple input streams for processing
- Keep the programming effort reasonably simple by leveraging a parsing framework that meets the required goals
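
As one illustration of the low-memory goal, here is a rough sketch of pulling a single element's text out in chunks with the Stax cursor API (javax.xml.stream) instead of materializing it as one string. The class and method names (LargeElementCopier, copyElementText) are hypothetical. It assumes the parser delivers large text as several CHARACTERS events when coalescing is off; the size of each chunk is up to the implementation, so I can't promise a particular memory bound.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams one element's text to a Writer chunk by chunk.
class LargeElementCopier {

    /** Copies the text of every element named localName to sink; returns chars copied. */
    static long copyElementText(InputStream xml, String localName, Writer sink)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Non-coalescing mode lets the parser hand over text in pieces.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        long copied = 0;
        boolean inside = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    inside = true;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    inside = false;
                } else if (inside && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    // Write straight from the parser's buffer; no String is built.
                    sink.write(reader.getTextCharacters(),
                               reader.getTextStart(), reader.getTextLength());
                    copied += reader.getTextLength();
                }
            }
        } finally {
            reader.close();
        }
        return copied;
    }
}
```

The sink could be a PipedWriter or any Writer that feeds the Base64 text onward to the decoding and decrypting streams. A SAX ContentHandler's characters() callback could be used in much the same way.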

Thanks in advance for your suggestions.
Locked on Dec 12 2010
Added on Nov 14 2010