Java EE (Java Enterprise Edition) General Discussion

Error parsing XML: Invalid byte 1 of 1-byte UTF-8 sequence

843834, Dec 5 2008 (edited Dec 5 2008)
Hi All,


I am trying to parse an XML file with a DOM parser. I have tried two different approaches, and one of them fails with this error:


java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence


The other approach works fine.


My XML file does not contain an XML declaration (and I don't know which default encoding the parser uses in that case).
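
On the default-encoding question: as far as I know, the XML 1.0 spec says a document with no XML declaration, no byte order mark, and no encoding supplied from outside is read as UTF-8. Here is a minimal sketch of what that means in practice. It uses the JDK's built-in JAXP parser instead of the org.apache.xerces classes from this post so that it is self-contained, and the class name DefaultEncodingDemo and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original post.
class DefaultEncodingDemo {

    // Returns true if the bytes parse as XML when the parser gets no encoding hint.
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (IOException | SAXException e) {
            // Malformed UTF-8 surfaces here; the exact exception class
            // depends on the parser implementation and version.
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    public static void main(String[] args) {
        // No XML declaration, one non-ASCII character (the copyright sign).
        String xml = "<note>\u00A9 2008</note>";

        // Same text, two byte encodings. The parser assumes UTF-8 for both.
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_8)));      // expected to succeed
        System.out.println(parses(xml.getBytes(StandardCharsets.ISO_8859_1))); // expected to fail
    }
}
```

If the file was saved in a single-byte encoding such as ISO-8859-1 or windows-1252 (an assumption, the post does not say), any non-ASCII character in it would be an invalid UTF-8 byte, which would match the error above.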


Way 1 (gives the error):
// "in" is a ByteArrayInputStream holding the XML bytes
org.apache.xerces.parsers.DOMParser parser = new org.apache.xerces.parsers.DOMParser();
org.xml.sax.InputSource isrc = new org.xml.sax.InputSource(in);
parser.parse(isrc);
document = parser.getDocument();
This throws java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence
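
For reference, a minimal sketch of two standard ways to tell a SAX-style InputSource which encoding the bytes are in, in case the file is not really UTF-8. It again uses the JDK's JAXP parser to stay self-contained; the class name ExplicitEncodingDemo is made up, and ISO-8859-1 is only an assumed example encoding, since the real encoding of the file is not known from this post.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original post.
class ExplicitEncodingDemo {

    // Option A: keep the byte stream and declare its encoding on the InputSource.
    static Document parseWithDeclaredEncoding(InputStream in, Charset charset) {
        InputSource source = new InputSource(in);
        source.setEncoding(charset.name());
        return parse(source);
    }

    // Option B: decode the bytes yourself and hand the parser characters.
    static Document parseViaReader(InputStream in, Charset charset) {
        return parse(new InputSource(new InputStreamReader(in, charset)));
    }

    private static Document parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample document without an XML declaration, encoded as ISO-8859-1 (assumed).
        byte[] latin1 = "<note>\u00A9 2008</note>".getBytes(StandardCharsets.ISO_8859_1);

        Document a = parseWithDeclaredEncoding(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        Document b = parseViaReader(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);

        System.out.println(a.getDocumentElement().getTextContent());
        System.out.println(b.getDocumentElement().getTextContent());
    }
}
```

Note that when a character stream (Reader) is supplied, the parser works from the already decoded characters, so the charset passed to InputStreamReader has to be the right one.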


Way 2 (works fine):
import java.io.InputStream;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Look up a DOM implementation that supports the Load and Save ("LS") feature
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");

// Create a synchronous parser; CDATA sections are not kept as separate nodes
LSParser builder = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
builder.getDomConfig().setParameter("cdata-sections", false);

// "in" is the InputStream for the XML
LSInput lsi = impl.createLSInput();
lsi.setByteStream(in);

Document doc = builder.parse(lsi);

I have the following libraries on my classpath:
xalan.jar
xercesImpl.jar
xmlParserAPIs.jar


I want to know what is wrong with the first way. I think it fails because of a character encoding mismatch, but I don't understand why the second way works fine.

I don't know much about these XML APIs; I found the second approach on the net.

Can anyone please explain the basic difference between the two ways of parsing XML, and which one is preferred (especially when the XML may contain UTF-8 characters)?
Post Details
Locked on Jan 2 2009
Added on Dec 5 2008