Exception: java.io.UTFDataFormatException: invalid byte 2 of 2-byte UTF-8 se
843834, Dec 25 2003 (edited Dec 25 2003)
When I feed the parser the returned XML string, the data contains some characters that are not valid UTF-8 (e.g. "�"). This causes a "java.io.UTFDataFormatException".
This is the code where I initialize the parser & feed the InputSource:
public class ISEGeneralSearchResultsParser extends DefaultHandler {
    private static final String DEFAULT_PARSER_NAME = "org.apache.xerces.parsers.SAXParser";
    private static final String VALIDATION = "http://xml.org/sax/features/validation";
    private static final String NAMESPACE = "http://xml.org/sax/features/namespaces";
    private static final String SCHEMA = "http://apache.org/xml/features/validation/schema";

    protected static boolean cbSetValidation = false;
    protected static boolean cbSetNameSpaces = false;
    protected static boolean cbSetSchemaSupport = false;

    public void toRecord(String asXML)
        throws Exception
    {
        try
        {
            ByteArrayInputStream byteStream = new ByteArrayInputStream(asXML.getBytes());
            String lsParserName = DEFAULT_PARSER_NAME;
            try
            {
                lParser = (XMLReader)Class.forName(lsParserName).newInstance();
                lParser.setFeature( VALIDATION, cbSetValidation);
                lParser.setFeature( NAMESPACE, cbSetNameSpaces);
                lParser.setFeature( SCHEMA, cbSetSchemaSupport);
                lParser.setContentHandler((DefaultHandler)this);
                lParser.setErrorHandler((DefaultHandler)this);
            }
            catch(Exception parserException)
            {
                throw new SNETException("ISE011");
            }
            /****/
            lParser.parse(new InputSource((InputStream)byteStream));
        }
        catch (Exception xmlStringException)
        {
            throw new SNETException("ISE012");
        }
    }
}
My question: how can I make the parser accept characters such as "�" as well?
Thanking you in anticipation!
Cheers
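
One possible cause, offered as a guess rather than a diagnosis: asXML.getBytes() encodes the string with the platform default charset, while a parser that sees no encoding declaration reads a byte stream as UTF-8. If the default charset is not UTF-8, any non-ASCII character can then produce an invalid byte sequence. The sketch below shows two ways to keep both sides in agreement: hand the parser a character stream (StringReader) so no byte encoding is involved, or encode explicitly as UTF-8 and say so on the InputSource. The class and method names (EncodingSafeParse, TextCollector, parseFromReader, parseFromUtf8Bytes) are hypothetical, and the parser comes from the JDK's SAXParserFactory rather than the Xerces class name used in the post, so that the example runs on a stock JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original code.
public class EncodingSafeParse {

    // Hypothetical handler that simply collects all character data.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Obtains an XMLReader from the JDK's default SAX implementation.
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Option 1: parse from a character stream. The string is never turned
    // into bytes, so the platform default charset plays no role.
    static String parseFromReader(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        XMLReader reader = newReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    // Option 2: keep the byte stream, but encode explicitly as UTF-8 and
    // tell the InputSource which encoding the bytes are in.
    static String parseFromUtf8Bytes(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        XMLReader reader = newReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding("UTF-8");
        reader.parse(source);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u00e9 is a non-ASCII character (e with acute accent).
        String xml = "<r>caf\u00e9</r>";
        System.out.println(parseFromReader(xml));
        System.out.println(parseFromUtf8Bytes(xml));
    }
}
```

In toRecord, the first option would amount to replacing the ByteArrayInputStream with new InputSource(new StringReader(asXML)). This only helps if asXML itself holds the correct characters; if the string was already decoded with the wrong charset upstream, the fix belongs where the bytes are first read.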