Oracle error when parsing XML file with HTML escape characters
495997Dec 14 2011 — edited Dec 15 2011Hi,
In Oracle 10g (10.2.x) and 11g (11.2.x), I receive the following error when trying to load an XML documents off a staging table:
ERROR:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00217: invalid character 8211 (U+2013)
Error at line 1
ORA-06512: at "SYS.XMLTYPE", line 301
ORA-06512: at line 1
It seems Oracle is having an issue parsing HTML escape characters. The above error is easy to recreate with a simple example:
ADAVEY@easdev>select xmltype('<comment>Some text then & #8211; more text</comment>')
2 from dual;
So as it turns out Oracle is choking on the "& #8211;" string. NOTE that I have added a space between the & and # symbols so that this forum can display the string properly.
Is there any setting or flag that can be enabled before trying to load the document via the xmltype() function such that Oracle treats this (and similar) values as a literal? We don't have control over the contents of the files that we are receiving as they come from multiple vendors. I have validated the XML file with other parsers and none of them report any error with this data. So I'm not sure how to work around this issue with Oracle's XML parser.
Thanks,
Alan