Hi, I am reading in an XML file and then writing its content to an output file. The problem is that the input file contains an unusual single quote character [ *�* ] (call it single quote A), which is different from [ *'* ], the key next to the [ ; ] key on an English keyboard (call it single quote B). There is no key for typing single quote A, so I guess it ended up in the file because of encoding.
If I open the input XML file in a browser, it works fine and displays single quote A.
After I read the XML file into memory, I can see in the debugger that single quote A is still correctly encoded.
However, once I write the same content to the output, single quote A is changed. If I open the output file in a browser, it reports 'invalid character', because the character was altered on output and can no longer be rendered.
Both the input and the output XML use UTF-8 encoding. How can I solve this problem, please?
The XML file looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<content>....1980�s (Peacock and Williams, 1986; Keay, 1984)</content>
My code for reading:
String _xquery ="//content/text()";
Document _xmlDoc= DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("myxml.xml");
XPath _xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) _xpath.compile(_xquery).evaluate(_xmlDoc, XPathConstants.NODESET);
List<String> res = new ArrayList<String>(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
res.add(nodes.item(i).getNodeValue());
}
String valueToOutput = res.toString(); // this is the value to be output to XML; it should look like "[....1980�s (Peacock and Williams, 1986; Keay, 1984)]"
My code for writing the XML:
Element root = new Element("root");
Element content = new Element("output-content");
content.setText(valueToOutput);
root.addContent(content);
PrintWriter writer = new PrintWriter(new FileWriter(f));
new XMLOutputter().output(domDocument, writer);
writer.close();
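One thing worth checking in the writing code: `FileWriter` encodes with the platform default charset, which is not necessarily UTF-8 (on JVMs before Java 18 it often follows the OS locale). That may be where single quote A gets changed. The sketch below uses only the JDK and assumes that single quote A is U+2019 (right single quotation mark), which the post does not confirm. `EncodingDemo`, `encode`, and `survivesAsUtf8` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Writes text through a Writer bound to an explicit charset and
    // returns the bytes that would end up in the output file.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Checks whether a UTF-8 reader (such as a browser honoring the XML
    // declaration) would get the original text back from those bytes.
    static boolean survivesAsUtf8(String text, Charset writeCharset) {
        byte[] bytes = encode(text, writeCharset);
        return new String(bytes, StandardCharsets.UTF_8).equals(text);
    }

    public static void main(String[] args) {
        // Assumption: single quote A is U+2019.
        String text = "1980\u2019s";
        System.out.println(survivesAsUtf8(text, StandardCharsets.UTF_8));
        System.out.println(survivesAsUtf8(text, StandardCharsets.ISO_8859_1));
    }
}
```

If this is indeed the cause, replacing `new FileWriter(f)` with `new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)` should keep the bytes consistent with the `encoding="UTF-8"` declaration. With JDOM, passing an `OutputStream` to `XMLOutputter.output` instead of a `Writer` is another option worth trying, since the outputter then handles the encoding itself.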