http://www.onjava.com/lpt/a/5554
Hi All,
I am trying to parse the content of the following nested HTML Patient table using the XPath class (javax.xml.xpath) in JDK 6.1, without success:
<table border="0" cellpadding="0" cellspacing="0" width="782" id="main-content">
<tr>
<td valign="top" class="top">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td valign="top" class="top">
<!-- un-delay results 14/10/2004 .................................. --->
<div class="greyBorder">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr>
<td class="propType"> </td>
<td class="propType"><b>Patient</b></td>
<td class="propType"><b>Firstname</b></td>
<td class="propType"><b>Surname</b></td>
<td class="propType" align="right"><b>Date of birth</b></td>
<td class="propType">Sex</td>
</tr>
<tr class="smallnarrow">
<td class="even" width="10" align="left"></td>
<td class="even" style="vertical-align: middle;">Clinic</td>
<td class="even" style="vertical-align: middle;">John</td>
<td class="even" style="vertical-align: middle;">Smith</td>
<td class="even" align="right" style="vertical-align: middle;">10/02/1940</td>
<td class="even" width="10" style="vertical-align: middle;">M</td>
</tr>
</table>
</div>
<div style="margin-top:10px;">
<br> <br>
<br>
</div>
<div align="center" style="margin-bottom: 20px;">
.........
</td></tr></table></td></tr></table>
Below is the content of XPathEvaluator.java used to extract/parse the above HTML file:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathEvaluator {

    public void evaluateDocument(File xmlDocument) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Matches only when <table id="main-content"> is the document's root element.
            String expression = "/table[@id='main-content']";
            InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));
            // The input is parsed as XML before the expression is evaluated.
            NodeList shows = (NodeList) xPath.evaluate(expression, inputSource, XPathConstants.NODESET);
            for (int i = 0; i < shows.getLength(); i++) {
                Element show = (Element) shows.item(i);
                System.out.println("The value of show.getTagName(): " + show.getTagName());
                System.out.println("The value of show.getTextContent(): " + show.getTextContent());
            }
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] argv) {
        XPathEvaluator evaluator = new XPathEvaluator();
        File xmlDocument = new File("C:/Temp/HTMLTable.txt");
        evaluator.evaluateDocument(xmlDocument);
    }
}
This code worked on catalogue.xml from the tutorial at http://www.onjava.com/lpt/a/5554, but it produces the following error when parsing the above HTML file:
*[Fatal Error] :1:78: White spaces are required between publicId and systemId.*
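One possible reading of this message, stated as an assumption since the first line of HTMLTable.txt is not shown above: the XML parser behind XPath.evaluate only accepts well-formed XML, and an HTML 4 style DOCTYPE that gives a public identifier without a system identifier is rejected on line 1. Unclosed tags such as <br> would be rejected for the same reason. The sketch below is a small check along those lines; the class name WellFormedCheck, the method parseError, and the sample strings are hypothetical and only serve to illustrate the idea.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports why a piece of markup is not well-formed XML.
public class WellFormedCheck {

    // Returns null if the markup parses as XML, otherwise the parser's error message.
    public static String parseError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample inputs, not taken from the real HTMLTable.txt.
        String doctypeOnlyPublicId =
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"><html></html>";
        String unclosedBr = "<p>line<br>next</p>";
        String closedBr = "<p>line<br/>next</p>";

        System.out.println("DOCTYPE sample: " + parseError(doctypeOnlyPublicId));
        System.out.println("Unclosed <br>:  " + parseError(unclosedBr));
        System.out.println("Closed <br/>:   " + parseError(closedBr));
    }
}
```

If the real file triggers the same kind of message here, the problem would lie in the input not being XML rather than in the XPath expression itself.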
Am I using the wrong tool? I have used htmlparser in the past but could not achieve this with it either.
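To make the goal concrete, here is a minimal sketch of the extraction I am after, assuming the markup has first been turned into well-formed XML (no DOCTYPE, closed <br/> tags, valid comments); how to do that conversion is not shown. The class name PatientRowSketch, the cut-down SAMPLE string, and the expression //tr[@class='smallnarrow']/td are illustrative assumptions. The expression uses // because the data row sits in a nested table, whereas /table[@id='main-content'] only matches when that table is the root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: pulls the cell values of the patient data row.
public class PatientRowSketch {

    // Cut-down, well-formed stand-in for the nested Patient table (an assumption).
    public static final String SAMPLE =
        "<table id=\"main-content\"><tr><td><table>"
        + "<tr><td class=\"propType\">Patient</td><td class=\"propType\">Firstname</td></tr>"
        + "<tr class=\"smallnarrow\">"
        + "<td class=\"even\"></td>"
        + "<td class=\"even\">Clinic</td>"
        + "<td class=\"even\">John</td>"
        + "<td class=\"even\">Smith</td>"
        + "<td class=\"even\">10/02/1940</td>"
        + "<td class=\"even\">M</td>"
        + "</tr>"
        + "</table></td></tr></table>";

    // Returns the trimmed text of every <td> in rows with class "smallnarrow".
    public static List<String> cells(String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                "//tr[@class='smallnarrow']/td",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cells(SAMPLE));
    }
}
```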
Any suggestions would be appreciated.
Thanks,
Jack