Hi!
I have some problems with the performance of the XPath interface that was introduced in JDK 1.5. When I evaluate XPath expressions against nodes within a document, the time needed to evaluate even a simple expression grows linearly with the position of the context node inside the document.
I've built a very simple test case to illustrate this:
import java.util.Date;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class XPathTest {
    public static void main (String[] args) throws Exception {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance ();
        DocumentBuilder builder = builderFactory.newDocumentBuilder ();
        Document doc = builder.parse ("java-xpath-test.xml");
        NodeList children = doc.getDocumentElement ().getChildNodes ();
        XPath xpath = XPathFactory.newInstance ().newXPath ();
        int length = children.getLength ();
        long avg = 0;
        for (int i = 0; i < length; i++) {
            Node node = children.item (i);
            long start = new Date ().getTime ();
            for (int j = 0; j < 200; j++) {
                String result = xpath.evaluate ("concat(name, ':', value)", node);
            }
            long end = new Date ().getTime ();
            avg += (end - start);
            if (i % 10 == 9) {
                System.out.println ("Time: " + (avg / 10));
                avg = 0;
            }
        }
    }
}
You can find the 'java-xpath-test.xml' file I used here:
[http://www.christian-seiler.de/temp/java-xpath-test.xml]
Since the XPath expression here is really simple and takes very little time, I had to repeat the evaluation 200 times per node and average the result over 10 nodes. If you run my test program, you will see (apart from the first two measurements, which are probably anomalies) that the required time grows the further the node is from the start of the document. If I invert the loop (i.e. for (int i = length - 1; i >= 0; i--)), the required time decreases gradually instead.
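To rule out that the time goes into re-parsing the expression string on every call, the same measurement can be done with a precompiled expression. The following is only a minimal sketch: the class name CompiledXPathTiming, its helper methods and the inline sample XML are made up for illustration (the real test uses java-xpath-test.xml), and it uses System.nanoTime () instead of Date for finer resolution.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original test program.
public class CompiledXPathTiming {

    // Compile the expression once so that parsing it is not part of the measurement.
    static XPathExpression compile(String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().compile(expression);
    }

    // Evaluate the compiled expression 'repetitions' times against one context node
    // and return the elapsed wall-clock time in nanoseconds.
    static long timeNanos(XPathExpression expr, Node node, int repetitions) throws Exception {
        long start = System.nanoTime();
        for (int j = 0; j < repetitions; j++) {
            expr.evaluate(node);
        }
        return System.nanoTime() - start;
    }

    // Parse an XML string into a DOM document (stand-in for the real input file).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample structure: each child element has a <name> and a <value> child.
        Document doc = parse("<items>"
                + "<item><name>a</name><value>1</value></item>"
                + "<item><name>b</name><value>2</value></item>"
                + "</items>");
        XPathExpression expr = compile("concat(name, ':', value)");
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(expr.evaluate(n) + " took " + timeNanos(expr, n, 200) + " ns");
        }
    }
}
```

If the per-node times still grow with the node position when measured this way, the cost is not in compiling the expression.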
My example may be a little contrived, but I encountered the problem while processing an XML document a few MB in size, with probably a few million nodes, where an XPath expression is evaluated for about 17000 of them. The processing time for each section of that document increases drastically, making my code unbearably slow when I try to process the whole document.
Since my expression only accesses child nodes of the context element (and these are the same for all these elements), I would assume that the time required to evaluate the same expression on a different node would be more or less constant.
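For comparison, here is roughly what I would expect the expression to have to do, written with plain DOM calls only. This is just a sketch under assumptions: the class DomConcat and its methods are hypothetical names, the sample XML is made up, and matching on getNodeName () assumes a parser that is not namespace-aware (the DocumentBuilderFactory default). Nothing in it depends on where the context node sits in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical DOM-only counterpart of concat(name, ':', value).
public class DomConcat {

    // Text content of the first child element with the given name, or "" if there is none.
    // This mirrors how XPath converts a node-set to a string (first node in document order).
    static String firstChildText(Node context, String childName) {
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && childName.equals(c.getNodeName())) {
                return c.getTextContent();
            }
        }
        return "";
    }

    // Only looks at direct children of the context node.
    static String nameColonValue(Node context) {
        return firstChildText(context, "name") + ":" + firstChildText(context, "value");
    }

    // Parse an XML string into a DOM document (stand-in for the real input file).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample structure, not the actual test file.
        Document doc = parse("<items><item><name>a</name><value>1</value></item></items>");
        Node item = doc.getDocumentElement().getFirstChild();
        System.out.println(nameColonValue(item));
    }
}
```

This only covers this one expression, of course, so it is not a general replacement for the javax.xml.xpath API.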
(For the record, I tried this code with JDK 1.6.0_04-b12 under x86 Linux)
Is there any reason for this phenomenon? If it isn't possible to circumvent with the javax.xml.xpath API, is there any other option?
Thanks,
Christian