I created a benchmark similar to the one that VTD-XML uses. Since most XML processing involves mutation, this benchmark parses an input XML file, runs various XPath queries against it, modifies the document in two places, and then serializes the new document. The steps are listed below:
- Parse blog.xml, preparing to query the resulting document
- Perform the following XPath queries, or their equivalents, once each:
  - count(//*) (10390 for this document)
  - //item (a list of those 10390 items)
  - /blog/item (similar to the previous, except you know the path)
  - //text() (all text nodes)
  - count(//item)
  - count(/blog/item)
  - /blog/item[@num='a781']
  - /blog/item/body/p/a
- Mutate the document by removing the nodes returned by the last two queries (performed inline with the queries)
- Serialize the modified document back out
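The steps above could be sketched with the JDK's built-in DOM and XPath APIs roughly as follows. This is not the TransformDOM.java used for the benchmark. The class name, the helper names, and the inline sample document (a tiny stand-in for blog.xml, whose real structure is only guessed from the queries) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BenchmarkCycleSketch {
    // Hypothetical stand-in for blog.xml; the real file is about 1.3 MB.
    static final String SAMPLE =
        "<blog>"
      + "<item num=\"a780\"><body><p>first <a href=\"#\">link</a></p></body></item>"
      + "<item num=\"a781\"><body><p>second</p></body></item>"
      + "<item num=\"a782\"><body><p>third <a href=\"#\">link</a></p></body></item>"
      + "</blog>";

    // Detach every node in the result set from its parent.
    static void removeAll(NodeList nodes) {
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            Node parent = node.getParentNode();
            if (parent != null) {
                parent.removeChild(node);
            }
        }
    }

    // One benchmark cycle: parse, query, mutate, serialize.
    static String runCycle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Read-only queries, in the order listed in the post.
            xpath.evaluate("count(//*)", doc, XPathConstants.NUMBER);
            xpath.evaluate("//item", doc, XPathConstants.NODESET);
            xpath.evaluate("/blog/item", doc, XPathConstants.NODESET);
            xpath.evaluate("//text()", doc, XPathConstants.NODESET);
            xpath.evaluate("count(//item)", doc, XPathConstants.NUMBER);
            xpath.evaluate("count(/blog/item)", doc, XPathConstants.NUMBER);

            // The last two queries also remove their results from the document.
            removeAll((NodeList) xpath.evaluate(
                "/blog/item[@num='a781']", doc, XPathConstants.NODESET));
            removeAll((NodeList) xpath.evaluate(
                "/blog/item/body/p/a", doc, XPathConstants.NODESET));

            // Serialize the modified document with an identity transform.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("benchmark cycle failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runCycle(SAMPLE));
    }
}
```

A real harness would call something like `runCycle` many times on the contents of blog.xml and average the elapsed time per call.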
I implemented this benchmark for four products, the ones that have XPath or XPath-like support (if you know of another one, please send me some code, and I will be happy to run it and add the results):
- DOM4J - TransformDOM4J.java
- Java6 DOM - TransformDOM.java
- VTD-XML - TransformVTD.java
- Tango Document - transformtango.d
After the run, I take the average cycle time and turn it into the following graph showing cycles per second. blog.xml is 1.3 MB, so you can multiply these numbers by 1.3 to get the megabytes-per-second figure for each tool.
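As a minimal sketch of that arithmetic: cycles per second is the reciprocal of the average cycle time, and megabytes per second is that rate times the 1.3 MB file size. The class and method names are hypothetical, and the 20 ms cycle time in `main` is an invented sample value, not a measured result.

```java
class ThroughputSketch {
    // Size of blog.xml in megabytes, as stated in the post.
    static final double FILE_SIZE_MB = 1.3;

    // Convert an average cycle time (seconds per cycle) into cycles per second.
    static double cyclesPerSecond(double averageCycleSeconds) {
        return 1.0 / averageCycleSeconds;
    }

    // Each cycle processes the whole file once, so MB/s = cycles/s * file size.
    static double megabytesPerSecond(double cyclesPerSecond) {
        return cyclesPerSecond * FILE_SIZE_MB;
    }

    public static void main(String[] args) {
        double averageCycleSeconds = 0.020; // hypothetical sample value
        double rate = cyclesPerSecond(averageCycleSeconds);
        System.out.printf("%.1f cycles/s, %.1f MB/s%n", rate, megabytesPerSecond(rate));
    }
}
```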

Some notes on the implementations:
- Tango, while not having an actual XPath parser, has the requisite power in its query language to pull this off with aplomb.
- You will note that the VTD code does NOT delete the /blog/item[@num='a781'] node, because its XMLModifier is unable to perform deletes inside a delete. If someone knows how to fix this, please let me know.
I would also note that these benchmarks were run on an Intel Q6700 quad-core machine at 2.66 GHz, with 4 GB of RAM, running Ubuntu Linux.
March 15th, 2008 at 3:51 pm
Interesting results but a few questions:
1. Did you run the test using the server JVM?
2. Have you considered precompiling the XPath expressions, instead of compiling them again and again in the loop?
3. What do you mean by delete in a delete? It doesn’t make sense to me…
If you do all those things, I will be shocked if VTD underperforms Tango D, because Tango D needs to repeatedly serialize and parse, while VTD-XML is incremental…
Cheers,
Jimmy Zhang
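For readers unfamiliar with the precompilation point in question 2, here is a minimal sketch of the idea using the JDK's `javax.xml.xpath` API as a stand-in. It is not VTD-XML code and not a diff against TransformVTD.java. The class name, method name, and sample query are assumptions, and the sketch says nothing about how much the change would affect the benchmark numbers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrecompiledXPathSketch {
    // Parses the document on every iteration (as the benchmark does),
    // but compiles the XPath expression only once, outside the loop.
    static double totalItems(String xml, int iterations) {
        try {
            XPathExpression countItems =
                XPathFactory.newInstance().newXPath().compile("count(/blog/item)");
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

            double total = 0;
            for (int i = 0; i < iterations; i++) {
                Document doc = builder.parse(new InputSource(new StringReader(xml)));
                // Reuse the compiled expression against the freshly parsed document.
                total += (Double) countItems.evaluate(doc, XPathConstants.NUMBER);
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("query loop failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalItems("<blog><item/><item/></blog>", 3));
    }
}
```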
March 15th, 2008 at 6:19 pm
Jimmy,
1. I did use the server VM, yes.
2. Can you post a diff against my VTD example? I know you know the VTD API better, so just show me what to change, and I will change it, no problem.
3. If you look at the example, look at the commented-out remove call. Uncomment it and run, and you will get an exception.
As for your last comment, I specifically force a parse each time, rather than an index load, because I am trying to model a real-world XML appliance sort of scenario, where you see many different documents once each. On the serialize front, Tango is actually as fast serializing from scratch as it is keeping a cache of the input to spit back out.
March 19th, 2008 at 4:48 pm
I updated the VTD test to do the first delete, output the result, re-parse, and then do the final deletes, and throughput drops to 30.3 operations per second.
It is worth noting that since VTD-XML indexes everything by position, an edit that adds/removes data requires a re-parse, because all the indexes are now invalid.
Even with that overhead, VTD is head and shoulders above DOM and DOM4J.
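The point about position-based indexes going stale after an edit can be illustrated with a toy example. This is not VTD-XML's actual data structure or API: the "index" here is just a list of character offsets of each `<item>` start tag in a string, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetIndexSketch {
    // Toy position-based index: the start offset of every "<item>" tag.
    static List<Integer> indexItems(String xml) {
        List<Integer> offsets = new ArrayList<>();
        int at = xml.indexOf("<item>");
        while (at >= 0) {
            offsets.add(at);
            at = xml.indexOf("<item>", at + 1);
        }
        return offsets;
    }

    // Remove the first indexed item from the text; the index itself is not updated.
    static String deleteFirstItem(String xml, List<Integer> index) {
        int start = index.get(0);
        int end = xml.indexOf("</item>", start) + "</item>".length();
        return xml.substring(0, start) + xml.substring(end);
    }

    // True if the recorded offset still points at an "<item>" tag in the text.
    static boolean stillValid(String xml, int offset) {
        return xml.startsWith("<item>", offset);
    }

    public static void main(String[] args) {
        String xml = "<blog><item>first</item><item>second</item><item>third</item></blog>";
        List<Integer> index = indexItems(xml);
        String edited = deleteFirstItem(xml, index);

        // Offsets recorded before the edit no longer line up with the edited text.
        System.out.println("stale offset valid? " + stillValid(edited, index.get(1)));
        // Rebuilding the index (the analogue of a re-parse) restores correct positions.
        System.out.println("rebuilt index: " + indexItems(edited));
    }
}
```

Every offset after the deleted span shifts, which is why, in this toy model at least, the whole index has to be rebuilt after a removal.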