I started writing this post as a sidebar to the parser comparisons in my benchmarks. I will post what I know and add more as the community informs me, so consider this a living post. Where an item is simply a fact, such as the implementation language, I list it as a Pro.
| Product | Pros | Cons |
|---|---|---|
| Tango PullParser (pull) | Written in the D programming language; Tango devs are very aware of the cost of allocation and avoid it wherever possible; extremely fast, extremely memory efficient | Beta-level code; interfaces may change, since Tango is not yet 1.0; NOT W3C XML compliant (ignores DOCTYPE, etc.) |
| Tango SaxParser (SAX) | Written in the D programming language, on top of Tango’s PullParser; straight port of Java SAX code, with a small amount of D flavor; useful for porting existing SAX-based code | Beta-level code; interfaces may change, since Tango is not yet 1.0; as shown in the benchmarks, virtual calls (and SAX makes a lot of them) cost quite dearly; NOT W3C XML compliant |
| Tango Document (DOM) | Written in the D programming language; provides a DOM-style tree of the XML to manipulate; faster than all non-tree code tested so far; not DOM compliant; integrated query language, inspired by XPath | Beta-level code; interfaces may change, since Tango is not yet 1.0; not DOM compliant; NOT W3C XML compliant |
| Phobos std.xml (DOM) | Written in the D programming language; shipped in D 2.0’s standard library; DOM-style tree object model; not DOM compliant | Not DOM compliant; requires prior knowledge of the structure of the XML being parsed, so it cannot parse arbitrary XML; NOT W3C XML compliant |
| RapidXml (DOM) | Written in C++, with ultimate performance in mind; highly configurable, so you use only the feature set you need; not DOM compliant | Not DOM compliant; not W3C XML compliant (ignores DOCTYPE) |
| libxml2 (SAX) | Written in C; extremely robust: passes all 1800 tests from the OASIS XML Tests Suite | |
| VTD-XML (DOM) | Written in Java, also available in C and C#; indexes the XML for very fast querying; XPath support | |
| Java SAX (SAX) | | |
| Java DOM (DOM) | Written in Java; W3C DOM compliant; W3C XML compliant; XPath support | |
| Java StAX parsers (pull) (includes Aalto, Woodstox, and Javolution) | | |
| DOM4J (DOM) | Written in Java; XPath support | |
Aaron was kind enough to help me with the RapidXml test. RapidXml is written in highly tuned C++, and it gives Tango a run for its money. I am really glad we are starting to add some non-Java alternatives, so we can see what native code can do. Without further ado, the code is bench_rapidxml.cpp, which was compiled with:
```
g++ bench_rapidxml.cpp -O2 -o bench
```
Results for hamlet.xml:
```
stonecobra@jeff-home:~/xmlbench$ vi bench_rapidxml.cpp
stonecobra@jeff-home:~/xmlbench$ g++ bench_rapidxml.cpp -O2 -o bench
stonecobra@jeff-home:~/xmlbench$ ./bench
Document Length: 279628 bytes
Data Length: 279629 bytes
Fastest:313.362203 MB/s
Fastest:312.956579 MB/s
Fastest:313.055406 MB/s
Fastest:301.303166 MB/s
Fastest:310.668081 MB/s
Fastest:310.523743 MB/s
Fastest:310.924893 MB/s
Fastest:310.434819 MB/s
Fastest:310.868351 MB/s
Fastest:310.745189 MB/s
Default:172.539398 MB/s
Default:172.309405 MB/s
Default:172.501116 MB/s
Default:172.385035 MB/s
Default:172.386038 MB/s
Default:172.455936 MB/s
Default:172.498550 MB/s
Default:172.357293 MB/s
Default:172.331007 MB/s
Default:172.326775 MB/s
strlen:3543.806666 MB/s
strlen:3589.165483 MB/s
strlen:3590.035209 MB/s
strlen:3560.508898 MB/s
strlen:3587.427295 MB/s
strlen:3590.035209 MB/s
strlen:3573.965308 MB/s
strlen:3589.551976 MB/s
strlen:3590.276875 MB/s
strlen:3565.793459 MB/s
```
Average parsing speed: 310.48 MB/sec in fastest mode, 172.41 MB/sec in default mode.
Results for soap_mid.xml:
```
stonecobra@jeff-home:~/xmlbench$ vi bench_rapidxml.cpp
stonecobra@jeff-home:~/xmlbench$ g++ bench_rapidxml.cpp -O2 -o bench
stonecobra@jeff-home:~/xmlbench$ ./bench
Document Length: 134334 bytes
Data Length: 134335 bytes
Fastest:197.352607 MB/s
Fastest:197.097866 MB/s
Fastest:196.779684 MB/s
Fastest:197.276936 MB/s
Fastest:197.096047 MB/s
Fastest:188.870551 MB/s
Fastest:197.026330 MB/s
Fastest:197.164297 MB/s
Fastest:197.156408 MB/s
Fastest:196.966655 MB/s
Default:121.320212 MB/s
Default:121.256024 MB/s
Default:121.385734 MB/s
Default:121.286215 MB/s
Default:121.236746 MB/s
Default:121.340896 MB/s
Default:121.295172 MB/s
Default:121.264861 MB/s
Default:121.311711 MB/s
Default:121.360322 MB/s
strlen:3608.479264 MB/s
strlen:3586.658061 MB/s
strlen:3619.080745 MB/s
strlen:3613.568366 MB/s
strlen:3619.694270 MB/s
strlen:3615.812122 MB/s
strlen:3615.403959 MB/s
strlen:3609.495937 MB/s
strlen:3615.914177 MB/s
strlen:3612.651269 MB/s
```
Average parsing speed: 196.28 MB/sec in fastest mode, 121.31 MB/sec in default mode.