Sunday, 9 Mar 2008

XML Benchmarks - pros and cons of each library

(7:04 pm) Tags: [Software, Projects, D Programming Language]

I started writing this post as a sidebar to the parser comparisons in my benchmarks. I will post what I know and add to it as the community informs me, so consider this a living post. Where something is simply a fact, such as the language a library is written in, I list it as a Pro.

Tango PullParser (pull)
  Pros:
  • Written in the D programming language
  • The Tango devs are very aware of the cost of allocation and try to avoid it wherever possible
  • Extremely fast, extremely memory efficient
  Cons:
  • Beta-level code
  • Interfaces may change, since Tango is not yet 1.0
  • NOT W3C XML compliant (ignores DOCTYPE, etc.)

Tango SaxParser (SAX)
  Pros:
  • Written in the D programming language, on top of Tango’s PullParser
  • Straight port of the Java SAX code, with a small amount of D flavor
  • Useful for porting existing SAX-based code
  Cons:
  • Beta-level code
  • Interfaces may change, since Tango is not yet 1.0
  • As the benchmarks show, virtual calls (SAX makes a lot of them) are quite costly
  • NOT W3C XML compliant

Tango Document (DOM)
  Pros:
  • Written in the D programming language; provides a DOM-style tree of XML to manipulate
  • Faster than all non-tree code tested so far
  • Integrated query language, inspired by XPath
  Cons:
  • Beta-level code
  • Interfaces may change, since Tango is not yet 1.0
  • Not DOM compliant
  • NOT W3C XML compliant

Phobos std.xml (DOM)
  Pros:
  • Written in the D programming language
  • Shipped in D 2.0’s standard library
  • DOM-style tree object model
  Cons:
  • Not DOM compliant
  • Requires prior knowledge of the structure of the XML being parsed; cannot parse arbitrary XML
  • NOT W3C XML compliant

RapidXml (DOM)
  Pros:
  • Written in C++, with ultimate performance in mind
  • Highly configurable: use only the feature set you need
  Cons:
  • Not DOM compliant
  • NOT W3C XML compliant (ignores DOCTYPE)

libxml2 (SAX)
  Pros:
  • Written in C
  • Extremely robust: passes all 1800 tests from the OASIS XML Tests Suite

VTD-XML (DOM)
  Pros:
  • Written in Java, also available in C and C#
  • Indexes the XML for super-fast querying
  • XPath support

Java SAX (SAX)
  Pros:
  • Written in Java

Java DOM (DOM)
  Pros:
  • Written in Java
  • W3C DOM compliant
  • W3C XML compliant
  • XPath support

Java StAX parsers (pull) (includes Aalto, Woodstox, and Javolution)
  Pros:
  • Written in Java

DOM4J (DOM)
  Pros:
  • Written in Java
  • XPath support

XML Benchmarks - RapidXml

(6:56 pm) Tags: [Software, Projects]

Aaron was kind enough to help me out with the RapidXml test. RapidXml is written in highly tuned C++, and it does give Tango a run for its money. I am really glad we are starting to add some non-Java alternatives, so we can see what native code can do. Without further ado, the code is bench_rapidxml.cpp, which was compiled with:

g++ bench_rapidxml.cpp -O2 -o bench

Results for hamlet.xml:

stonecobra@jeff-home:~/xmlbench$ vi bench_rapidxml.cpp
stonecobra@jeff-home:~/xmlbench$ g++ bench_rapidxml.cpp -O2 -o bench
stonecobra@jeff-home:~/xmlbench$ ./bench
Document Length: 279628 bytes
Data Length: 279629 bytes
Fastest:313.362203 MB/s
Fastest:312.956579 MB/s
Fastest:313.055406 MB/s
Fastest:301.303166 MB/s
Fastest:310.668081 MB/s
Fastest:310.523743 MB/s
Fastest:310.924893 MB/s
Fastest:310.434819 MB/s
Fastest:310.868351 MB/s
Fastest:310.745189 MB/s
Default:172.539398 MB/s
Default:172.309405 MB/s
Default:172.501116 MB/s
Default:172.385035 MB/s
Default:172.386038 MB/s
Default:172.455936 MB/s
Default:172.498550 MB/s
Default:172.357293 MB/s
Default:172.331007 MB/s
Default:172.326775 MB/s
strlen:3543.806666 MB/s
strlen:3589.165483 MB/s
strlen:3590.035209 MB/s
strlen:3560.508898 MB/s
strlen:3587.427295 MB/s
strlen:3590.035209 MB/s
strlen:3573.965308 MB/s
strlen:3589.551976 MB/s
strlen:3590.276875 MB/s
strlen:3565.793459 MB/s

Average parsing speed: 310.48 MB/sec in fastest mode, 172.41 MB/sec in default mode.

Results for soap_mid.xml:

stonecobra@jeff-home:~/xmlbench$ vi bench_rapidxml.cpp
stonecobra@jeff-home:~/xmlbench$ g++ bench_rapidxml.cpp -O2 -o bench
stonecobra@jeff-home:~/xmlbench$ ./bench
Document Length: 134334 bytes
Data Length: 134335 bytes
Fastest:197.352607 MB/s
Fastest:197.097866 MB/s
Fastest:196.779684 MB/s
Fastest:197.276936 MB/s
Fastest:197.096047 MB/s
Fastest:188.870551 MB/s
Fastest:197.026330 MB/s
Fastest:197.164297 MB/s
Fastest:197.156408 MB/s
Fastest:196.966655 MB/s
Default:121.320212 MB/s
Default:121.256024 MB/s
Default:121.385734 MB/s
Default:121.286215 MB/s
Default:121.236746 MB/s
Default:121.340896 MB/s
Default:121.295172 MB/s
Default:121.264861 MB/s
Default:121.311711 MB/s
Default:121.360322 MB/s
strlen:3608.479264 MB/s
strlen:3586.658061 MB/s
strlen:3619.080745 MB/s
strlen:3613.568366 MB/s
strlen:3619.694270 MB/s
strlen:3615.812122 MB/s
strlen:3615.403959 MB/s
strlen:3609.495937 MB/s
strlen:3615.914177 MB/s
strlen:3612.651269 MB/s

Average parsing speed: 196.28 MB/sec in fastest mode, 121.31 MB/sec in default mode.
