For what it's worth, here are some numbers comparing cElementTree and Java DOM (through Jython). (Not a reasonable comparison, of course, but it is supposed to serve as a rough reference point.)
- The usual start = time.time(); do_stuff(); print time.time()-start maneuver was used for timing.
- According to pystone.py, this machine benchmarks at 23201.4 pystones/second. It claims to have a 1.8GHz processor.
- Python 2.3.3 was used.
- The source XML was a 400k file that repeats one similar structure [1] N times.
- As for why this machine is so damn slow even though pystone reports relatively big numbers, I have no idea. Maybe there's something weird with disk reading, dunno. Anyway, the relative speeds in the benchmark seem to match other people's benchmarks.
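The timing maneuver described above measures a single run with wall-clock time. Here is a minimal sketch of a slightly sturdier variant in modern Python 3 (not what was used for these numbers): it uses time.perf_counter, repeats the run a few times, and keeps the fastest result to reduce noise. The helper name best_of and the body of do_stuff are hypothetical placeholders.

```python
import time

def best_of(func, repeat=3):
    """Run func() `repeat` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution clock meant for timing
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)

def do_stuff():
    # Hypothetical placeholder for the work being timed.
    sum(range(100000))

print("%.3fs" % best_of(do_stuff))
```

If disk reads are suspected, reading the XML file into a bytes object before starting the clock separates I/O from parsing. The standard library's timeit.repeat offers the same best-of-N idea.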
Parser                          Time
xml.dom.minidom (Python 2.2)    5.7s
xml.dom.minidom (Python 2.3)    1.3s
ElementTree 1.2                 1.1s
Java DOM (Jython 2.1)           0.4s
cElementTree                    0.06s

So, it's pretty clear that cElementTree beats the alternatives at this particular style of XML parsing hands down.
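A rough way to reproduce this kind of comparison today is sketched below, assuming Python 3, where xml.etree.ElementTree uses its C accelerator (the successor of cElementTree) automatically; xml.etree.cElementTree itself was removed in Python 3.9. The generated document, its hypothetical <log> root element and the entry count are made up for illustration, loosely following footnote [1]; they are not the 400k file used above.

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET

def make_document(n):
    # Hypothetical test data: n copies of a small <entry> record inside an
    # assumed <log> root element, loosely modeled on footnote [1].
    entry = "<entry><host>64.172.22.154</host><referer>-</referer></entry>"
    return ("<log>" + entry * n + "</log>").encode("utf-8")

def time_parse(parse, data):
    """Return (seconds, result) for a single parse of the in-memory data."""
    start = time.perf_counter()
    result = parse(data)
    return time.perf_counter() - start, result

data = make_document(5000)

t_dom, dom = time_parse(xml.dom.minidom.parseString, data)
t_et, root = time_parse(ET.fromstring, data)

# Both parsers should see the same number of <entry> elements.
assert len(dom.getElementsByTagName("entry")) == len(root.findall("entry"))

print("xml.dom.minidom: %.3fs" % t_dom)
print("ElementTree:     %.3fs" % t_et)
```

Parsing from an in-memory bytes object keeps disk reads out of the measurement.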
And yes, this machine seems to be consistently slow at its job, as Fredrik wondered in the comments on my entry about Jython/Java-style XML parsing.
[1]

<entry>
  <host>64.172.22.154</host>
  <referer>-</referer>
  ... [ca. 10 elements like those above]
</entry>
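Pulling fields out of records shaped like footnote [1] with ElementTree might look like the minimal sketch below (Python 3). The <log> root element and the second entry, which uses a documentation IP address and lacks a <referer>, are made-up assumptions; the other ~10 fields are omitted.

```python
import xml.etree.ElementTree as ET

# Sample in the shape of footnote [1]; the <log> wrapper and the second
# (made-up) entry are hypothetical, and the remaining fields are omitted.
SAMPLE = b"""<log>
  <entry><host>64.172.22.154</host><referer>-</referer></entry>
  <entry><host>192.0.2.1</host></entry>
</log>"""

def entries(data):
    """Yield (host, referer) pairs from each <entry> element."""
    root = ET.fromstring(data)
    for entry in root.iter("entry"):
        # findtext returns the child's text, or the default if the child is missing.
        yield (entry.findtext("host", default=""),
               entry.findtext("referer", default=""))

for host, referer in entries(SAMPLE):
    print(host, referer)
```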
Update: Oh yeah, and the numbers are definitely not consistent with my previous ad hoc benchmarks, but previously xml.dom.minidom probably suffered more because of its extravagant use of memory. Now that I have tuned the size of the XML file down to 400k, the difference between xml.dom.minidom and Java DOM is much milder.
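Since memory use seems to explain part of the difference, one common way to avoid holding the whole tree at once is incremental parsing with ElementTree's iterparse. This is a swapped-in technique, not something benchmarked in this post. Below is a minimal Python 3 sketch under the same assumed <log>/<entry> layout, with a hypothetical count_hosts helper.

```python
import io
import xml.etree.ElementTree as ET

def count_hosts(source):
    """Count <entry> elements per <host> while discarding each entry's contents."""
    counts = {}
    # iterparse yields each element once its end tag has been parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            host = elem.findtext("host", default="")
            counts[host] = counts.get(host, 0) + 1
            # Clear the entry's children and text; the emptied element
            # itself stays attached to the root.
            elem.clear()
    return counts

data = io.BytesIO(
    b"<log>"
    b"<entry><host>64.172.22.154</host><referer>-</referer></entry>"
    b"<entry><host>64.172.22.154</host><referer>-</referer></entry>"
    b"</log>"
)
print(count_hosts(data))
```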
26.01.2005, 22:58
Filed under: python
Comments:
Posted by Ian Bicking at 27.01.2005, 00:01
Out of curiosity, what would Java DOM without Jython look like? (I.e., if you translate the Python portion to Java)
Posted by Jarno Virtanen at 27.01.2005, 12:03
Yep, it's complete in the sense that running it with jython script.py works and parses the document with the stock Java parser.