Fredrik asked about my suspicious timings, and now I'm a bit puzzled too. Anyway, with XML (borrowed from a David Mertz article that benchmarked XML parsers) holding records like this:
64.172.22.154 - - 19/Aug/2001:01:46:01 -0500 GET / HTTP/1.1 200 2131
repeated 3000 times (around 1 megabyte in total), the following code prints a time of around 3 seconds:
from xml.dom.minidom import parse
import time

# Time a single full parse of the file into a DOM tree.
start_time = time.time()
dom1 = parse('myfile2.xml')
print(time.time() - start_time)
Similarly, parsing the same file with elementtree (using its Python parser) prints a time of around 2.5 seconds:
import time
from elementtree import ElementTree

# Same measurement using the standalone elementtree package.
t1 = time.time()
tree = ElementTree.parse("myfile2.xml")
print(time.time() - t1)
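Since a single run is easily skewed by swapping or a cold disk cache, a repeatable version of the measurement might look like the sketch below. It is not the script used for the numbers in this post: it assumes Python 3, uses the standard library's xml.etree.ElementTree in place of the standalone elementtree package, and the element names in RECORD (as well as the helper names write_sample, best_time and run_benchmark) are made up for illustration. The generated file will also come out smaller than the roughly 1 megabyte file used here.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical record layout: the element names are invented for this sketch,
# only the field values come from the sample record in the post.
RECORD = (
    "<entry><host>64.172.22.154</host>"
    "<date>19/Aug/2001:01:46:01 -0500</date>"
    "<request>GET / HTTP/1.1</request>"
    "<status>200</status><bytes>2131</bytes></entry>\n"
)


def write_sample(path, copies=3000):
    """Write one root element containing `copies` identical records."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<log>\n")
        f.write(RECORD * copies)
        f.write("</log>\n")


def best_time(parse_func, path, repeats=3):
    """Return the fastest of `repeats` wall-clock timings of parse_func(path)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse_func(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_benchmark(copies=3000):
    """Generate a temporary sample file and time both parsers on it."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        write_sample(path, copies)
        return {
            "minidom": best_time(minidom.parse, path),
            "ElementTree": best_time(ET.parse, path),
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print("%-12s %.3f s" % (name, seconds))
```

Taking the minimum of a few runs is one common way to filter out one-off noise; it will not hide a parser that swaps on every run.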
By the way, the same parse in Jython using Java's libraries takes around half a second. That makes the Java libraries 5-6 times faster than the pure Python parsers here, versus more than 10 times in the previous timing. I think the previous Python minidom timing suffered from memory consumption and swapping.
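The 5-6 times figure follows directly from the rounded timings quoted above. As a quick check (the variable and function names are mine, and the inputs are the approximate numbers from this post, not fresh measurements):

```python
# Approximate timings quoted in the post, in seconds.
minidom_s = 3.0      # xml.dom.minidom under CPython
elementtree_s = 2.5  # elementtree with the Python parser
jython_s = 0.5       # Jython with Java's libraries


def slowdown(python_seconds, java_seconds):
    """How many times slower the Python parse is than the Java one."""
    return python_seconds / java_seconds


print(slowdown(minidom_s, jython_s))      # 6.0
print(slowdown(elementtree_s, jython_s))  # 5.0
```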