Parsing XML in Jython with Java SDK's libraries
In somewhat similar fashion, it's easy to parse an XML document into a DOM tree in Jython with the Java SDK's libraries:
from java.io import FileInputStream
from javax.xml.transform.stream import StreamSource
from javax.xml.transform.stream import StreamResult
from javax.xml.parsers import DocumentBuilderFactory

factory = DocumentBuilderFactory.newInstance()
builder = factory.newDocumentBuilder()

input = FileInputStream("myfile.xml")
document = builder.parse(input)
document.getDocumentElement()
# etc.

There's no specific need for this, because you can do DOM stuff in recent Python implementations too, but in my naive benchmarks the Jython snippet above took roughly 2 seconds on a 3-megabyte XML file, whereas parsing with Python 2.3's minidom [1] took 26 seconds. (I am not saying that the stock Python 2.3 minidom parser is representative of Python XML parsers; the figure is only there to give some reference for the speed.)
I am not, of course, suggesting that you should switch to Jython, but I hope I have demonstrated the usefulness of Jython as a scripting tool for various XML tasks.
[1]
from xml.dom.minidom import parse
dom1 = parse('myfile.xml')
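A naive timing of the minidom parse in [1] could be taken along these lines. This is only a sketch, not the benchmark actually used for the figures above: the helper names write_sample_xml and time_minidom_parse are made up for illustration, and the generated file is a synthetic stand-in for the real myfile.xml. A single wall-clock measurement like this is rough at best.

```python
import os
import tempfile
import time
from xml.dom.minidom import parse


def write_sample_xml(path, n_items=1000):
    """Write a synthetic XML file (hypothetical stand-in for myfile.xml)."""
    with open(path, "w") as f:
        f.write("<root>\n")
        for i in range(n_items):
            f.write('  <item id="%d">value %d</item>\n' % (i, i))
        f.write("</root>\n")


def time_minidom_parse(path):
    """Parse the file once with minidom; return (elapsed seconds, document)."""
    start = time.time()
    dom = parse(path)
    elapsed = time.time() - start
    return elapsed, dom


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.xml")
    write_sample_xml(path)
    elapsed, dom = time_minidom_parse(path)
    print("minidom parse took %.3f seconds" % elapsed)
```

To compare against a real document, pass its path to time_minidom_parse instead of the generated sample, and repeat the measurement a few times before trusting any number.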
01.09.2004, 22:58
Filed under: python
- Comments:
Posted by Paul Boddie at 02.09.2004, 17:30
What about using xml.dom.javadom? Does that affect the performance in any way? I'm assuming it still works with more recent Java XML toolkits.
Posted by Jarno Virtanen at 02.09.2004, 22:23
Hmm. Need to check that out.
Posted by Jarno Virtanen at 06.09.2004, 17:25
I'm definitely _not_ sure that my timings are correct. :-)
I'll try to rethink the timings as soon as I have the time and the energy. Don't hold your breath waiting.