Posted by Jim at November 11th, 2004

I’m using Java and XML in my master’s project. Though I’ve used XML outside of Java, I’ve never used both in the same project. Thus I’m dealing with the associated learning curve.

Why am I using XML in this project? I’m still answering that question myself to a degree. The main reason is that XML is designed to be portable across a variety of computers, and there are now many tools, written in many different languages, for working with it. I could have created my own text format instead. Text is the ultimate in portability. Of course, XML is also text, and if I had to create my own text format, would it be better than XML? Probably not.

The best time to use XML, in my view, is when your data is likely to be shared among various programs written in hard-to-predict languages. If you know what program or language people will be using to view your data, you’ll likely be able to find a better option.

Where am I using XML in my project? The best possible spot, I think. The project is a collaborative tool for editing web pages. I’m using XHTML (HTML reformulated as XML). This will let me either store the page text as XHTML (to be parsed later by some as-yet-unknown program on the server) or let people assign templates to the page and publish it directly.
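Because XHTML is plain XML, any generic XML parser can check a page before it gets stored, which is part of what makes it safe to hand to an unknown program later. Here’s a minimal sketch of that check using the JDK’s built-in javax.xml.parsers classes. The class name XhtmlCheck and the method isWellFormed are made up for illustration, and the sketch only tests well-formedness; it does not validate against the XHTML DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XhtmlCheck {

    /** Returns true if the text parses as well-formed XML (hypothetical helper; no DTD validation). */
    public static boolean isWellFormed(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the default handler (which prints to stderr) and rethrow errors so parse() fails.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (Exception e) {
            // Parser configuration, I/O, or well-formedness failure.
            return false;
        }
    }

    public static void main(String[] args) {
        // Every tag is closed, including the empty <br/>, so this is well-formed XML.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Draft</title></head>"
                + "<body><p>Hello<br/>world</p></body></html>";
        // Typical loose HTML: <br> and <p> are never closed, so an XML parser rejects it.
        String html = "<html><body><p>Hello<br>world</body></html>";
        System.out.println(isWellFormed(xhtml)); // true
        System.out.println(isWellFormed(html));  // false
    }
}
```

The second string is perfectly acceptable to a web browser, which is exactly why plain HTML is awkward to store and reprocess while XHTML is not.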

How does Java parse XML? Basically a person has three options.

DOM: Java’s built-in API implementing the W3C’s recommendation for representing a parsed XML document. Essentially, it sucks the document into memory and represents it as a tree. You find stuff in the tree (at least in the examples I’ve seen) by calling a function that recursively walks through the document. This works fine with many documents, but not so well with multi-gigabyte documents, since the whole tree has to fit in memory.
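To make the “recursive function” idea concrete, here’s a minimal sketch using only the JDK’s javax.xml.parsers and org.w3c.dom classes. The class DomWalk, its methods, and the sample &lt;page&gt; document are hypothetical. Note that the whole string is parsed into a tree before the search starts, which is where the memory cost comes from. (The DOM API also has getElementsByTagName, which does this particular search for you; the hand-written walk just shows what’s going on.)

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomWalk {

    /** Parses the entire string into an in-memory DOM tree. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Recursively visits every node under 'node', collecting the text of elements named 'tag'. */
    public static void collectText(Node node, String tag, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(tag)) {
            out.add(node.getTextContent());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), tag, out);
        }
    }

    /** Convenience wrapper: parse, then walk from the root element. */
    public static List<String> findText(String xml, String tag) {
        List<String> result = new ArrayList<>();
        collectText(parse(xml).getDocumentElement(), tag, result);
        return result;
    }

    public static void main(String[] args) {
        String xml = "<page><title>Home</title>"
                + "<section><title>News</title><p>First post</p></section></page>";
        System.out.println(findText(xml, "title")); // [Home, News]
    }
}
```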

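SAX: Rather than bring huge files into memory, the SAX parser reads through the file from start to finish and fires events (start of an element, a run of text, end of an element) at handler methods you write. You keep only the parts you care about and let the rest stream past. I’m told that you have to write a bit more code to implement this one, but I haven’t yet seen this from experience.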
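Here’s a minimal sketch of that event-driven style, doing the same “find the titles” job as a DOM walk but without ever building a tree. It uses the JDK’s javax.xml.parsers.SAXParser and org.xml.sax.helpers.DefaultHandler; the class SaxTitles, the handler TagTextHandler, and the sample document are hypothetical. It also shows where the “bit more code” comes from: you have to track state (am I inside a matching element right now?) yourself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {

    /** Keeps the text inside elements named 'tag'; every other event is ignored as it streams past. */
    static class TagTextHandler extends DefaultHandler {
        private final String tag;
        private final List<String> found = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a matching element

        TagTextHandler(String tag) {
            this.tag = tag;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals(tag)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A single run of text may arrive in several chunks, so append rather than assign.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals(tag) && current != null) {
                found.add(current.toString());
                current = null;
            }
        }
    }

    public static List<String> findText(String xml, String tag) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TagTextHandler handler = new TagTextHandler(tag);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<page><title>Home</title>"
                + "<section><title>News</title><p>First post</p></section></page>";
        System.out.println(findText(xml, "title")); // [Home, News]
    }
}
```

For a multi-gigabyte file you would pass an InputStream to parse() instead of a String, so only the collected titles ever sit in memory.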

JDOM: The first two APIs were meant as implementations of outside standards. Hence, they aren’t very “Java-ish.” JDOM is a third-party, Java-specific tree API (its own document model, not an implementation of the W3C DOM; it typically uses a SAX parser under the hood to build the tree) that is supposedly easier to use than either DOM or SAX. I haven’t tried it yet, but I’m getting tempted.