Now that I've finally got the hang of JSON, guess what? I needed to do some XML parsing. Now, let me just say that we are very lucky nowadays. Everything you need to work with XML in Java comes built right into the JDK. Back when I first started working with it, you had to find and download separate packages, and it was hard to find information. These days, XML is almost passé. And yet, here I am, needing to do a little old-fashioned XML parsing.
You’ll find everything you need to work with XML in the JDK somewhere under javax.xml. Today, we’re talking about XML parsing, which, go figure, is found under javax.xml.parsers.
There are two major types of XML parsing. (There are others, but these came first and are still the biggies.) Document Object Model, or DOM, creates a full tree-like representation of the XML document in memory. The advantage of this technique is that you can quickly travel up and down the tree in any direction you like, as much as you like. The disadvantage is that it takes a lot of memory to create the model, and that memory stays tied up as long as you are using it. DOM is good for applications that are highly dependent on the XML structure and need to refer to it multiple times. The other type is Simple API for XML, or SAX. SAX eliminates the memory hogging of DOM, but at the expense of passing through the document only once. It is good for applications where you can grab the information you need in a single pass.
To get a DOM of an XML document, you’ll need a DocumentBuilder. And how do we get that, you ask? From a DocumentBuilderFactory, of course!
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Once you have the builder, give it some XML data as a File, InputStream, or URI:
Document xmlDoc = builder.parse(inputXml);
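Here is how those pieces might fit together in one place. This is only a minimal sketch: the class name DomParseSketch, the helper names, and the sample XML are made up for illustration, and it wraps the checked exceptions in an IllegalStateException just to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class DomParseSketch {

    // Builds a DOM Document from XML held in a String.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Any InputStream works here; a byte array keeps the sketch self-contained.
            try (InputStream inputXml =
                     new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
                return builder.parse(inputXml);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the tag name of the document's root element.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getTagName();
    }

    public static void main(String[] args) {
        // Sample XML invented for this example.
        System.out.println(rootName("<library><book>Dune</book></library>"));
    }
}
```

If the XML comes from somewhere you don't control, you would probably also want to look into the factory's security-related settings before parsing, which this sketch leaves out.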
And now you can have some fun! Use the methods in Document to travel through the XML, search it, and even modify it.
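To give a rough idea of what that looks like, here is a small sketch that searches for elements, walks up to a parent, and adds a new element. The class name DomTravelSketch, its helper methods, and the library/book sample data are all invented for the example; the DOM calls themselves (getElementsByTagName, getParentNode, createElement, appendChild) are standard org.w3c.dom methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name and sample data, for illustration only.
class DomTravelSketch {
    static final String SAMPLE =
        "<library><book year=\"1965\">Dune</book>"
        + "<book year=\"1984\">Neuromancer</book></library>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Search: collect the text of every <book> element, in document order.
    static List<String> titles(Document doc) {
        NodeList books = doc.getElementsByTagName("book");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            result.add(books.item(i).getTextContent());
        }
        return result;
    }

    // Travel: go from the first <book> back up to its parent element.
    static String parentOfFirstBook(Document doc) {
        Node firstBook = doc.getElementsByTagName("book").item(0);
        return firstBook.getParentNode().getNodeName();
    }

    // Modify: append a new <book> element under the root.
    static void addBook(Document doc, String title, String year) {
        Element book = doc.createElement("book");
        book.setAttribute("year", year);
        book.setTextContent(title);
        doc.getDocumentElement().appendChild(book);
    }

    static String demo() {
        Document doc = parse(SAMPLE);
        addBook(doc, "Snow Crash", "1992");
        return parentOfFirstBook(doc) + " " + titles(doc);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // library [Dune, Neuromancer, Snow Crash]
    }
}
```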
SAX is a different way of thinking about XML from DOM, but still very powerful in its own way. Once again, you’ll want to start by creating a factory:
SAXParserFactory factory = SAXParserFactory.newInstance();
…and a parser:
SAXParser parser = factory.newSAXParser();
Here’s where it gets a bit more complicated. SAX is a callback parser. That means that you have to write an entire class to give to the parser that contains methods to handle the XML as it comes, and the parser will call the methods on the class as it streams through the XML. There are two classes you can use for this, but one is deprecated, so you will only want to use the other one.
The XML parsing class is org.xml.sax.helpers.DefaultHandler. This is a concrete class, so you could actually create and use an instance of it, but it wouldn’t be very helpful since the default implementation of the methods is to do nothing. But this is still handy for you because you can just not override any methods that you don’t care about. The methods you are most likely to use are:
startDocument()
startElement(String uri, String localName, String qName, Attributes attributes)
characters(char[] ch, int start, int length)
endElement(String uri, String localName, String qName)
endDocument()
The startDocument and endDocument methods are handy for any pre- or post-processing you need to do. The startElement method lets you know about an element’s opening tag, characters gives you any text between the tags (possibly in more than one chunk, so be ready to collect it), and endElement tells you the element is closed. There are also methods that will tell you when a warning or error occurs.
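To make the callback idea a little more concrete, here is a minimal sketch of a handler that pulls the text out of every book element in a single pass. The names SaxTitleSketch and BookHandler, and the book element itself, are assumptions made up for this example. It also assumes the factory's default settings, where the parser is not namespace-aware, which is why it compares against qName rather than localName.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class names, for illustration only.
class SaxTitleSketch {

    // Collects the text of every <book> element as the parser streams by.
    static class BookHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inBook;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            if ("book".equals(qName)) {
                inBook = true;
                text.setLength(0); // start collecting fresh text
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so append.
            if (inBook) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("book".equals(qName)) {
                titles.add(text.toString().trim());
                inBook = false;
            }
        }
    }

    static List<String> titles(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            BookHandler handler = new BookHandler();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            titles("<library><book>Dune</book><book>Neuromancer</book></library>"));
    }
}
```

Notice that the handler has to keep its own state (the inBook flag and the text buffer), because SAX never hands you the whole document at once. That is the price of the low memory use.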
With DOM and SAX, you can parse any well-formed XML data and put it into a form that’s more useful to you.