Parsing XML With SAX, DOM & JDOM: Hicham Qaissi

The document describes three different approaches to parsing XML documents in Java: SAX, DOM, and JDOM. It uses a sample XML file containing book data to demonstrate how each approach searches for a given book by ISBN and returns its details. SAX is an event-based API where classes implement interfaces to handle XML parsing events. DOM builds an in-memory tree of the XML document. JDOM provides simplified, Java-specific, DOM-like access to an in-memory document tree. The document contains code examples for searching the sample XML file using each of the three parsing methods.
Contents
0. What is an XML parser?
1. Describing the example to develop
2. SAX
3. DOM
4. JDOM
5. Conclusion

0. What is an XML parser?


XML parsers let us analyze and compose XML documents. By analyzing the data and structure of an XML document, we can build objects in a programming language (Java in our case). We can also do the inverse, that is, build an XML document from data objects (see Fig. 1). In this manual I work through examples with three APIs: SAX, DOM and JDOM.

1. Describing the example to develop


The example I develop is an entertaining one, and it is the same for all three APIs (SAX, DOM and JDOM). It consists of analyzing an XML document that contains information about some books: ISBN code (isbn is an attribute), name, author name, price and editorial. The program expects a book code (ISBN) and searches for that book in the XML. If the book exists, all its information is printed to standard output; otherwise, we print a message saying that the book doesn't exist in the XML. Are you finding it as amusing as I do? Let's go!

The XML example (books.xml) is the following:

    <books>
        <book isbn="0000000001">
            <name>Book 1</name>
            <author>Author name 1</author>
            <price>12.54</price>
            <editorial>Editorial 1</editorial>
        </book>
        <book isbn="0000000002">
            <name>Book 2</name>
            <author>Author name 2</author>
            <price>58.25</price>
            <editorial>Editorial 2</editorial>
        </book>
        <book isbn="0000000003">
            <name>Book 3</name>
            <author>Author name 3</author>
            <price>29.45</price>
            <editorial>Editorial 3</editorial>
        </book>
        <book isbn="0000000004">
            <name>Book 4</name>
            <author>Author name 4</author>
            <price>78.95</price>
            <editorial>Editorial 4</editorial>
        </book>
        <book isbn="0000000005">
            <name>PBook 5</name>
            <author>Author name 5</author>
            <price>61.25</price>
            <editorial>Editorial 5</editorial>
        </book>
    </books>

For all parsers (SAX, DOM & JDOM), I use this DTO (Data Transfer Object):

    public class MyBook {
        private String isbn;
        private String name;
        private String author;
        private String price;
        private String editorial;

        public String getIsbn() { return isbn; }
        public void setIsbn(String isbn) { this.isbn = isbn; }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public String getAuthor() { return author; }
        public void setAuthor(String author) { this.author = author; }

        public String getPrice() { return price; }
        public void setPrice(String price) { this.price = price; }

        public String getEditorial() { return editorial; }
        public void setEditorial(String editorial) { this.editorial = editorial; }
    }

2. SAX
SAX (Simple API for XML) works with events and associated methods. As the parser reads the XML document and finds its components (elements, attributes, values, etc.), which are the events, or detects errors, it invokes the methods that the programmer has associated with them. You can find more information about SAX at www.saxproject.org.

First, be sure that you've included the SAX jar in the classpath (the jar file can be downloaded from http://sourceforge.net/projects/sax/files/; since Java 1.4 the JDK also bundles SAX, so a separate jar is often unnecessary). We must instantiate the reader. This reader implements the XMLReader interface, and we can obtain it from the abstract class SAXParser. I obtain the SAXParser from the SAXParserFactory. The parse method of XMLReader analyzes the XML document:

    import java.io.IOException;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;

    public class MySAXSearcher {
        public static void main(String[] args) {
            try {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware( true );
                factory.setValidating( true );
                SAXParser saxParser = factory.newSAXParser();
                XMLReader xr = saxParser.getXMLReader();
                // args[0] is the path (system identifier) of the XML document
                xr.parse( args[0] );
            } catch ( IOException ioe ) {
                System.out.println( "Error: " + ioe.getMessage() );
            } catch ( SAXException saxe ) {
                System.out.println( "Error: " + saxe.getMessage() );
            } catch ( ParserConfigurationException pce ) {
                System.out.println( "Error: " + pce.getMessage() );
            }
        }
    }

If the program compiles, Java and the jar file are set up correctly. Nevertheless, the program doesn't do anything yet, because we haven't registered interest in any event. It's important to catch the exceptions java.io.IOException, org.xml.sax.SAXException and javax.xml.parsers.ParserConfigurationException.

To handle the events, our main class can extend org.xml.sax.helpers.DefaultHandler. DefaultHandler implements the following interfaces:

- org.xml.sax.ContentHandler: events about data (the most widely used)
- org.xml.sax.ErrorHandler: events about errors
- org.xml.sax.DTDHandler: DTD treatment
- org.xml.sax.EntityResolver: external entities

We can write our own classes implementing ContentHandler and ErrorHandler to treat the events we are interested in:

- Data: implement ContentHandler and associate it with the reader (parser) using the method setContentHandler().
- Errors: implement ErrorHandler and associate it with the reader (parser) using the method setErrorHandler().

The most important methods of the ContentHandler interface (implemented by DefaultHandler, which is extended by our class MySAXSearcher) are:

- startDocument(): receives notification of the beginning of a document.
- endDocument(): receives notification of the end of a document.
- startElement(): receives notification of the beginning of an element.
- endElement(): receives notification of the end of an element.
- characters(): receives notification of character data.

See more about ContentHandler at http://download.oracle.com/javase/1.4.2/docs/api/org/xml/sax/ContentHandler.html.
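
To see the order in which these callbacks fire, here is a minimal sketch that records each event while parsing a small in-memory document. The class name SaxEventTrace and its trace() helper are hypothetical names chosen for this illustration; only the JDK's own SAX classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of the original example): records SAX events in order.
class SaxEventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes attrs) {
        // The factory below is not namespace aware, so the element name arrives in qName.
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only text so the trace stays readable.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("characters:" + text);
    }

    // Parses the given XML string and returns the recorded events.
    static List<String> trace(String xml) {
        try {
            SaxEventTrace handler = new SaxEventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        trace("<book isbn=\"0000000001\"><name>Book 1</name></book>")
            .forEach(System.out::println);
    }
}
```

Extending DefaultHandler keeps the sketch short because only the callbacks of interest need to be overridden; the remaining ContentHandler methods inherit empty implementations.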

Now MySAXSearcher is the following (I've written my own ContentHandler and ErrorHandler; it's much cleaner than overriding the relevant ContentHandler and ErrorHandler methods in our class that extends DefaultHandler):

MySAXSearcher.java:

    import java.io.IOException;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;

    public class MySAXSearcher extends DefaultHandler {

        // args[0]: path of the XML file, args[1]: ISBN to search for
        public static void main(String[] args) {
            MySAXSearcher searcher = new MySAXSearcher();
            searcher.searchBook(args[0], args[1]);
        }

        private void searchBook(String xml, String isbn) {
            try {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware( true );
                factory.setValidating( true );
                SAXParser saxParser = factory.newSAXParser();
                XMLReader xr = saxParser.getXMLReader();
                // Assigning my own ContentHandler to my XMLReader.
                MyContentHandler ch = new MyContentHandler();
                ch.isbnSearched = isbn;
                xr.setContentHandler( ch );
                // Assigning my own ErrorHandler to my XMLReader.
                xr.setErrorHandler( new MyErrorHandler() );
                // books.xml has no DTD, so validation is switched off on the reader.
                xr.setFeature( "http://xml.org/sax/features/validation", false );
                xr.setFeature( "http://xml.org/sax/features/namespaces", true );
                long before = System.currentTimeMillis();
                xr.parse( xml );
                long after = System.currentTimeMillis();
                printResult( xml, ch, after - before );
            } catch ( IOException ioe ) {
                System.out.println( "Error: " + ioe.getMessage() );
            } catch ( SAXException saxe ) {
                System.out.println( "Error: " + saxe.getMessage() );
            } catch ( ParserConfigurationException pce ) {
                System.out.println( "Error: " + pce.getMessage() );
            }
        }

        public void printResult(String xml, MyContentHandler ch, long time) {
            System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
            if (ch.book != null) {
                System.out.println("Book found:");
                System.out.println(" Isbn: " + ch.book.getIsbn());
                System.out.println(" Name: " + ch.book.getName());
                System.out.println(" Author: " + ch.book.getAuthor());
                System.out.println(" Price: " + ch.book.getPrice());
                System.out.println(" Editorial: " + ch.book.getEditorial());
            } else {
                System.out.println("Book not found");
            }
        }
    }

MyContentHandler.java:

    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.Locator;
    import org.xml.sax.SAXException;

    public class MyContentHandler implements ContentHandler {
        boolean isBookFound = false;   // true while we are inside the searched book
        String isbnSearched = "";
        String currentNode = "";
        MyBook book = null;            // stays null if the ISBN is never found

        @Override
        public void startDocument() throws SAXException {
            System.out.println("***Start document***");
        }

        @Override
        public void endDocument() throws SAXException {
            System.out.println("***End document***");
        }

        @Override
        public void startElement(String uri, String local, String raw, Attributes attrs) {
            currentNode = local;
            if ("book".equals(local) && !isBookFound) {
                // The book node only has one attribute (isbn)
                if ("isbn".equals(attrs.getLocalName(0))
                        && isbnSearched.equals(attrs.getValue(0))) {
                    isBookFound = true;
                    book = new MyBook();
                    book.setIsbn(isbnSearched);
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // I get the text value (this assumes the text arrives in a single call)
            String value = new String(ch, start, length).trim();
            if (!"".equals(value) && isBookFound) {
                if ("name".equals(currentNode)) {
                    book.setName(value);
                } else if ("author".equals(currentNode)) {
                    book.setAuthor(value);
                } else if ("price".equals(currentNode)) {
                    book.setPrice(value);
                } else if ("editorial".equals(currentNode)) {
                    // editorial is the last child, so the book is complete
                    book.setEditorial(value);
                    isBookFound = false;
                }
            }
        }

        // The remaining ContentHandler events are not needed for this example.
        @Override
        public void endElement(String arg0, String arg1, String arg2) throws SAXException { }

        @Override
        public void endPrefixMapping(String arg0) throws SAXException { }

        @Override
        public void ignorableWhitespace(char[] arg0, int arg1, int arg2) throws SAXException { }

        @Override
        public void processingInstruction(String arg0, String arg1) throws SAXException { }

        @Override
        public void setDocumentLocator(Locator arg0) { }

        @Override
        public void skippedEntity(String arg0) throws SAXException { }

        @Override
        public void startPrefixMapping(String arg0, String arg1) throws SAXException { }
    }

MyErrorHandler.java:

    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;

    public class MyErrorHandler implements ErrorHandler {

        @Override
        public void warning(SAXParseException ex) {
            System.err.println("[Warning] : " + ex.getMessage());
        }

        @Override
        public void error(SAXParseException ex) {
            System.err.println("[Error] : " + ex.getMessage());
        }

        @Override
        public void fatalError(SAXParseException ex) throws SAXException {
            System.err.println("[Error!] : " + ex.getMessage());
        }
    }

With our XML (books.xml) and the book code to search for, 0000000003, we can run our program with:

    java MySAXSearcher books.xml 0000000003

The result must be the following:


    ***Start document***
    ***End document***
    Document books.xml. Parsed in : 141 ms
    Book found:
     Isbn: 0000000003
     Name: Book 3
     Author: Author name 3
     Price: 29.45
     Editorial: Editorial 3
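
One caveat about the content handler in this section: SAX allows a parser to deliver the text of a single element through several characters() calls, so assigning a field directly inside characters() can lose part of a value on larger inputs. A common alternative, sketched below under that assumption, is to accumulate text in a buffer and store it when endElement() fires. The class BufferingBookHandler is hypothetical and uses a Map instead of MyBook only so that the sketch is self-contained.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the content handler: buffers text until the element ends.
class BufferingBookHandler extends DefaultHandler {
    private final String isbnSearched;
    private final StringBuilder text = new StringBuilder();
    private boolean insideWantedBook = false;
    Map<String, String> book = null; // stays null when the ISBN is not found

    BufferingBookHandler(String isbnSearched) {
        this.isbnSearched = isbnSearched;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the new element
        if ("book".equals(qName) && book == null
                && isbnSearched.equals(attrs.getValue("isbn"))) {
            insideWantedBook = true;
            book = new LinkedHashMap<>();
            book.put("isbn", isbnSearched);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!insideWantedBook) return;
        if ("book".equals(qName)) {
            insideWantedBook = false;                // the wanted book is complete
        } else {
            book.put(qName, text.toString().trim()); // name, author, price, editorial
        }
    }

    // Returns the fields of the book with the given ISBN, or null if it is absent.
    static Map<String, String> find(String xml, String isbn) {
        try {
            BufferingBookHandler handler = new BufferingBookHandler(isbn);
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.book;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book isbn=\"0000000003\"><name>Book 3</name>"
                   + "<price>29.45</price></book></books>";
        System.out.println(find(xml, "0000000003"));
    }
}
```

Storing values in endElement() also removes the need to track the current node name by hand, since the element name is passed to the callback.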

3. DOM
DOM (Document Object Model): while SAX gives sequential access to the elements of the document as it is read, DOM delivers the result of parsing as a tree that can be traversed and transformed. DOM has some disadvantages and advantages with respect to SAX:

- Disadvantages: the data can be accessed only once the entire document has been parsed. The tree is an object loaded in memory, which is problematic for big and complex documents.
- Advantages: with DOM we can manipulate the XML document (update, delete and add elements). We can also create a new XML document.
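
As an illustration of the manipulation advantage, the following sketch parses a document into a DOM tree, appends a new book element, and serializes the tree back to text with the JDK's identity Transformer. The class DomEditDemo and its addBook() method are hypothetical names for this example, and the book data passed in main() is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: adds a <book> element to a DOM tree and writes the tree out again.
class DomEditDemo {

    static String addBook(String xml, String isbn, String name) {
        try {
            // 1. Parse the text into an in-memory tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // 2. Build the new element and attach it under the root element.
            Element book = doc.createElement("book");
            book.setAttribute("isbn", isbn);
            Element nameElement = doc.createElement("name");
            nameElement.setTextContent(name);
            book.appendChild(nameElement);
            doc.getDocumentElement().appendChild(book);

            // 3. Serialize the modified tree; a Transformer without a stylesheet copies its input.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM edit failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addBook("<books></books>", "0000000006", "Book 6"));
    }
}
```

This kind of in-place edit is not possible with SAX alone, because SAX never holds the whole document at once.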

To manipulate an XML document, we need an object that implements the org.w3c.dom.Document interface (which extends the Node interface). We use the classes javax.xml.parsers.DocumentBuilderFactory and javax.xml.parsers.DocumentBuilder, and invoke the parse() method to obtain a Document object. To manipulate XML with DOM, some important interfaces are: org.w3c.dom.Document (the entire XML document), org.w3c.dom.Element (an element of the XML document), org.w3c.dom.Node (any single node of the document tree; Document, Element and Attr all extend it) and org.w3c.dom.Attr (an attribute of an element). OK, now let's talk in Java code. As the DTO (Data Transfer Object), I use the same MyBook class.

MyDOMSearcher.java:

    import java.io.File;
    import java.io.IOException;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    public class MyDOMSearcher {

        // args[0]: path of the XML file, args[1]: ISBN to search for
        public static void main(String[] args) {
            MyDOMSearcher searcher = new MyDOMSearcher();
            searcher.searchBook(args[0], args[1]);
        }

        private void searchBook(String xml, String isbn) {
            long before = System.currentTimeMillis();
            MyBook book = null;
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // books.xml has no DTD, so validation is switched off (as in the SAX example)
                factory.setValidating(false);
                DocumentBuilder parser = factory.newDocumentBuilder();
                // I assign my own ErrorHandler to my parser
                parser.setErrorHandler(new MyErrorHandler());
                File file = new File(xml);
                Document doc = parser.parse(file);
                // I obtain all the <book> elements.
                // NodeList is an interface that has 2 methods:
                // 1. item(int): returns the Node at the given position.
                // 2. getLength(): returns the length of the list.
                NodeList booksNodes = doc.getElementsByTagName("book");
                NodeList bookChildNodes = null;
                String isbnAttribute = "";
                for (int i = 0; i < booksNodes.getLength(); i++) {
                    Node node = booksNodes.item(i);
                    if (node != null && node.hasAttributes()) {
                        isbnAttribute = node.getAttributes().getNamedItem("isbn").getNodeValue();
                        if (isbnAttribute.equals(isbn)) {
                            // I've caught the searched isbn
                            if (book == null) {
                                book = new MyBook();
                                book.setIsbn(isbn);
                            }
                            if (node.hasChildNodes()) {
                                bookChildNodes = node.getChildNodes();
                                for (int j = 0; j < bookChildNodes.getLength(); j++) {
                                    Node child = bookChildNodes.item(j);
                                    if ("name".equals(child.getNodeName())) {
                                        book.setName(child.getTextContent());
                                    } else if ("author".equals(child.getNodeName())) {
                                        book.setAuthor(child.getTextContent());
                                    } else if ("price".equals(child.getNodeName())) {
                                        book.setPrice(child.getTextContent());
                                    } else if ("editorial".equals(child.getNodeName())) {
                                        book.setEditorial(child.getTextContent());
                                        // I've found my book. Ending the inner for iteration
                                        break;
                                    }
                                }
                            }
                        }
                    }
                }
            } catch (IOException ioe) {
                System.err.println("[Error] : " + ioe.getMessage());
            } catch (ParserConfigurationException pce) {
                System.err.println("[Error] : " + pce.getMessage());
            } catch (SAXException se) {
                System.err.println("[Error] : " + se.getMessage());
            }
            long after = System.currentTimeMillis();
            printResults(xml, book, after - before);
        }

        public void printResults(String xml, MyBook book, long time) {
            System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
            if (book != null) {
                System.out.println("Book found:");
                System.out.println(" Isbn: " + book.getIsbn());
                System.out.println(" Name: " + book.getName());
                System.out.println(" Author: " + book.getAuthor());
                System.out.println(" Price: " + book.getPrice());
                System.out.println(" Editorial: " + book.getEditorial());
            } else {
                System.out.println("Book not found");
            }
        }
    }
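
Once the document is a DOM tree, the nested loops of MyDOMSearcher can also be replaced by a query. The sketch below is an alternative technique, not the approach used in this manual: it uses the JDK's javax.xml.xpath API to select the book by its isbn attribute. The class XPathBookLookup and its findName() method are hypothetical names; the ISBN is bound through an XPath variable rather than concatenated into the expression.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: looks a book up with an XPath query over the DOM tree.
class XPathBookLookup {

    // Returns the <name> of the book with the given ISBN, or null if no such book exists.
    static String findName(String xml, String isbn) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve the $isbn variable used in the expression below.
            xpath.setXPathVariableResolver(
                variable -> "isbn".equals(variable.getLocalPart()) ? isbn : null);

            Element book = (Element) xpath.evaluate(
                "/books/book[@isbn=$isbn]", doc, XPathConstants.NODE);
            if (book == null) {
                return null;
            }
            // A relative expression evaluated against the matched <book> element.
            return xpath.evaluate("name", book);
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book isbn=\"0000000003\"><name>Book 3</name></book></books>";
        System.out.println(findName(xml, "0000000003"));
    }
}
```

The trade-off is the same as for DOM in general: the whole tree is still built in memory before the query runs.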

4. JDOM
All the preceding APIs are available for many programming languages, but using them from Java is laborious. JDOM is an API made specifically for Java: it uses Java's own capabilities and features, which makes XML parsing easier. You can find related information at www.jdom.org. Now let's build the same example (searching for a book in our XML) with JDOM (be sure that the jar is on your classpath; you can download it from http://www.jdom.org/dist/binary/).

MyJDOMSearcher.java:

    import java.io.IOException;
    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.JDOMException;
    import org.jdom.input.SAXBuilder;

    public class MyJDOMSearcher {
        private String isbn;
        private MyBook book;
        private boolean noSearchMore = false;

        // args[0]: path of the XML file, args[1]: ISBN to search for
        public static void main(String[] args) {
            try {
                long before = System.currentTimeMillis();
                MyJDOMSearcher searcher = new MyJDOMSearcher();
                // The second parameter is the isbn to search
                searcher.isbn = args[1];
                SAXBuilder saxBuilder = new SAXBuilder();
                Document document = saxBuilder.build(args[0]);
                searcher.searchBook(document.getRootElement());
                long after = System.currentTimeMillis();
                searcher.printResults(args[0], after - before);
            } catch (JDOMException jde) {
                System.err.println("[Error] JDOMException: " + jde.getMessage());
            } catch (IOException ioe) {
                System.err.println("[Error] IOException: " + ioe.getMessage());
            }
        }

        // Recursively descend the tree, starting at the root element (books)
        private void searchBook(Element element) {
            inspect(element);
            // getContent() returns every child node (text, comments, elements);
            // only the elements are visited
            for (Object object : element.getContent()) {
                if (object instanceof Element) {
                    searchBook((Element) object);
                }
            }
        }

        // Checks one element and copies its data into the book when it belongs to it
        public void inspect(Element element) {
            if (!noSearchMore) {
                // If the book has already been completed, there is nothing left to do
                if ("book".equals(element.getQualifiedName()) && book == null) {
                    // getAttributeValue returns null when the attribute is missing
                    if (isbn.equals(element.getAttributeValue("isbn"))) {
                        book = new MyBook();
                        book.setIsbn(isbn);
                    }
                }
                if (book != null) {
                    if ("name".equals(element.getQualifiedName())) {
                        book.setName(element.getValue());
                    }
                    if ("author".equals(element.getQualifiedName())) {
                        book.setAuthor(element.getValue());
                    }
                    if ("price".equals(element.getQualifiedName())) {
                        book.setPrice(element.getValue());
                    }
                    if ("editorial".equals(element.getQualifiedName())) {
                        book.setEditorial(element.getValue());
                        noSearchMore = true;
                    }
                }
            }
        }

        private void printResults(String xml, long time) {
            System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
            if (book != null) {
                System.out.println("Book found:");
                System.out.println(" Isbn: " + book.getIsbn());
                System.out.println(" Name: " + book.getName());
                System.out.println(" Author: " + book.getAuthor());
                System.out.println(" Price: " + book.getPrice());
                System.out.println(" Editorial: " + book.getEditorial());
            } else {
                System.out.println("Book not found");
            }
        }
    }

5. Conclusion
Running the same example with the three APIs (MySAXSearcher, MyDOMSearcher and MyJDOMSearcher), each receiving as parameters the same XML file and the ISBN to search for ("0000000003"), gives the following times:

    Program           Time
    MySAXSearcher     93 ms
    MyDOMSearcher     750 ms
    MyJDOMSearcher    609 ms

The SAX API is faster than DOM and JDOM, but it is more laborious to program.
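
Timings taken from a single run of a Java program include JVM start-up effects such as class loading, so they can vary a lot between runs. The sketch below shows one way to repeat such a comparison more steadily for SAX and DOM: it parses a generated in-memory document several times after a warm-up phase and reports an average. The class ParseTiming, the generated sample data and the warm-up and run counts are all assumptions made for this illustration, and no particular numbers are implied.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness: compares SAX and DOM on the same in-memory document.
class ParseTiming {

    // Builds a document shaped like books.xml (generated data, not the original file).
    static String sampleXml(int books) {
        StringBuilder sb = new StringBuilder("<books>");
        for (int i = 1; i <= books; i++) {
            sb.append(String.format(
                "<book isbn=\"%010d\"><name>Book %d</name></book>", i, i));
        }
        return sb.append("</books>").toString();
    }

    // Counts <book> elements with SAX: nothing is kept in memory besides the counter.
    static int countBooksSax(String xml) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes attrs) {
                        if ("book".equals(qName)) count[0]++;
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    // Counts <book> elements with DOM: the whole tree is built first.
    static int countBooksDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("book").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // Runs the task a few times untimed, then returns the average time per timed run.
    static long averageNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) task.run();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) task.run();
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        String xml = sampleXml(1000);
        System.out.println("SAX average ns: " + averageNanos(() -> countBooksSax(xml), 20, 50));
        System.out.println("DOM average ns: " + averageNanos(() -> countBooksDom(xml), 20, 50));
    }
}
```

Parsing from a string keeps disk access out of the measurement; to time the original programs instead, the same averageNanos() idea could be wrapped around their parse calls.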
