
XML Parsers: Types of Parsers Using XML Parsers SAX DOM DOM Versus SAX Products Conclusion

This document discusses XML parsers and provides an overview of different types of parsers including validating versus non-validating, DOM, and SAX parsers. It explains that DOM parsers build a tree of the entire XML document in memory, while SAX parsers are event-based and read-only. SAX parsers are generally better for large documents or data streams while DOM is better if random access or modifications are needed. Popular parser products and the advantages and disadvantages of DOM and SAX are also summarized.

Uploaded by Muthumanikandan
Copyright © All Rights Reserved

XML Parsers

Overview
Types of parsers
Using XML parsers
SAX
DOM
DOM versus SAX
Products
Conclusion
Types of Parsers
There are several different ways to categorise parsers:
Validating versus non-validating parsers
Parsers that support the Document Object Model (DOM)
Parsers that support the Simple API for XML (SAX)
Parsers written in a particular language (Java, C++, Perl, etc.)
Non-validating Parsers
Speed and efficiency
- It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD.
- If you only want to find tags and extract information, use a non-validating parser.
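The sketch below illustrates the trade-off using Python's standard `xml.sax` module, whose default reader is built on the non-validating expat parser. The document declares a DTD and then breaks it. A non-validating parser still reports every tag, because it never checks the DTD rules. `TagCollector` and `collect_tags` are illustrative names, not part of any library.

```python
import xml.sax

# The internal DTD only allows <book> inside <catalog>,
# but the document also contains an undeclared <magazine> element.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (book*)>
  <!ELEMENT book (#PCDATA)>
]>
<catalog><book>XML Basics</book><magazine>Not in the DTD</magazine></catalog>"""


class TagCollector(xml.sax.ContentHandler):
    """Illustrative (hypothetical) handler: records element names in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def collect_tags(data):
    handler = TagCollector()
    # The default expat-based reader does not validate,
    # so the DTD violation above is not reported.
    xml.sax.parseString(data, handler)
    return handler.tags


print(collect_tags(DOC))  # ['catalog', 'book', 'magazine']
```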
Using XML Parsers
Three basic steps to use an XML parser:
Create a parser object
Pass your XML document to the parser
Process the results

Generally, writing out XML is outside the scope of parsers (though some may implement proprietary mechanisms).
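Here is a minimal sketch of the three steps, assuming Python's standard `xml.sax` module. `BookIdHandler` and `book_ids` are illustrative names. The handler gathers `id` attributes while the parser runs, and the caller reads them afterwards.

```python
import io
import xml.sax

DOC = b"""<catalog>
  <book id="b1">XML Basics</book>
  <book id="b2">SAX in Practice</book>
</catalog>"""


class BookIdHandler(xml.sax.ContentHandler):
    """Illustrative (hypothetical) handler: collects the id of every <book>."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "book":
            self.ids.append(attrs.get("id"))


def book_ids(data):
    # Step 1: create a parser object and attach a handler to it.
    parser = xml.sax.make_parser()
    handler = BookIdHandler()
    parser.setContentHandler(handler)
    # Step 2: pass the XML document to the parser.
    parser.parse(io.BytesIO(data))
    # Step 3: process the results gathered during parsing.
    return handler.ids


print(book_ids(DOC))  # ['b1', 'b2']
```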
Parsing XML
Two established APIs:

SAX (Simple API for XML)
- Define handlers containing methods that are called as the XML is parsed

DOM (Document Object Model)
- Defines a logical tree representing the parsed XML
Parsing XML: DOM
Document Object Model
standard API for accessing and creating XML data
tree-based
programming-language independent
developed by the W3C
whole document is read into memory
read and write
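Because the whole tree is in memory, DOM supports reading and writing. The following sketch uses Python's `xml.dom.minidom` and an illustrative `add_book` function. It parses a document, adds an element through standard DOM calls (`createElement`, `setAttribute`, `appendChild`), and serializes the result. Note that `toxml()` is a minidom convenience method, not part of the core DOM interfaces.

```python
from xml.dom import minidom

DOC = '<catalog><book id="b1">XML Basics</book></catalog>'


def add_book(xml_text, book_id, title):
    """Illustrative (hypothetical) helper: parse, modify, serialize."""
    doc = minidom.parseString(xml_text)       # whole document now in memory
    book = doc.createElement("book")          # DOM allows writing, not just reading
    book.setAttribute("id", book_id)
    book.appendChild(doc.createTextNode(title))
    doc.documentElement.appendChild(book)
    # toxml() is minidom-specific serialization, not a core DOM method
    return doc.documentElement.toxml()


print(add_book(DOC, "b2", "DOM in Practice"))
```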
Creating a DOM Tree
A DOM implementation has a method that passes an XML file to a factory object, which returns a Document object representing the whole document; the root element is reachable from it as the document element.

After this, you may use the standard DOM interfaces to interact with the XML structure.
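In Python's `xml.dom.minidom`, the module-level `parseString()` function plays roughly the factory role described above: it builds the tree and returns a `Document`. This sketch uses an illustrative `inspect_tree` helper to show the difference between the `Document` node and the root element, which is a child of it.

```python
from xml.dom import minidom, Node

DOC = '<?xml version="1.0"?><catalog><book/></catalog>'


def inspect_tree(xml_text):
    """Illustrative (hypothetical) helper comparing Document and root element."""
    doc = minidom.parseString(xml_text)   # returns a Document object
    root = doc.documentElement            # the root *element*, a child of doc
    return doc.nodeType, root.tagName, root.parentNode is doc


doc_type, root_tag, root_under_doc = inspect_tree(DOC)
print(doc_type == Node.DOCUMENT_NODE)  # True
print(root_tag)                        # catalog
print(root_under_doc)                  # True
```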

Parsing XML: DOM
[Figure: XML File → DOM Tree, accessed by the Application through the API]

DOM Interfaces
The DOM defines several interfaces:

Node: the base data type of the DOM
Element: represents an element
Attr: represents an attribute of an element
Text: the content of an element or attribute
Document: represents the entire XML document. A Document object is often referred to as a DOM tree
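To see how these interfaces appear in a parsed tree, here is a sketch that walks a small document with Python's `xml.dom.minidom`. For each node it reports which interface the node implements. `describe` and `KINDS` are illustrative names. In the DOM, attributes are not children of their element, so the walk reads them from `attributes` rather than `childNodes`.

```python
from xml.dom import minidom, Node

DOC = '<catalog><book id="b1">XML Basics</book></catalog>'

# Map DOM nodeType codes to the interface names listed above.
KINDS = {
    Node.DOCUMENT_NODE: "Document",
    Node.ELEMENT_NODE: "Element",
    Node.ATTRIBUTE_NODE: "Attr",
    Node.TEXT_NODE: "Text",
}


def describe(node, out=None):
    """Illustrative (hypothetical) depth-first walk returning (interface, nodeName)."""
    if out is None:
        out = []
    out.append((KINDS.get(node.nodeType, "Node"), node.nodeName))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attr nodes hang off the element's attribute map, not its children.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            out.append((KINDS[attr.nodeType], attr.nodeName))
    for child in node.childNodes:
        describe(child, out)
    return out


for kind, name in describe(minidom.parseString(DOC)):
    print(kind, name)
```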
DOM Levels
DOM Level 1
- basic functionality for document navigation and manipulation

DOM Level 2
- includes a style sheet object model
- defines an event model and provides support for XML namespaces

DOM Level 3
- Core, Load and Save, and Validation became W3C Recommendations in 2004
- addresses document loading and saving
- content models (DTDs and schemas) with document validation support
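Namespace support from DOM Level 2 can be tried with Python's `xml.dom.minidom`, which provides namespace-aware methods such as `getElementsByTagNameNS`. In this sketch, the namespace URI `urn:example:books` and the `ns_titles` function are hypothetical. Matching is on namespace URI plus local name, so the un-namespaced `<title>` is not returned.

```python
from xml.dom import minidom

BOOKS_NS = "urn:example:books"  # hypothetical namespace URI

DOC = """<catalog xmlns:bk="urn:example:books">
  <book><bk:title>XML Basics</bk:title></book>
  <book><title>Untitled draft</title></book>
</catalog>"""


def ns_titles(xml_text):
    """Illustrative (hypothetical) helper: titles in BOOKS_NS only."""
    doc = minidom.parseString(xml_text)  # minidom parses namespace-aware by default
    return [el.firstChild.data for el in doc.getElementsByTagNameNS(BOOKS_NS, "title")]


print(ns_titles(DOC))  # ['XML Basics']
```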
Parsing XML: SAX
Simple API for XML
API for accessing XML data
event-based
programming-language independent
the application must keep any fragments it needs in memory itself
read only
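Because SAX keeps nothing for you, the handler has to buffer whatever it wants to keep. This sketch uses Python's `xml.sax` with an illustrative `TitleBuffer` handler that keeps only `<title>` text. `characters()` may deliver one piece of text in several calls (here the `&amp;` entity is likely to split it), so the chunks are joined when the element ends.

```python
import xml.sax

DOC = (b"<catalog><book><title>XML &amp; You</title><price>10</price></book>"
       b"<book><title>SAX</title></book></catalog>")


class TitleBuffer(xml.sax.ContentHandler):
    """Illustrative (hypothetical) handler: stores only <title> text."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._chunks = None  # a list only while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer it ourselves.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._chunks = None


def titles(data):
    handler = TitleBuffer()
    xml.sax.parseString(data, handler)
    return handler.titles


print(titles(DOC))  # ['XML & You', 'SAX']
```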
Parsing XML: SAX
SAX is an interface to the XML parser based on streaming and callbacks
In Java, you extend the HandlerBase class (SAX 1; DefaultHandler in SAX 2) and override its callbacks:
startDocument, endDocument
startElement, endElement
characters
warning, error, fatalError
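Python's `xml.sax` follows the SAX 2 split: the document callbacks live in `ContentHandler` and `warning`/`error`/`fatalError` live in `ErrorHandler`. The sketch below uses an illustrative `EventLogger` class that implements both, logging the order in which the parser calls each method. Re-raising in `fatalError` mirrors what the default `ErrorHandler` does.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class EventLogger(ContentHandler, ErrorHandler):
    """Illustrative (hypothetical) handler recording every callback."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement({name})")

    def endElement(self, name):
        self.events.append(f"endElement({name})")

    def characters(self, content):
        if content.strip():  # skip whitespace-only text
            self.events.append(f"characters({content!r})")

    def warning(self, exception):
        self.events.append("warning")

    def error(self, exception):
        self.events.append("error")

    def fatalError(self, exception):
        self.events.append("fatalError")
        raise exception  # stop parsing, as the default ErrorHandler does


def log_events(data):
    logger = EventLogger()
    try:
        xml.sax.parseString(data, logger, logger)
    except xml.sax.SAXParseException:
        pass
    return logger.events


print(log_events(b"<a><b>hi</b></a>"))
print(log_events(b"<a><b></a>"))  # malformed: ends with 'fatalError'
```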
Parsing XML: SAX
[Figure: XML File → SAX calls]

SAX versus DOM
DOM:
read and write
need to move back and forth in the data
document is human-created

SAX:
read only
huge data or streams
data is machine-generated
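SAX suits streams because it can emit events before the whole document has arrived. Here is a sketch using the incremental `feed()`/`close()` interface of Python's default SAX reader. `ItemCounter`, `count_items`, and the chunked input are hypothetical stand-ins for data arriving from, for example, a socket.

```python
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Illustrative (hypothetical) handler counting <item> elements."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items(chunks):
    """Parse a document delivered in arbitrary pieces."""
    parser = xml.sax.make_parser()   # default expat reader supports feed()/close()
    handler = ItemCounter()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)           # events fire as soon as enough data arrives
    parser.close()                   # signals end of input
    return handler.count


stream = [b"<feed><it", b"em/><item/>", b"<item/></fe", b"ed>"]
print(count_items(stream))  # 3
```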
DOM pro and contra
PRO
The file is parsed only once.
Strong navigation abilities: this is the aim of the DOM design.

CONTRA
More memory is needed, since the whole XML tree is held in memory.
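The sketch below shows the "parse once, navigate freely" advantage with Python's `xml.dom.minidom`. After a single parse, it jumps straight to the second book, climbs to a parent, and lists every id without re-reading the input. `navigate` is an illustrative name.

```python
from xml.dom import minidom

DOC = """<catalog>
  <book id="b1"><title>XML Basics</title></book>
  <book id="b2"><title>SAX in Practice</title></book>
</catalog>"""


def navigate(xml_text):
    """Illustrative (hypothetical) helper: several queries, one parse."""
    doc = minidom.parseString(xml_text)  # parsed exactly once
    books = doc.getElementsByTagName("book")
    # Random access to the second book's title...
    second_title = books[1].getElementsByTagName("title")[0].firstChild.data
    # ...and navigation upward, both from the in-memory tree.
    parent = books[0].parentNode.tagName
    ids = [b.getAttribute("id") for b in books]
    return second_title, parent, ids


print(navigate(DOC))  # ('SAX in Practice', 'catalog', ['b1', 'b2'])
```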
SAX pro and contra
PRO
Low memory needs, since the XML file is never entirely in memory
Can deal with XML streams

CONTRA
Accessing any node means parsing the file again from the beginning, since nothing is kept between passes. Thus, getting the 10 nodes included in a catalog ended up parsing the same file 10 times.
Poor navigation abilities: there is no easy way to get the children of a given node or the list of all "B" nodes
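A common workaround, not described in the slides, is to stop a SAX pass as soon as the wanted node has been read. The handler raises an exception, which propagates out of the parser. Each lookup still starts from the beginning of the file, but it no longer reads past the target. In this sketch, `_Found`, `TitleFinder`, and `find_title` are hypothetical names, and the element counter is there only to make the saved work visible.

```python
import xml.sax

DOC = (b"<catalog><book id='b1'><title>A</title></book>"
       b"<book id='b2'><title>B</title></book>"
       b"<book id='b3'><title>C</title></book></catalog>")


class _Found(Exception):
    """Hypothetical helper: raised to abort parsing once the target is read."""


class TitleFinder(xml.sax.ContentHandler):
    """Illustrative (hypothetical) handler looking up one book's title."""

    def __init__(self, book_id):
        super().__init__()
        self.book_id = book_id
        self.in_target = False
        self.in_title = False
        self.chunks = []
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == "book" and attrs.get("id") == self.book_id:
            self.in_target = True
        elif name == "title" and self.in_target:
            self.in_title = True

    def characters(self, content):
        if self.in_title:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "title" and self.in_title:
            raise _Found("".join(self.chunks))  # no need to read further
        if name == "book":
            self.in_target = False


def find_title(data, book_id):
    handler = TitleFinder(book_id)
    try:
        xml.sax.parseString(data, handler)
    except _Found as hit:
        return hit.args[0], handler.elements_seen
    return None, handler.elements_seen


print(find_title(DOC, "b2"))  # ('B', 5): stopped before the third book
print(find_title(DOC, "b9"))  # (None, 7): whole file read, no match
```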
SAX versus DOM
If your document is very large and you only need a few elements, use SAX

If you need to process many elements and perform operations on the XML, use DOM

If you need to access the XML many times, use DOM
Parser Products
Xerces-J / Xerces-C++ (Apache)
James Clark's XP (Java)
IBM XML4J / XML4C++
Java Project X (Sun)
Oracle's XML Parser for Java
MSXML (Microsoft)
Dan Connolly's XML Parser (Python)

Conclusion
The parser is a key building block for every XML application.

When building XML applications, you have to think about how you will handle large chunks of data

Choosing between SAX and DOM is not always trivial


The End

Questions?

Thank you!
