XML is a versatile data format for storing and transporting structured information. In Java, it is widely used for configuration files, data interchange, and more. To work with XML documents effectively, Java provides a set of XML parsers that read XML content and make it available for querying and editing. Any Java developer who works with XML should know these parsers.
Java XML parsers fall into two main groups:
- Tree-based: DOM (Document Object Model)
- Streaming: SAX (Simple API for XML) and StAX (Streaming API for XML)
In addition, JAXB (Java Architecture for XML Binding) maps XML directly to Java objects. Each approach serves different needs, from simple data extraction to complex document manipulation.
This article introduces these parsers and the JAXB binding API, and describes their key features and use cases.
XML File Used in the Examples
Below is the XML file used by the Java programs in this article:
example.xml
<?xml version="1.0" encoding="UTF-8"?>
<Test>
    <case id="1">
        <domain>Java</domain>
        <count>39</count>
    </case>
    <case id="2">
        <domain>C/C++</domain>
        <count>45</count>
    </case>
</Test>
Types of XML Parsers
1. DOM (Document Object Model) Parser
Overview
The DOM parser reads the entire XML document and builds an in-memory tree representation of it, which can then be traversed and manipulated with the standard DOM API.
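To see what the in-memory tree looks like in practice, the sketch below parses a string shaped like example.xml and walks the resulting tree depth-first, recording each element's name indented by its depth. The class and method names (DomTreeWalk, outline) are illustrative only; the DOM calls (getDocumentElement, getChildNodes, getNodeType) are standard org.w3c.dom methods. Reading from a string through InputSource is just a convenience so the sketch does not need the file on disk.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class DomTreeWalk {
    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Depth-first walk: records each element name, indented two spaces per level.
    static void walk(Element element, int depth, List<String> out) {
        out.add("  ".repeat(depth) + element.getTagName());
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Text nodes (including whitespace between tags) are skipped.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, out);
            }
        }
    }

    // Returns the element outline of the whole document, starting at the root.
    public static List<String> outline(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        walk(parse(xml).getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Test><case id=\"1\"><domain>Java</domain><count>39</count></case></Test>";
        outline(xml).forEach(System.out::println);
    }
}
```

Running main prints Test, then case indented by two spaces, then domain and count indented by four. String.repeat requires Java 11 or later.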
Features
- Tree Structure: Represents the XML document as a tree of nodes.
- Random Access: Once the document is loaded, any node can be accessed and modified at any time.
- Rich API: Provides methods for traversing, modifying, and querying the document.
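The random-access feature can be illustrated with a short sketch. Assuming a document shaped like example.xml (passed in as a string for convenience), the hypothetical method updateCount jumps straight to the <case> whose id attribute matches, changes its <count>, and then reads every <count> back from the same in-memory tree. Only standard org.w3c.dom calls (getElementsByTagName, getAttribute, setTextContent) are used; the class and method names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import java.io.StringReader;

public class DomRandomAccess {
    // Finds the <case> element with the given id attribute, or null if absent.
    static Element findCase(Document doc, String id) {
        NodeList cases = doc.getElementsByTagName("case");
        for (int i = 0; i < cases.getLength(); i++) {
            Element c = (Element) cases.item(i);
            if (id.equals(c.getAttribute("id"))) {
                return c;
            }
        }
        return null;
    }

    // Updates the <count> of one case, then returns all counts in document order,
    // comma-separated, read back from the same in-memory tree.
    public static String updateCount(String xml, String id, int newCount) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element target = findCase(doc, id);
        if (target == null) {
            throw new IllegalArgumentException("No case with id " + id);
        }
        Element count = (Element) target.getElementsByTagName("count").item(0);
        count.setTextContent(Integer.toString(newCount));

        StringBuilder all = new StringBuilder();
        NodeList counts = doc.getElementsByTagName("count");
        for (int i = 0; i < counts.getLength(); i++) {
            if (i > 0) {
                all.append(',');
            }
            all.append(counts.item(i).getTextContent());
        }
        return all.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Test>"
                + "<case id=\"1\"><domain>Java</domain><count>39</count></case>"
                + "<case id=\"2\"><domain>C/C++</domain><count>45</count></case>"
                + "</Test>";
        System.out.println(updateCount(xml, "2", 50)); // prints 39,50
    }
}
```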
Use Cases
- Complex XML Documents: Useful for documents whose nodes must be accessed and modified frequently.
- In-Memory Operations: Ideal for applications that need to load the entire XML structure into memory and manipulate it.
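For in-memory manipulation, a common pattern is to edit the tree and then write it back out. The sketch below assumes the JDK's built-in javax.xml.transform API for serialization: it appends a new <case> element to a loaded document and uses an identity Transformer to turn the modified tree back into text. DomAddAndSave and addCase are hypothetical names, and the input is a string rather than example.xml so the example is self-contained.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import java.io.StringReader;
import java.io.StringWriter;

public class DomAddAndSave {
    // Loads the document, appends a new <case> under the root element,
    // and serializes the modified tree back to a string.
    public static String addCase(String xml, String id, String domain, int count) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Element newCase = doc.createElement("case");
        newCase.setAttribute("id", id);
        Element domainEl = doc.createElement("domain");
        domainEl.setTextContent(domain);
        Element countEl = doc.createElement("count");
        countEl.setTextContent(Integer.toString(count));
        newCase.appendChild(domainEl);
        newCase.appendChild(countEl);
        doc.getDocumentElement().appendChild(newCase);

        // An identity Transformer (no stylesheet) copies the DOM tree to a text stream.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Test><case id=\"1\"><domain>Java</domain><count>39</count></case></Test>";
        System.out.println(addCase(xml, "2", "C/C++", 45));
    }
}
```

To save the result to disk instead, the StreamResult can wrap a java.io.File rather than a StringWriter.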
Pros and Cons
- Pros: Easy to use, with robust navigation and modification capabilities.
- Cons: Memory-intensive and inefficient for large documents, because the whole tree must fit in memory.
Example
Java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomParserExample {
    public static void main(String[] args) {
        try {
            // Parse the whole file into an in-memory DOM tree
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new File("example.xml"));

            // Collect every <domain> element, wherever it appears in the tree
            NodeList nodeList = document.getElementsByTagName("domain");
            for (int i = 0; i < nodeList.getLength(); i++) {
                Node node = nodeList.item(i);
                System.out.println(node.getTextContent());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output:
Java
C/C++
2. SAX (Simple API for XML) Parser
Overview
The SAX parser is event-driven and provides serial (sequential) access to the document. Unlike the DOM parser, it does not load the entire document into memory; instead, it reads the document sequentially and reports events, such as the start and end of each element and the character data in between, to callback methods in a handler class that you write.
Features
- Event-Driven: Reports parsing events (element start and end, character data) to a handler; an element's attributes are delivered with its start-element event.
- Low Memory Usage: Processes the document as a stream, so the entire document never has to be held in memory.
- Fast Performance: Efficient on large documents because it reads them in a single sequential pass.
Use Cases
- Large XML Documents: Suitable for large documents where only some parts need to be processed.
- Streaming Requirements: Ideal for applications that work with XML data in a streaming fashion.
Pros and Cons
- Pros: Low memory footprint, fast processing.
- Cons: Harder to program, because the handler must track its own state across callbacks, and there is no random access to elements.
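Much of SAX's difficulty comes from the fact that the handler sees one event at a time and must remember context itself. The sketch below, using a hypothetical CaseHandler, shows that bookkeeping for a document shaped like example.xml: it reads the id attribute when a <case> starts, buffers character data (which a parser may deliver in several chunks), and records one "id: domain=count" entry when each <count> ends. The input is passed as a string for convenience; the SAX calls themselves are standard.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SaxCollectExample {
    // Hypothetical handler: records "id: domain=count" for each <case>.
    // SAX has no random access, so the handler keeps its own state
    // (current id, current domain, buffered text) between callbacks.
    static class CaseHandler extends DefaultHandler {
        final List<String> results = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentId;
        private String currentDomain;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start a fresh text buffer for this element
            if (qName.equals("case")) {
                // Attributes are delivered together with the start-element event
                currentId = attrs.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text node may arrive in several chunks, so accumulate them
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (qName.equals("domain")) {
                currentDomain = value;
            } else if (qName.equals("count")) {
                results.add(currentId + ": " + currentDomain + "=" + value);
            }
            text.setLength(0);
        }
    }

    public static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CaseHandler handler = new CaseHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Test>"
                + "<case id=\"1\"><domain>Java</domain><count>39</count></case>"
                + "<case id=\"2\"><domain>C/C++</domain><count>45</count></case>"
                + "</Test>";
        collect(xml).forEach(System.out::println);
    }
}
```

The StringBuilder buffer is the key design choice: printing inside characters() directly, as simple handlers often do, can split one text value across several lines.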
Example
Java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParserExample {
    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            // The parser pushes events to MyHandler as it reads the file
            saxParser.parse(new File("example.xml"), new MyHandler());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        System.out.println("Start Element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        System.out.println("End Element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip the whitespace-only text between tags
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            System.out.println("Characters: " + text);
        }
    }
}
Output:
Start Element: Test
Start Element: case
Start Element: domain
Characters: Java
End Element: domain
Start Element: count
Characters: 39
End Element: count
End Element: case
Start Element: case
Start Element: domain
Characters: C/C++
End Element: domain
Start Element: count
Characters: 45
End Element: count
End Element: case
End Element: Test
3. StAX (Streaming API for XML) Parser
Overview
StAX is a pull-parsing API for XML. Instead of the parser pushing events to callbacks, the application pulls the next event (such as the start or end of an element) from the parser when it is ready for it, which gives the application direct control over the parsing process.
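The pull model can be sketched as a plain loop in which the application asks for the next event and reacts only to the ones it cares about. Assuming input shaped like example.xml (passed as a string here), the hypothetical sumCounts method below totals the <count> values; getElementText(), a standard XMLStreamReader method, reads an element's text and leaves the cursor on its end tag.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class StaxPullSum {
    // Pulls events one at a time and reads only <count> elements;
    // every other event is skipped simply by asking for the next one.
    public static int sumCounts(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int total = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("count")) {
                    // getElementText() reads the text and advances to END_ELEMENT
                    total += Integer.parseInt(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Test>"
                + "<case id=\"1\"><domain>Java</domain><count>39</count></case>"
                + "<case id=\"2\"><domain>C/C++</domain><count>45</count></case>"
                + "</Test>";
        System.out.println(sumCounts(xml)); // prints 84
    }
}
```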
Features
- Pull-Based: The application drives parsing by requesting each event when it needs it.
- Low Memory Usage: Streams the document like SAX, so it needs far less memory than DOM.
- Bidirectional: Supports both reading (XMLStreamReader) and writing (XMLStreamWriter) XML; reading is forward-only.
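StAX also includes a writer API, XMLStreamWriter, created from XMLOutputFactory. The sketch below, with the hypothetical method writeCase, produces a single-case document in the same shape as example.xml by emitting start tags, attributes, text, and end tags in order, writing directly to the output rather than building a tree first.

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringWriter;

public class StaxWriteExample {
    // Writes <Test><case id=...><domain>...</domain><count>...</count></case></Test>
    // one event at a time using the StAX writer API.
    public static String writeCase(String id, String domain, int count) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("Test");
        w.writeStartElement("case");
        w.writeAttribute("id", id);
        w.writeStartElement("domain");
        w.writeCharacters(domain); // special characters are escaped by the writer
        w.writeEndElement(); // </domain>
        w.writeStartElement("count");
        w.writeCharacters(Integer.toString(count));
        w.writeEndElement(); // </count>
        w.writeEndElement(); // </case>
        w.writeEndElement(); // </Test>
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeCase("1", "Java", 39));
    }
}
```

Like the reader, the writer is forward-only: each call appends to the output, so elements must be closed in the reverse order in which they were opened.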
Use Cases
- Moderate to Large Documents: Suited to applications that need streaming efficiency combined with a simpler, loop-based programming model than SAX callbacks.
- Complex Processing Logic: Ideal when processing depends on context, because the application can pull events selectively and skip what it does not need.
Pros and Cons
- Pros: Low memory usage combined with fine-grained control over parsing; flexible.
- Cons: Like SAX, it offers only forward, sequential access, so the application must track its own state; it is also more verbose than DOM for simple lookups.
Example
Java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.InputStream;

public class StaxParserExample {
    public static void main(String[] args) {
        // A byte stream lets the parser honor the encoding declared in the file
        try (InputStream in = new FileInputStream("example.xml")) {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            // The application pulls each event when it is ready for it
            while (reader.hasNext()) {
                int event = reader.next();
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        System.out.println("Start Element: " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        System.out.println("End Element: " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS: {
                        // Skip the whitespace-only text between tags
                        String text = reader.getText().trim();
                        if (!text.isEmpty()) {
                            System.out.println("Characters: " + text);
                        }
                        break;
                    }
                }
            }
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output:
Start Element: Test
Start Element: case
Start Element: domain
Characters: Java
End Element: domain
Start Element: count
Characters: 39
End Element: count
End Element: case
Start Element: case
Start Element: domain
Characters: C/C++
End Element: domain
Start Element: count
Characters: 45
End Element: count
End Element: case
End Element: Test
4. JAXB – Java Architecture for XML Binding
Overview
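JAXB lets Java developers map Java objects to XML (marshalling) and convert XML back into Java objects (unmarshalling), which removes much of the hand-written parsing code such conversions would otherwise require. Note that JAXB was deprecated in Java 9 and removed from the JDK in Java 11, so on current JDKs the JAXB API and an implementation must be added as project dependencies; in Jakarta EE 9 and later, the package is jakarta.xml.bind rather than javax.xml.bind.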
Features
- Object-XML Mapping: Converts Java objects to XML and back.
- Annotations: Annotations such as @XmlRootElement and @XmlElement map Java classes and their properties to XML elements.
- Binding: Handles the binding between Java objects and XML automatically.
Use Cases
- Data Binding: Ideal for applications that frequently convert between Java objects and XML.
- Configuration Files: Useful for applications that load XML configuration into typed Java objects.
Pros and Cons
- Pros: Simplifies object-XML conversion and keeps boilerplate code to a minimum.
- Cons: Less control over XML parsing compared to other methods.
Example
Java
// On Java 11+, the JAXB API and a JAXB implementation must be on the classpath
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
import java.io.StringReader;
import java.io.StringWriter;

public class JaxbExample {
    public static void main(String[] args) {
        try {
            JAXBContext context = JAXBContext.newInstance(Person.class);
            // Marshalling - Convert Java object to XML
            Person person = new Person("John", 30);
            StringWriter writer = new StringWriter();
            Marshaller marshaller = context.createMarshaller();
            // Indent the output instead of writing it on a single line
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            marshaller.marshal(person, writer);
            String xml = writer.toString().trim();
            System.out.println("XML Output:");
            System.out.println(xml);
            // Unmarshalling - Convert XML to Java object
            Unmarshaller unmarshaller = context.createUnmarshaller();
            Person unmarshalledPerson = (Person) unmarshaller.unmarshal(new StringReader(xml));
            System.out.println("Java Object:");
            System.out.println(unmarshalledPerson);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // @XmlRootElement is required to marshal Person as a top-level element;
    // propOrder fixes the order of the child elements
    @XmlRootElement
    @XmlType(propOrder = {"name", "age"})
    public static class Person {
        private String name;
        private int age;

        // A no-argument constructor is required by JAXB
        public Person() {}

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // JAXB maps the public getter/setter pairs to XML elements
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }

        @Override
        public String toString() {
            return "Person{name='" + name + "', age=" + age + '}';
        }
    }
}
Output:
XML Output:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<person>
    <name>John</name>
    <age>30</age>
</person>
Java Object:
Person{name='John', age=30}
Conclusion
Java offers a rich set of tools for parsing and processing XML. The DOM parser suits work on a complete document held in memory; the SAX parser fits low-memory, high-performance environments; and the StAX parser strikes a balance between the two, streaming like SAX while keeping you in control of the parsing process. JAXB simplifies object-XML mapping, which makes it a good fit for applications that bind data frequently. Choosing among them depends on your application's needs, including document size, available memory, and the complexity of the XML processing involved. Understanding these options will prepare you to handle XML well in your Java applications.