WEB TECHNOLOGIES
PRASAD B
Assoc. Prof.
Dept. of Computer and Engineering
Email: [email protected]
UNIT - 2
Objective
• To introduce XML and processing of XML Data with
Java
Scope
• XML plays an important role in many different IT systems.
• XML is often used for distributing data over the Internet.
• It is important (for all types of software developers!) to have a
good understanding of XML.
Course Outcomes
• Explore the concepts of XML and how to Parse XML
files using Java DOM and SAX parsers.
XML
( Extensible Markup Language)
What You Should Already Know
05/02/23 PRASAD B, Assoc. prof., 6
MLRITM,JNTU-H
XML - Introduction
What is XML?
XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to store and transport data
XML was designed to be self-descriptive
XML is a W3C Recommendation
What is an XML File?
XML Does Not DO Anything:
It may be a little hard to understand, but XML does not do anything by itself;
it only structures, stores and describes data.
XML is a text-based markup language derived from Standard Generalized
Markup Language (SGML).
XML was designed to store and transport data.
XML was designed to be both human- and machine-readable.
The Difference Between XML and HTML
XML and HTML were designed with different goals:
XML was designed to carry data - with focus on what data is
HTML was designed to display data - with focus on how data
looks
XML tags are not predefined like HTML tags are
What Can XML Do?
XML Simplifies Things:
It simplifies data sharing
It simplifies data transport
It simplifies platform changes
It simplifies data availability
XML is used in many aspects of web development.
XML is often used to separate data from presentation.
XML is Often a Complement to HTML
Why XML?
There are three important characteristics of XML that make it useful in
a variety of systems and solutions:
XML is extensible: XML allows you to create your own self-
descriptive tags, or language, that suits your application.
XML carries the data, does not present it: XML allows you to store
the data irrespective of how it will be presented.
XML is a public standard: XML was developed by an organization
called the World Wide Web Consortium (W3C) and is available as an
open standard.
XML Usage
XML can work behind the scenes to simplify the creation of HTML documents for
large web sites.
XML can be used to exchange information between organizations and systems.
XML can be used for offloading and reloading of databases.
XML can be used to store and arrange data in a way that suits your data
handling needs.
XML can easily be merged with style sheets to create almost any desired output.
Virtually any type of data can be expressed as an XML document.
XML - Documents
An XML document is a basic unit of XML information
composed of elements and other markup in an orderly
package.
An XML document can contain a wide variety of data.
XML – Document Sections
Document Prolog Section
Document Elements Section
XML Document example
Document Prolog Section:
<?xml version="1.0"?>
Document Elements Section:
<contact-info>
<name> Tanmay Patil </name>
<company> TutorialsPoint </company>
<phone> (011) 123-4567 </phone>
</contact-info>
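To see the two sections from a program's point of view, the following is a minimal Java sketch that parses the contact-info document from a string and reads its root element and one child. The class name ContactInfoDemo and the helper rootOf are illustrative names, not part of the course material.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ContactInfoDemo {
    // The example document, held in a string for convenience (prolog + elements).
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<contact-info>"
        + "<name>Tanmay Patil</name>"
        + "<company>TutorialsPoint</company>"
        + "<phone>(011) 123-4567</phone>"
        + "</contact-info>";

    // Parses the text and returns the root element; checked exceptions are wrapped.
    static Element rootOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Element root = rootOf(XML);
        System.out.println("root element: " + root.getTagName());
        System.out.println("company: "
            + root.getElementsByTagName("company").item(0).getTextContent());
    }
}
```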
Document Prolog Section:
The document prolog comes at the top of the document, before
the root element. This section contains:
XML declaration
Document type declaration
Document Elements Section
Document Elements are the building blocks of XML.
These divide the document into a hierarchy of sections, each
serving a specific purpose.
We can separate a document into multiple sections so that they
can be rendered differently, or used by a search engine.
The elements can be containers, with a combination of text and
other elements.
XML- Syntax
XML - Declaration
The XML document can optionally have an XML declaration.
It is written as below:
<?xml version="1.0" encoding="UTF-8"?>
Where version is the XML version and encoding specifies the
character encoding used in the document.
Syntax Rules for XML Declaration
Must begin with "<?xml", where "xml" is written in lower-case, and end with "?>".
The XML declaration has no closing tag, i.e. there is no </?xml>.
If document contains XML declaration, then it strictly needs to
be the first statement of the XML document.
If the XML declaration is included, it must contain version
number attribute.
Syntax Rules for XML Declaration- CONT….
The Parameter names and values are case-sensitive.
The names are always in lower case.
The order of placing the parameters is important. The correct
order is: version, encoding and standalone.
Either single or double quotes may be used.
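The declaration rules can be observed from Java. The sketch below is a minimal illustration, assuming the JDK's built-in DOM parser: it reads the declared version, encoding and standalone values back from the parsed Document, and shows that a declaration with the parameters in the wrong order is rejected. DeclarationDemo, parse and accepts are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class DeclarationDemo {
    // Parses the XML text as UTF-8 bytes; any parse failure is wrapped.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + e.getMessage(), e);
        }
    }

    // True if the parser accepts the text as well-formed XML.
    static boolean accepts(String xml) {
        try {
            parse(xml);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><note/>");
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
        // encoding placed before version violates the required order.
        System.out.println("wrong order accepted? "
            + accepts("<?xml encoding=\"UTF-8\" version=\"1.0\"?><note/>"));
    }
}
```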
Parameter | Parameter_value | Parameter_description
version | 1.0 | Specifies the version of the XML standard used.
encoding | UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP | It defines the character encoding used in the document. UTF-8 is the default encoding used.
standalone | yes or no | It informs the parser whether the document relies on information from an external source, such as an external document type definition (DTD), for its content. The default value is no. Setting it to yes tells the processor there are no external declarations required for parsing the document.
XML Declaration Examples
XML declaration with version definition only (version is mandatory, so a
declaration with no parameters, such as <?xml ?>, is not allowed):
<?xml version="1.0"?>
XML declaration with all parameters defined:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
XML declaration with all parameters defined in single quotes:
<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
XML Tags and Elements
An XML file is structured by several XML elements, also called
XML nodes or XML tags.
The names of XML elements are enclosed in angle brackets < >
as shown below:
<element>
XML Tags
XML tags form the foundation of XML.
They define the scope of an element in the XML.
They can also be used to insert comments, declare settings
required for parsing the environment and to insert special
instructions.
XML - Tag Types:
Start Tag
End Tag
Empty Tag
Empty Tag:
The text that appears between start-tag and end-tag is called content.
An element which has no content is termed as empty.
An empty element can be represented in two ways as below:
A start-tag immediately followed by an end-tag as shown below:
<hr></hr>
A complete empty-element tag is as shown below:
<hr />
Syntax Rules for Tags and Elements
Element Syntax
Nesting of elements
Root element
Case sensitivity
Element Syntax:
Each XML element needs to be closed, either with a start tag and a matching
end tag as shown below:
<element>....</element>
or in simple-cases, just this way:
<element/>
Nesting of elements:
An XML element can contain multiple XML elements as its children, but the
child elements must not overlap, i.e. an end tag of an element must
have the same name as that of the most recent unmatched start tag.
The following example shows incorrectly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>
The following example shows correctly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>
Root element:
XML documents must contain one root element that is
the parent of all other elements:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
Case sensitivity:
The names of XML-elements are case-sensitive.
That means the names in the start and the end tags need to
be exactly in the same case.
For example
<contact-info> is different from <Contact-Info>.
Rules for Tags and Elements Example:
<?xml version="1.0"?>
<contact-info>
<company> TutorialsPoint </company>
</contact-info>
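The rules for nesting, a single root element and case sensitivity are exactly what a parser checks for well-formedness. The following is a minimal Java sketch, assuming the JDK's built-in DOM parser; WellFormedCheck and isWellFormed are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true only if the text parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String correct = "<contact-info><company>TutorialsPoint</company></contact-info>";
        String overlapping = "<contact-info><company>TutorialsPoint</contact-info></company>";
        String wrongCase = "<contact-info></Contact-Info>";
        String twoRoots = "<a/><b/>";
        System.out.println("correct nesting: " + isWellFormed(correct));
        System.out.println("overlapping tags: " + isWellFormed(overlapping));
        System.out.println("case mismatch: " + isWellFormed(wrongCase));
        System.out.println("two root elements: " + isWellFormed(twoRoots));
    }
}
```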
XML - Attributes
An attribute specifies a single property for the element, using a
name/value pair.
An XML-element can have one or more attributes.
For example:
<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name and
http://www.tutorialspoint.com/ is the attribute value.
Syntax Rules for XML Attributes
Attribute names in XML (unlike HTML) are case sensitive.
The same attribute cannot appear twice on one element.
The following example shows incorrect syntax, because the attribute b is
specified twice:
<a b="x" c="y" b="z">....</a>
Attribute names are written without quotation marks, whereas attribute values
must always appear in quotation marks.
The following example demonstrates incorrect XML syntax, because the attribute
value is not enclosed in quotation marks:
<a b=x>....</a>
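A short Java sketch can confirm these attribute rules. It assumes the JDK's built-in DOM parser; AttributeDemo, root and accepts are illustrative names. Note that Element.getAttribute returns an empty string when the named attribute is absent, which is how the case-sensitive lookup shows up here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDemo {
    // Parses the text and returns its root element; failures are wrapped.
    static Element root(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no stderr noise on errors
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + e.getMessage(), e);
        }
    }

    // True if the parser accepts the text as well-formed XML.
    static boolean accepts(String xml) {
        try {
            root(xml);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Element a = root("<a href=\"http://www.tutorialspoint.com/\">Tutorialspoint!</a>");
        System.out.println("href = " + a.getAttribute("href"));
        // Attribute names are case sensitive: HREF is a different, absent attribute.
        System.out.println("HREF present? " + a.hasAttribute("HREF"));
        System.out.println("duplicate attribute accepted? "
            + accepts("<a b=\"x\" c=\"y\" b=\"z\">....</a>"));
        System.out.println("unquoted value accepted? " + accepts("<a b=x>....</a>"));
    }
}
```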
XML - Attribute Types
String Type
Tokenized Type
Enumerated Type
String Type
It takes any literal string as a value.
CDATA is a String Type.
CDATA is character data.
This means, any string of non-markup characters is a legal part
of the attribute.
Tokenized Type
This is a more constrained type.
The validity constraints noted in the grammar are applied after the
attribute value is normalized.
Tokenized Type
ID : It is used to specify the element as unique.
IDREF : It is used to reference an ID that has been named for another element.
IDREFS : It is used to reference a whitespace-separated list of IDs of other elements.
ENTITY : It indicates that the attribute will represent an external entity in the
document.
ENTITIES : It indicates that the attribute will represent external entities in the
document.
NMTOKEN : It is similar to CDATA with restrictions on what data can be part of the
attribute.
NMTOKENS : It is a whitespace-separated list of NMTOKEN values, with the same
restrictions on each value.
Enumerated Type
This has a list of predefined values in its declaration, out of which
the attribute must be assigned one value.
There are two types of enumerated attribute:
NotationType : It declares that an element will be referenced to a
NOTATION declared somewhere else in the XML document.
Enumeration : Enumeration allows you to define a specific list of
values that the attribute value must match.
XML - References
References usually allow you to add or include additional text or
markup in an XML document.
References always begin with the symbol "&" ,which is a
reserved character and end with the symbol ";"
XML References Types
Entity References
Character References
Entity References:
An entity reference contains a name between the start and the
end delimiters.
For example, &amp;
where amp is the name. The name refers to a predefined string of text
and/or markup.
Character References:
These contain references such as &#65;, which
contain a hash mark ("#") followed by a number.
The number always refers to the Unicode code point of a character.
In this case, 65 refers to the letter "A".
XML - Text
The names of XML-elements and XML-attributes are case-
sensitive, which means the name of start and end elements need
to be written in the same case.
To avoid character encoding problems, all XML files should be
saved as Unicode UTF-8 or UTF-16 files.
XML Text – cont…..
Whitespace characters like blanks, tabs and line-breaks between
XML-elements and between the XML-attributes will be ignored.
Some characters are reserved by the XML syntax itself.
Hence, they cannot be used directly.
To use them, some replacement-entities are used.
Replacement-entities
Not allowed character | Replacement entity | Character description
< | &lt; | less than
> | &gt; | greater than
& | &amp; | ampersand
' | &apos; | apostrophe
" | &quot; | quotation mark
Predefined character entities
Entity name | Character | Decimal reference | Hexadecimal reference
quot | " | &#34; | &#x22;
amp | & | &#38; | &#x26;
apos | ' | &#39; | &#x27;
lt | < | &#60; | &#x3C;
gt | > | &#62; | &#x3E;
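A parser replaces entity and character references with the characters they stand for before the application sees the text. The following minimal Java sketch, assuming the JDK's built-in DOM parser, shows this; ReferenceDemo and textOf are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReferenceDemo {
    // Returns the text content of the root element, with all references resolved.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no stderr noise on errors
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // &amp; and &lt;/&gt; are entity references; &#65; and &#x41; are
        // decimal and hexadecimal character references for the letter A.
        System.out.println(textOf("<t>AT&amp;T &#65;&#x41; &lt;b&gt;</t>"));
    }
}
```

Writing the ampersand directly, as in <t>AT&T</t>, makes the document not well-formed, so textOf would throw for that input.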
XML - Comments
XML comments are similar to HTML comments.
The comments are added as notes or lines for understanding
the purpose of an XML code.
Comments can be used to include related links, information
and terms.
They are visible only in the source of the document; a parser does not treat
them as part of the data.
Comments may appear almost anywhere in XML code.
Syntax:
<!-- Your comment -->
A comment starts with <!-- and ends with -->.
We can add textual notes as comments between these delimiters; the text itself must not contain "--".
We must not nest one comment inside another.
XML Comments Rules
Comments cannot appear before XML declaration.
Comments may appear anywhere else in a document, outside other markup.
Comments must not appear within attribute values.
Comments cannot be nested inside the other comments.
XML - DTDs
Document Type Declaration
The XML Document Type Definition, commonly known as DTD,
is a way to describe an XML language precisely. It is attached to a
document through the document type declaration (DOCTYPE).
DTDs check the vocabulary and the validity of the structure of XML
documents against the grammatical rules of the appropriate XML
language.
An XML DTD can be either specified inside the document, or it
can be kept in a separate document and then linked separately.
Syntax
<!DOCTYPE element DTD identifier
[
declaration1
declaration2 ........
]>
In the above syntax,
The DTD starts with <!DOCTYPE delimiter.
An element tells the parser to parse the document from the
specified root element.
DTD identifier is an identifier for the document type definition,
which may be the path to a file on the system or URL to a file on the
internet. If the DTD is pointing to external path, it is called External
Subset.
The square brackets [ ] enclose an optional list of entity
declarations called Internal Subset.
XML DTD -Types
Internal DTD
External DTD
• System identifiers
• Public identifiers
Internal DTD
A DTD is referred to as an internal DTD if elements are
declared within the XML files.
When only an internal DTD is used, the standalone attribute in the XML
declaration can be set to yes.
This means the declaration works independently of any external
source.
Internal DTD - Syntax
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and
element-declarations is where we declare the elements.
Following is a simple example of internal DTD:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Start Declaration: the XML declaration above must be the first line -->
<!-- DTD -->
<!DOCTYPE address [
<!-- DTD Body -->
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!-- End Declaration -->
]>
<!-- XML document -->
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Start Declaration - Begin the document with the XML declaration.
DTD - Immediately after the XML header, the document type
declaration follows, commonly referred to as the DOCTYPE.
The DOCTYPE declaration has an exclamation mark (!) at the start of
the element name.
The DOCTYPE informs the parser that a DTD is associated with this
XML document.
DTD Body- The DOCTYPE declaration is followed by body of the
DTD, where you declare elements, attributes, entities, and notations
End Declaration - Finally, the declaration section of the DTD is
closed using a closing bracket and a closing angle bracket (]>).
This effectively ends the definition, and thereafter, the XML
document follows immediately.
Internal DTD - Rules
The document type declaration must appear at the start of the
document (preceded only by the XML header) — it is not
permitted anywhere else within the document.
Similar to the DOCTYPE declaration, the element declarations
must start with an exclamation mark.
The Name in the document type declaration must match the
element type of the root element.
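The internal DTD rules can be checked from Java by switching on DTD validation. The sketch below is a minimal illustration, assuming the JDK's built-in DOM parser; InternalDtdDemo, PROLOG and validate are illustrative names. It collects validity problems in a list instead of stopping at the first one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class InternalDtdDemo {
    // XML declaration plus the internal DTD for the address document.
    static final String PROLOG =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<!DOCTYPE address ["
        + "<!ELEMENT address (name,company,phone)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT company (#PCDATA)>"
        + "<!ELEMENT phone (#PCDATA)>"
        + "]>";

    // Parses with DTD validation on and returns the problems found (empty = valid).
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity errors arrive here
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add(e.getMessage()); // well-formedness (fatal) errors arrive here
        }
        return problems;
    }

    public static void main(String[] args) {
        String valid = PROLOG + "<address><name>Tanmay Patil</name>"
            + "<company>TutorialsPoint</company><phone>(011) 123-4567</phone></address>";
        String missingPhone = PROLOG + "<address><name>Tanmay Patil</name>"
            + "<company>TutorialsPoint</company></address>";
        System.out.println("valid document problems: " + validate(valid).size());
        System.out.println("missing phone problems: " + validate(missingPhone));
    }
}
```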
External DTD
In an external DTD, elements are declared outside the XML file.
They are accessed by specifying the system identifier, which
may be either a legal .dtd file or a valid URL.
When the document relies on an external DTD, the standalone attribute in the
XML declaration should be set to no.
This means the declaration includes information from the external
source.
External DTD - Syntax
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Following is a simple example of external DTD:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd is as shown:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
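When the parser meets the SYSTEM identifier it has to fetch address.dtd. The following Java sketch is one possible way to try this without files on disk: an EntityResolver hands the DTD text to the parser from a string. This is an illustrative shortcut, not the only approach; ExternalDtdDemo, DTD, XML and validate are assumed names, and in a real project the parser would normally read address.dtd from the location given in the DOCTYPE.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ExternalDtdDemo {
    // Content that would normally live in the separate file address.dtd.
    static final String DTD =
        "<!ELEMENT address (name,company,phone)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT company (#PCDATA)>"
        + "<!ELEMENT phone (#PCDATA)>";

    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>"
        + "<!DOCTYPE address SYSTEM \"address.dtd\">"
        + "<address><name>Tanmay Patil</name><company>TutorialsPoint</company>"
        + "<phone>(011) 123-4567</phone></address>";

    // Validates against the external DTD; returns the problems found (empty = valid).
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Supply the DTD from memory whenever the parser asks for address.dtd.
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("address.dtd")
                    ? new InputSource(new StringReader(DTD))
                    : null);
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add(e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("problems in example document: " + validate(XML));
    }
}
```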
External DTD - Types
We can refer to an external DTD by using either
System identifiers or
Public identifiers.
System Identifiers
A system identifier enables you to specify the location of an external
file containing DTD declarations.
Syntax is as follows:
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains keyword SYSTEM and a URI
reference pointing to the location of the document.
Public Identifiers
Public identifiers provide a mechanism to locate DTD resources
and are written as below:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As we can see, it begins with keyword PUBLIC, followed by a
specialized identifier. XML also requires a system identifier after it
(address.dtd is used here as an assumed fallback location).
Public identifiers are used to identify an entry in a catalog.
Public identifiers can follow any format; however, a commonly
used format is called Formal Public Identifiers, or FPIs.
XML - Schemas
XML Schema is commonly known as XML Schema Definition (XSD).
It is used to describe and validate the structure and the content of
XML data.
XML schema defines the elements, attributes and data types.
Schema element supports Namespaces.
It is similar to a database schema that describes the data in a
database.
XML Schemas are More Powerful than DTD
XML Schemas are written in XML
XML Schemas are extensible to additions
XML Schemas support data types
XML Schemas support namespaces
Why Use an XML Schema?
With XML Schema, XML files can carry a description of their
own format.
With XML Schema, independent groups of people can agree
on a standard for interchanging data.
With XML Schema, we can verify data.
XML Schemas Support Data Types
One of the greatest strengths of XML Schemas is the support for
data types:
It is easier to describe document content
It is easier to define restrictions on data
It is easier to validate the correctness of data
It is easier to convert data between different data types
XML Schemas use XML Syntax
Another great strength of XML Schemas is that they are written
in XML:
You don't have to learn a new language
You can use your XML editor to edit your Schema files
You can use your XML parser to parse your Schema files
You can manipulate your Schemas with the XML DOM
You can transform your Schemas with XSLT
XML Schemas are extensible, because they are written in XML.
With an extensible Schema definition you can:
Reuse your Schema in other Schemas
Create your own data types derived from the standard types
Reference multiple schemas in the same document
XML Schemas Secure Data Communication
When sending data from a sender to a receiver, it is
essential that both parties have the same "expectations"
about the content.
With XML Schemas, the sender can describe the data in a
way that the receiver will understand.
XML data type "date" requires the format "YYYY-MM-DD".
Well-Formed is Not Enough
A well-formed XML document is a document that conforms to the XML syntax rules, like:
if an XML declaration is present, it must come first
it must have one unique root element
start-tags must have matching end-tags
elements are case sensitive
all elements must be closed
all elements must be properly nested
all attribute values must be quoted
entities must be used for special characters
Even if documents are well-formed they can still contain errors, and those
errors can have serious consequences.
XML – Schemas - Syntax
<?xml version="1.0"?>
<xs:schema>
...
...
</xs:schema>
The <schema> element may contain some attributes. They declare:
the namespace of the elements and data types used in the schema
the namespace of the elements defined by this schema (note, to, from, heading, body)
the default namespace
whether elements used by the XML instance document must be namespace qualified
XML – Schemas - Syntax
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
...
...
</xs:schema>
xmlns:xs="http://www.w3.org/2001/XMLSchema"
indicates that the elements and data types used in the schema
come from the "http://www.w3.org/2001/XMLSchema" namespace.
It also specifies that the elements and data types that come from
the "http://www.w3.org/2001/XMLSchema" namespace should be
prefixed with xs:
targetNamespace="http://www.w3schools.com"
indicates that the elements defined by this schema (note, to, from,
heading, body) come from the "http://www.w3schools.com" namespace.
xmlns="http://www.w3schools.com"
indicates that the default namespace is
"http://www.w3schools.com".
elementFormDefault="qualified"
indicates that any elements used by the XML instance document which were
declared in this schema must be namespace qualified.
The following simple example shows how to use a schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format
that an XML document can take.
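A schema becomes useful once a document is checked against it. The following is a minimal Java sketch using the standard javax.xml.validation API with the contact schema above held in a string; XsdValidationDemo and isValid are illustrative names. Because phone is declared as xs:int, a value such as (011) 123-4567 is rejected while a plain number is accepted.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class XsdValidationDemo {
    // The contact schema from the slide, held in a string for convenience.
    static final String XSD =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"contact\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\" />"
        + "<xs:element name=\"company\" type=\"xs:string\" />"
        + "<xs:element name=\"phone\" type=\"xs:int\" />"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // True if the document is valid against the contact schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<contact><name>Tanmay Patil</name>"
            + "<company>TutorialsPoint</company><phone>1234567</phone></contact>";
        String badPhone = "<contact><name>Tanmay Patil</name>"
            + "<company>TutorialsPoint</company><phone>(011) 123-4567</phone></contact>";
        System.out.println("numeric phone valid? " + isValid(good));
        System.out.println("formatted phone valid? " + isValid(badPhone));
    }
}
```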
XML – Schemas : Elements
As we saw earlier, elements are the building blocks of an
XML document.
An element can be defined within an XSD as follows:
<xs:element name="x" type="y"/>
Schemas Elements - Definition Types
We can define XML schema elements in the following ways:
Simple Type
Complex Type
Global Types
Simple Type
A simple type element contains only text; it cannot contain child elements
or attributes.
Some of the predefined simple types are: xs:integer, xs:boolean,
xs:string, xs:date.
For example:
<xs:element name="phone_number" type="xs:int" />
Complex Type
A complex type is a container for other element definitions.
This allows you to specify which child elements an element can
contain and to provide some structure within your XML
documents.
In the example, the Address element consists of child elements. It is a
container for other <xs:element> definitions, which allows you to build a simple
hierarchy of elements in the XML document.
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Global Types
With a global type, we can define a single named type in the document,
which can be used by all other references.
For example, suppose you want to generalize the person
and company for different addresses of the company. In such a
case, you can define a general type as below:
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>
Instead of having to define the name and the company twice (once for Address1 and once
for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you
decide to add "Postcode" elements to the address, you need to add them at just one place.
<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone2" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Attributes
Attributes in XSD provide extra information within an element.
Attributes have name and type properties as shown below:
<xs:attribute name="x" type="y"/>
Restrictions on elements
Refer to the link below:
http://www.w3schools.com/xml/schema_facets.asp
XML Parsers
What is an XML Parser?
It is a software library (or a package) that provides methods (or
interfaces) for client applications to work with XML documents
It checks well-formedness
It may validate the documents
It does a lot of other detailed things so that a client is shielded
from those complexities
Types of Parsers
DOM: Document Object Model
SAX: Simple API for XML
A DOM parser implements the DOM API
A SAX parser implements the SAX API
DOM Parser - Parses the document by loading the complete
contents of the document and creating its complete hierarchical tree
in memory.
SAX Parser - Parses the document using event-based triggers. It does
not load the complete document into memory.
DOM Parser
A DOM document is an object containing all the information of
an XML document
It is composed of a tree (DOM tree) of nodes , and various nodes
that are somehow associated with other nodes in the tree but
are not themselves part of the DOM tree
Main features of DOM parsers
A DOM parser creates an internal structure in memory which is a DOM
document object
Client applications get the information of the original XML document
by invoking methods on this Document object or on other objects it
contains
DOM parser is tree-based (or DOM obj-based)
Client application seems to be pulling the data actively, from the data
flow point of view
Advantage:
It is good when random access to widely separated
parts of a document is required
It supports both read and write operations
Disadvantage:
It is memory inefficient
It seems complicated at first, although it is not really
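The random access and read/write points can be illustrated with a short Java sketch. It assumes the JDK's built-in DOM parser; DomReadWriteDemo, update and the added branch element are hypothetical names chosen for illustration. The whole tree is in memory, so the program can jump straight to any element, change it, and add new nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomReadWriteDemo {
    // Builds the in-memory DOM tree for the given XML text.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Reads the tree, rewrites the <name> text and appends a new child element.
    static Document update(String xml, String newName) {
        Document doc = parse(xml);
        Node name = doc.getElementsByTagName("name").item(0); // random access
        name.setTextContent(newName);                          // write operation
        Element branch = doc.createElement("branch");          // hypothetical element
        branch.setTextContent("CSE");
        doc.getDocumentElement().appendChild(branch);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = update("<student><name>SHARAN</name></student>", "KIRAN");
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println(child.getNodeName() + " = " + child.getTextContent());
            }
        }
    }
}
```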
Java DOM Parser - Steps
Following are the steps used while parsing a document using DOM
Parser.
1. Import XML-related packages.
2. Read name of XML document using command prompt.
3. Invoke the parser
4. Call the method
For the complete program, check the file named "DOM and SAX Program".
File name : Parsing_DOMDemo.java:
// 1. Import XML-related packages.
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class Parsing_DOMDemo
{
    static public void main(String[] arg)
    {
        try
        {
            // 2. Read name of XML document using command prompt.
            System.out.print("Enter the name of XML document ");
            BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
            String file_name = input.readLine();
            File fp = new File(file_name);
            if (fp.exists())
            {
                try
                {
                    // 3. Invoke the parser.
                    DocumentBuilderFactory Factory_obj = DocumentBuilderFactory.newInstance();
                    DocumentBuilder builder = Factory_obj.newDocumentBuilder();
                    // Pass the file as a URI so that absolute paths also work.
                    InputSource ip_src = new InputSource(fp.toURI().toString());
                    // 4. Call the method; parse() throws if the file is not well-formed.
                    Document doc = builder.parse(ip_src);
                    System.out.println(file_name + " is well-formed!");
                }
                catch (Exception e)
                {
                    System.out.println(file_name + " isn't well-formed!");
                    System.exit(1);
                }
            }
            else
            {
                System.out.print("File not found!");
            }
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
    }
}
File name : dom.xml (not well-formed: the </student> end tag is missing)
<?xml version="1.0"?>
<student> <name>SHARAN</name>
After Validating:
File name: dom1.xml
<?xml version="1.0"?>
<student>
<name>SHARAN</name>
</student>
SAX Parser
It does not first create any internal structure
The client does not specify which methods to call
The client just overrides the methods of the API and places its
own code inside them
When the parser encounters a start-tag, end-tag, etc., it treats
them as events
When such an event occurs, the parser automatically calls
back a particular method overridden by the client, and
passes what it has seen as arguments to that method
A SAX parser is event-based; it works like an event handler in
Java (e.g. MouseAdapter)
From the data flow point of view, the client application seems to be
passively receiving the data
Advantage:
It is simple
It is memory efficient
It works well in stream application
Disadvantage:
The data is broken into pieces and clients never
have all the information as a whole unless they
create their own data structure
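The callback style described above can be sketched in Java as follows. The handler overrides three DefaultHandler methods and records each event in a list, which is the kind of "own data structure" a client has to build with SAX. SaxEventsDemo and events are illustrative names; the sketch assumes the JDK's built-in SAX parser obtained through SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsDemo {
    // Parses the text and returns the sequence of events the parser reported.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName);   // called for every start-tag
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);     // called for every end-tag
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text); // called for character data
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<student><name>SHARAN</name></student>")) {
            System.out.println(event);
        }
    }
}
```

The parser never hands over the document as a whole; the client only sees one event at a time, in document order.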
Java SAX Parser - Steps
Following are the steps used while parsing a document using SAX
Parser.
1. Import XML-related packages.
2. Read name of XML document using command prompt.
3. Invoke the XML reader parser
4. Call the method-parser
File name : Parsing_SAXDemo.java:
// 1. Import XML-related packages.
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class Parsing_SAXDemo
{
    public static void main(String[] args) throws IOException
    {
        try
        {
            // 2. Read name of XML document using command prompt.
            System.out.print("Enter the name of XML document ");
            BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
            String file_name = input.readLine();
            File fp = new File(file_name);
            if (fp.exists())
            {
                try
                {
                    // 3. Invoke the XML reader parser.
                    // XMLReaderFactory is deprecated since Java 9 but still available.
                    XMLReader reader = XMLReaderFactory.createXMLReader();
                    // 4. Call the method; parse() throws if the file is not well-formed.
                    // The file is passed as a URI so that absolute paths also work.
                    reader.parse(fp.toURI().toString());
                    System.out.println(file_name + " is well-formed.");
                }
                catch (Exception e)
                {
                    System.out.println(file_name + " is not well-formed.");
                    System.exit(1);
                }
            }
            else
            {
                System.out.println("File is not present: " + file_name);
            }
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
    }
}
File name : sax.xml (not well-formed: the </student> end tag is missing)
<?xml version="1.0"?>
<student> <name>SHARAN</name>
After Validating:
File name: sax1.xml
<?xml version="1.0"?>
<student>
<name>SHARAN</name>
</student>