Extensible Markup Language (XML)
Extensible Markup Language (XML) is a markup language, like HTML.
Both XML and HTML are based on Standard Generalized Markup
Language (SGML).
SGML was developed in the 1970s. SGML focuses on content structure. This
language is good for creating catalogues, manuals, etc., but it is a very
complex language and its specification is not freely available.
XML grew out of an effort to re-engineer SGML for the Web, generally to
make it simpler and easier to parse.
XML is called a metalanguage because it can be used to define other markup
languages. It was designed to describe data, and its tags are not
predefined.
XML can use a DTD (Document Type Definition) to formally describe the
data.
It acts as an infrastructure because it is the core building block for a wide
XML is a simple text-based format for representing structured information:
documents, data, configuration, books, transactions, invoices, and much more.
XML is one of the most widely-used formats for sharing structured information
today: between programs, between people, between computers and people,
both locally and across networks.
Examples
XML is very widely used today. It is the basis of many standards, such as the
Universal Business Language (UBL); Universal Plug and Play (UPnP), used for
home electronics; word processing formats such as ODF and OOXML; and graphics
formats such as SVG. It is supported directly by computer programming
languages and databases, from giant servers all the way down to mobile
telephones.
If you double-click an icon on your computer desktop, chances are that an XML
message is sent from one component of the desktop to another.
If you take your car to be repaired, the engine's computer sends XML to the
mechanic's diagnostic systems. It is the age of XML: it is everywhere.
XML Separates Data from Presentation
XML does not carry any information about how to be displayed.
The same XML data can be used in many different presentation scenarios.
Because of this, with XML, there is a full separation between data and
presentation.
XML is Often a Complement to HTML
In many HTML applications, XML is used to store or transport data, while
HTML is used to format and display the same data.
XML Separates Data from HTML
When displaying data in HTML, you should not have to edit the HTML file
when the data changes.
With XML, the data can be stored in separate XML files.
With a few lines of JavaScript code, you can read an XML file and update
the data content of any HTML page.
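The same idea can be sketched in Python rather than JavaScript. The sketch below assumes a small, hypothetical XML fragment (the catalog and book element names and the second book are invented for illustration) and renders the same data in two different ways without changing the data itself.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data; in practice this would live in a separate XML file.
XML_DATA = """<catalog>
  <book><title>Harry Potter</title><author>J.K. Rowling</author></book>
  <book><title>Emma</title><author>Jane Austen</author></book>
</catalog>"""

def load_books(xml_text):
    """Return a list of (title, author) tuples read from the XML text."""
    root = ET.fromstring(xml_text)
    return [(b.findtext("title"), b.findtext("author")) for b in root.findall("book")]

def as_html(books):
    """One presentation of the data: an HTML unordered list."""
    items = "".join(f"<li>{escape(t)} by {escape(a)}</li>" for t, a in books)
    return f"<ul>{items}</ul>"

def as_text(books):
    """Another presentation of the same data: plain text lines."""
    return "\n".join(f"{t} - {a}" for t, a in books)

books = load_books(XML_DATA)
print(as_html(books))
print(as_text(books))
```

When the data changes, only XML_DATA needs to change; the two presentation functions stay as they are.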
HTML vs XML
XML is not a replacement for HTML; they are used together.
XML and HTML were designed with different goals:
• XML was designed to describe data and to focus on what data is. HTML was designed to
display data and to focus on how data looks.
• HTML is about displaying information, whereas XML is about describing information.
XML is extensible and its own tags can be created. But HTML has a restricted set of tags like
<TABLE>, <H1>, <B>, etc.
For example, consider designing a Web page to put a library catalog on the Web.
HTML code:
<HTML>
<BODY>
<H1>Harry Potter</H1>
<H2>J.K. Rowling</H2>
<H3>1999</H3>
<H3>Scholastic</H3>
</BODY>
</HTML>
XML Code:
<BOOK>
<TITLE>Harry Potter</TITLE>
<AUTHOR>J.K. Rowling</AUTHOR>
<PUBLISHER>Scholastic</PUBLISHER>
</BOOK>
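Because the XML tags name the data they hold, a program can pick out each field by name. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module is used as the parser, the BOOK document above can be turned into a dictionary:

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<BOOK>
<TITLE>Harry Potter</TITLE>
<AUTHOR>J.K. Rowling</AUTHOR>
<PUBLISHER>Scholastic</PUBLISHER>
</BOOK>"""

book = ET.fromstring(BOOK_XML)

# Each child tag says what its text means, so tag names can serve as keys.
record = {child.tag: child.text for child in book}
print(record["AUTHOR"])
```

The HTML version offers no such hook: H1, H2, and H3 say how the text should look, not which heading is the author or the publisher.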
XML is designed to:
• separate syntax from semantics to provide a common framework for
structuring information.
• allow self-made markup for any imaginable application domain.
• support internationalization (Unicode) and platform independence.
• be the future of structured information, including databases.
SYNTAX OF THE XML DOCUMENT
Let us discuss the syntax of the XML document with the help of the following
example:
<?xml version="1.0"?>
<note>
<to>Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>This Saturday we have a meeting at 12.00 pm</body>
</note>
The first line in the document is the XML declaration and it should always be
included. It defines the XML version of the document. In this case, the
document conforms to the 1.0 specification of XML:
<?xml version="1.0"?>
The next line defines the first element of the document, called the root element:
<note>
The next four lines define the child elements of the root: to, from, heading, and body.
<to>Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>This Saturday, we have a meeting at 12.00 pm</body>
The last line defines the end of the root element:
</note>
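The structure just described can be inspected with a parser. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module, that reads the note document and reports its root element and child elements:

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<?xml version="1.0"?>
<note>
<to>Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>This Saturday we have a meeting at 12.00 pm</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element encloses everything else in the document.
print(root.tag)

# Iterating over the root yields its child elements in document order.
child_tags = [child.tag for child in root]
print(child_tags)
```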
In HTML, some elements do not have a closing tag. The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
But in XML, all elements must have a closing tag, like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
XML tags are case-sensitive
The tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case.
For example:
<Message>This is incorrect</message> (WRONG)
All XML elements must be properly nested
<b><i>This text is bold and italic</i></b>
All XML documents must have a root tag.
All XML documents must contain a single tag pair to define the root
element. All other elements must be nested within the root element.
All elements can have sub (children) elements. Sub elements must be in
pairs and correctly nested within their parent element:
<root>
<child>
<subchild>
</subchild>
</child>
</root>
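These syntax rules can be checked mechanically, because an XML parser rejects any document that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Message>This is incorrect</message>",
    "bad nesting": "<b><i>This text is bold and italic</b></i>",
    "properly nested": "<root><child><subchild></subchild></child></root>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the last sample is accepted; the other three each break one of the rules listed above.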
The attribute values must be included within quotes.
XML elements can have attributes in name/value pairs just like in HTML.
<?xml version="1.0"?>
<note date="12/11/05">
<to>Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>This Saturday, we have a meeting</body>
</note>
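As a brief sketch of how a program sees attributes, assuming Python's standard xml.etree.ElementTree module, the date attribute of the note element can be read as a name/value pair, and an unquoted value is rejected by the parser:

```python
import xml.etree.ElementTree as ET

note = ET.fromstring('<note date="12/11/05"><to>Rajesh</to><from>Ravi</from></note>')

# Attributes are exposed as name/value pairs on the element.
print(note.get("date"))
print(note.attrib)

# The same start tag without quotes around the value is not well-formed XML.
try:
    ET.fromstring("<note date=12/11/05><to>Rajesh</to></note>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)
```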
XML ATTRIBUTES
XML attributes are normally used to describe XML elements, or to provide additional
information about elements.
For example in HTML, the <img> tag has SRC as an attribute. The SRC attribute provides
additional information about the element.
Attributes are always contained within the start tag of an element.
examples: <file type="gif">
<person id="3044">
Usually attributes are used to provide information that is not a part of the content of the XML
document. Often the value of an attribute is more important to the XML parser than to the
reader.
The Use of Elements vs The Use of Attributes
Consider the following examples:
Using an attribute for gender:
<person gender="female">
<firstname>Raji</firstname>
<lastname>Kumar</lastname>
</person>
Using an element for gender:
<person>
<gender>female</gender>
<firstname> Raji </firstname>
<lastname> Kumar </lastname>
</person>
In the first example gender is an attribute. In the second example, gender is
an element.
Both the examples provide the same information to the reader. There are no
fixed rules about when to use attributes to describe data, and when to use
elements.
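From a program's point of view the two forms are read with different calls but yield the same value. A minimal sketch, assuming Python's standard xml.etree.ElementTree module (the function name gender_of is hypothetical):

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = (
    '<person gender="female">'
    "<firstname>Raji</firstname><lastname>Kumar</lastname>"
    "</person>"
)
AS_ELEMENT = (
    "<person><gender>female</gender>"
    "<firstname>Raji</firstname><lastname>Kumar</lastname>"
    "</person>"
)

def gender_of(person_xml):
    """Look for gender as an attribute first, then as a child element."""
    person = ET.fromstring(person_xml)
    return person.get("gender") or person.findtext("gender")

print(gender_of(AS_ATTRIBUTE))
print(gender_of(AS_ELEMENT))
```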
Another example demonstrating how elements can be used instead of
attributes:
The following three XML documents contain exactly the same information.
A date attribute is used in the first, a date element is used in the second, and
an expanded date element is used in the third.
Example 1:
<?xml version="1.0"?>
<note date="12/11/05">
<to>Rajesh</to>
<from>Ravi</from>
</note>
Example 2:
<?xml version="1.0"?>
<note>
<date>12/11/05</date>
<to> Rajesh </to>
<from> Ravi </from>
</note>
Example 3:
<?xml version="1.0"?>
<note>
<date>
<day>12</day>
<month>11</month>
<year>05</year>
</date>
<to> Rajesh </to>
<from> Ravi </from>
</note>
The disadvantages of using attributes are:
• attributes cannot contain multiple values (but elements can)
• attributes are not expandable (for future changes)
• attributes cannot describe structures (like child elements can)
• attributes are more difficult to manipulate by program code
• attribute values are not easy to test against a DTD
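The point about structure and program code can be illustrated with the date examples above. In this sketch, which assumes Python's standard xml.etree.ElementTree module and hypothetical function names, the attribute form forces the program to know the day/month/year convention and split the string itself, while the expanded element form lets each part be read by name.

```python
import xml.etree.ElementTree as ET

ATTR_FORM = '<note date="12/11/05"><to>Rajesh</to><from>Ravi</from></note>'
ELEM_FORM = """<note>
<date><day>12</day><month>11</month><year>05</year></date>
<to>Rajesh</to><from>Ravi</from>
</note>"""

def date_from_attribute(xml_text):
    # The order of the parts is an assumption built into the code, not the data.
    day, month, year = ET.fromstring(xml_text).get("date").split("/")
    return {"day": day, "month": month, "year": year}

def date_from_elements(xml_text):
    # The structure is explicit in the markup, so each part is read by its tag.
    date = ET.fromstring(xml_text).find("date")
    return {part.tag: part.text for part in date}

print(date_from_attribute(ATTR_FORM))
print(date_from_elements(ELEM_FORM))
```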
XML VALIDATION
'Well Formed' XML Documents
A 'Well Formed' XML document is a document that conforms to the XML
syntax rules that we have described in the previous section.
An XML document is called well-formed if it satisfies certain rules, specified by the
W3C.
These rules are:
• A well-formed XML document must have a corresponding end tag for all of its start
tags.
• Nesting of elements within each other in an XML document must be proper.
For example,
<tutorial>
<topic>XML</topic>
</tutorial> is a correct way of nesting but
<tutorial>
<topic>XML</tutorial>
</topic> is not.
• In each element, two attributes must not have the same name.
For example,
<tutorial id="001">
<topic>XML</topic>
</tutorial> is right, but
<tutorial id="001" id="w3r">
<topic>XML</topic>
</tutorial> is incorrect.
• An XML document can contain only one root element. The root element is
present only once in an XML document and does not appear as a child element
within any other element.
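The duplicate-attribute and single-root rules can be observed directly, since a conforming parser must report them as errors. A small sketch, assuming Python's standard xml.etree.ElementTree module (parse_error is a hypothetical helper, and the two-root sample is invented for illustration):

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)

one_id = '<tutorial id="001"><topic>XML</topic></tutorial>'
two_ids = '<tutorial id="001" id="w3r"><topic>XML</topic></tutorial>'
two_roots = "<tutorial>XML</tutorial><tutorial>DTD</tutorial>"

for label, text in [("one id", one_id), ("two ids", two_ids), ("two roots", two_roots)]:
    print(label, "->", parse_error(text) or "well formed")
```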
The following is a “Well Formed” XML document:
<?xml version="1.0"?>
<note>
<to> Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
</note>
'Valid' XML Documents
A 'Valid' XML document is a 'Well Formed' XML document which conforms to the
rules of a Document Type Definition (DTD).
The following is the same document as above but with an added reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "[Link]">
<note>
<to> Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
</note>
XML DTD
The purpose of a DTD is to define the legal building blocks of an XML document.
It defines the document structure with a list of legal elements. A DTD can be
declared inline in an XML document, or as an external reference.
Internal DTD
The following is an XML document with a Document Type Definition.
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>This Saturday we have a meeting at 12.00 pm</body>
</note>
!ELEMENT note defines the element note as having four child elements: to, from, heading, body.
!ELEMENT to defines the to element to be of the type #PCDATA (parsed character data).
!ELEMENT from defines the from element to be of the type #PCDATA.
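To make concrete what the declaration of note requires, the sketch below checks the same constraints by hand. It is not a DTD validator: Python's standard xml.etree.ElementTree module does not validate against a DTD, so the list NOTE_CHILDREN and the function matches_note_model are hand-written stand-ins for the content model (to, from, heading, body), with shortened sample documents invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for: <!ELEMENT note (to, from, heading, body)>
NOTE_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_model(xml_text):
    """True if the root is <note> with exactly the expected children, in order,
    and each child holds only text (no nested elements), as #PCDATA requires."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != NOTE_CHILDREN:
        return False
    return all(len(child) == 0 for child in root)

valid = ("<note><to>Rajesh</to><from>Ravi</from>"
         "<heading>Reminder</heading><body>Meeting</body></note>")
missing_body = ("<note><to>Rajesh</to><from>Ravi</from>"
                "<heading>Reminder</heading></note>")
wrong_order = ("<note><from>Ravi</from><to>Rajesh</to>"
               "<heading>Reminder</heading><body>Meeting</body></note>")

for label, text in [("valid", valid), ("missing body", missing_body), ("wrong order", wrong_order)]:
    print(label, "->", matches_note_model(text))
```

All three samples are well formed, but only the first matches the content model; a validating parser would make the same distinction using the DTD itself.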
External DTD
This is the same XML document with an external DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "[Link]">
<note>
<to>Rajesh</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>This Saturday we have a meeting at 12.00 pm</body>
</note>
This is a copy of the file [Link] containing the Document Type Definition:
<?xml version="1.0"?>
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Why use a DTD?
XML provides an application-independent way of sharing data.
With a DTD, independent groups of people can agree to use a common
DTD for interchanging data.
An application can use a standard DTD to verify that the data it
receives from the outside world is valid. You can also use a DTD to
verify your own data.
THE BUILDING BLOCKS OF XML DOCUMENTS
XML documents (and HTML documents) are made up of the following
building blocks:
Elements
Tags
Attributes
Entities
PCDATA
CDATA
Elements
Elements are the main building blocks of both XML and HTML documents. Examples
of HTML elements are body and table.
Examples of XML elements can be note and message.
Elements can contain text, other elements, or be empty.
Tags
Tags are used to mark up elements. A starting tag like <element_name> marks up
the beginning of an element, and an ending tag like </element_name> marks up
the end of an element.
Some examples are:
A body element: <body> body text in between </body>
A message element: <message> some message in between </message>
Attributes
Attributes provide extra information about elements. Attributes are placed inside the
start tag of an element.
Attributes come in name/value pairs. The following img element has additional
information about a source file:
<img src="[Link]" />
PCDATA
PCDATA means parsed character data. PCDATA is text that will be parsed
by a parser. Tags inside the text will be treated as markup and entities
will be expanded.
CDATA
CDATA also means character data. CDATA is text that does not get parsed
by a parser. Tags inside the text will not be treated as markup and the
entities will not be expanded.
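The difference can be seen by parsing the same characters both ways. This sketch assumes that CDATA here corresponds to text inside a CDATA section (written <![CDATA[ ... ]]>) and uses Python's standard xml.etree.ElementTree module; the message element and its content are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the entity &lt; is expanded and <b> is read as a tag.
parsed = ET.fromstring("<message>1 &lt; 2 <b>bold</b></message>")
print(repr(parsed.text))    # text that precedes the child element
print(len(parsed))          # number of child elements found

# CDATA section: the same characters are kept literally, with no child elements.
literal = ET.fromstring("<message><![CDATA[1 &lt; 2 <b>bold</b>]]></message>")
print(repr(literal.text))
print(len(literal))
```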