XML was designed to transport and store data.
HTML was designed to display data.
What You Should Already Know
Before you continue you should have a basic understanding of the following:
HTML
JavaScript
If you want to study these subjects first, find the tutorials on our Home page.
What is XML?
XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to carry data, not to display data
XML tags are not predefined. You must define your own tags
XML is designed to be self-descriptive
XML is a W3C Recommendation
The Difference Between XML and HTML
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to transport and store data, with focus on what data is.
HTML was designed to display data, with focus on how data looks.
HTML is about displaying information, while XML is about carrying information.
XML Does not DO Anything
Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store,
and transport information.
The following example is a note to Tove from Jani, stored as XML:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The note above is quite self descriptive. It has sender and receiver information, it also has a heading and a
message body.
But still, this XML document does not DO anything. It is just pure information wrapped in tags. Someone
must write a piece of software to send, receive or display it.
XML is Just Plain Text
XML is nothing special. It is just plain text. Software that can handle plain text can also handle XML.
However, XML-aware applications can handle the XML tags specially. The functional meaning of the tags
depends on the nature of the application.
With XML You Invent Your Own Tags
The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are
"invented" by the author of the XML document.
That is because the XML language has no predefined tags.
The tags used in HTML (and the structure of HTML) are predefined. HTML documents can only use tags
defined in the HTML standard (like <p>, <h1>, etc.).
XML allows authors to define their own tags and their own document structure.
XML is Not a Replacement for HTML
XML is a complement to HTML.
It is important to understand that XML is not a replacement for HTML. In most web applications, XML is
used to transport data, while HTML is used to format and display the data.
My best description of XML is this:
XML is a software- and hardware-independent tool for carrying information.
XML is a W3C Recommendation
XML became a W3C Recommendation on February 10, 1998.
To read more about the XML activities at W3C, please read our W3C Tutorial.
XML is Everywhere
We have been participating in XML development since its creation. It has been amazing to see how quickly
the XML standard has developed, and how quickly a large number of software vendors have adopted the
standard.
XML is now as important for the Web as HTML was to the foundation of the Web.
XML is everywhere. It is one of the most common tools for transmitting data between all sorts of applications, and
is becoming more and more popular for storing and describing information.
How Can XML be Used?
XML is used in many aspects of web development, often to simplify data storage and sharing.
XML Separates Data from HTML
If you need to display dynamic data in your HTML document, it will take a lot of work to edit the HTML each
time the data changes.
With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for
layout and display, and be sure that changes in the underlying data will not require any changes to the HTML.
With a few lines of JavaScript, you can read an external XML file and update the data content of your HTML.
XML Simplifies Data Sharing
In the real world, computer systems and databases contain data in incompatible formats.
XML data is stored in plain text format. This provides a software- and hardware-independent way of storing
data.
This makes it much easier to create data that different applications can share.
XML Simplifies Data Transport
With XML, data can easily be exchanged between incompatible systems.
One of the most time-consuming challenges for developers is to exchange data between incompatible systems
over the Internet.
Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible
applications.
XML Simplifies Platform Changes
Upgrading to new systems (hardware or software platforms) is always very time consuming. Large amounts
of data must be converted, and incompatible data is often lost.
XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems, new
applications, or new browsers, without losing data.
XML Makes Your Data More Available
Since XML is independent of hardware, software and application, XML can make your data more available
and useful.
Different applications can access your data, not only in HTML pages, but also from XML data sources.
With XML, your data can be available to all kinds of "reading machines" (handheld computers, voice
machines, news feeds, etc.), which makes it more accessible to blind people and people with other disabilities.
XML is Used to Create New Internet Languages
A lot of new Internet languages are created with XML.
Here are some examples:
XHTML, an XML-based reformulation of HTML
WSDL for describing available web services
WML as a markup language for handheld devices (delivered over WAP)
RSS for news feeds
RDF and OWL for describing resources and ontologies
SMIL for describing multimedia for the web
If Developers Have Sense
If they DO have sense, future applications will exchange their data in XML.
The future might give us word processors, spreadsheet applications and databases that can read each other's
data in a pure text format, without any conversion utilities in between.
We can only pray that all the software vendors will agree.
XML Tree
XML documents form a tree structure that starts at "the root" and branches to "the leaves".
An Example XML Document
XML documents use a self-describing and simple syntax:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO-8859-1 =
Latin-1/West European character set).
The next line describes the root element of the document (like saying: "this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element:
</note>
You can assume, from this example, that the XML document contains a note to Tove from Jani.
Don't you agree that XML is pretty self-descriptive?
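As a quick check of the structure described above, the note can be loaded with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module (an assumption; the tutorial itself does not prescribe a parser) and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The note document from the example above. It is passed as bytes because
# the XML declaration names an encoding (ISO-8859-1).
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element is <note>; its four children carry the actual data.
print(root.tag)
for child in root:
    print(child.tag, "=", child.text)

# Collect the children into a plain dictionary keyed by tag name.
note = {child.tag: child.text for child in root}
```

The parser only hands back the tag names and text; deciding what a "to" or a "heading" means is still up to the application.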
XML Documents Form a Tree Structure
XML documents must contain a root element. This element is "the parent" of all other elements.
The elements in an XML document form a document tree. The tree starts at the root and branches to the
lowest level of the tree.
All elements can have sub elements (child elements):
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships between elements. Parent elements
have children. Children on the same level are called siblings (brothers or sisters).
All elements can have text content and attributes (just like in HTML).
EXAMPLE:
[Figure: tree diagram representing one book in the XML below]
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
The root element in the example is <bookstore>. All <book> elements in the document are contained within
<bookstore>.
The <book> element has 4 children: <title>, <author>, <year>, <price>.
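The parent, child, and sibling relationships can be walked in code. This is a minimal sketch using Python's xml.etree.ElementTree (an assumed choice of parser) on a two-book excerpt of the bookstore document; the names summary and child_tags are illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the bookstore document (two of the three books).
BOOKSTORE_XML = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# Every <book> is a child of the root and a sibling of the other books.
books = root.findall("book")

# Attributes are read with .get(), text content with .text.
summary = [
    (book.get("category"), book.find("title").text, float(book.find("price").text))
    for book in books
]
for category, title, price in summary:
    print(f"{category}: {title} ({price:.2f})")

# The child elements of the first <book>, in document order.
child_tags = [child.tag for child in books[0]]
```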
XML Syntax Rules
The syntax rules of XML are very simple and logical. The rules are easy to learn, and easy to use.
All XML Elements Must Have a Closing Tag
In HTML, you will often see elements that don't have a closing tag:
<p>This is a paragraph
<p>This is another paragraph
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Note: You might have noticed from the previous example that the XML declaration did not have a closing tag.
This is not an error. The declaration is not an element, so it has no closing tag.
XML Tags are Case Sensitive
XML elements are defined using XML tags.
XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
Note: "Opening and closing tags" are often referred to as "Start and end tags". Use whatever you prefer. It is
exactly the same thing.
XML Elements Must be Properly Nested
In HTML, you might see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
<b><i>This text is bold and italic</i></b>
In the example above, "Properly nested" simply means that since the <i> element is opened inside the <b>
element, it must be closed inside the <b> element.
XML Documents Must Have a Root Element
XML documents must contain one element that is the parent of all other elements. This element is called
the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
XML Attribute Values Must be Quoted
XML elements can have attributes in name/value pairs just like in HTML.
In XML the attribute value must always be quoted. Study the two XML documents below. The first one is
incorrect, the second is correct:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note element is not quoted.
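The syntax rules covered so far (closing tags, matching case, proper nesting, a single root element, quoted attribute values) are all enforced by the parser. The sketch below feeds the incorrect examples to Python's xml.etree.ElementTree; is_well_formed is a hypothetical helper written for this illustration, not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One short document per rule, plus a correct one for comparison.
checks = {
    "missing closing tag": "<note><p>This is a paragraph</note>",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "two root elements": "<note/><note/>",
    "unquoted attribute": "<note date=12/11/2007><to>Tove</to></note>",
    "correct": '<note date="12/11/2007"><to>Tove</to></note>',
}

results = {name: is_well_formed(text) for name, text in checks.items()}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Unlike an HTML browser, an XML parser does not try to repair these mistakes; it stops at the first one.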
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, it will generate an error because the parser interprets
it as the start of a new element.
This will generate an XML error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
There are 5 predefined entity references in XML:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is
a good habit to replace it.
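When XML is generated by a program, the replacement is usually done by a library function rather than by hand. A small sketch, assuming Python's standard xml.sax.saxutils.escape (which handles "&", "<" and ">" by default and takes an optional mapping for the two quote characters):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if salary < 1000 then"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
document = f"<message>{escaped}</message>"

# The parser turns the entity reference back into the original character.
parsed_text = ET.fromstring(document).text

# The apostrophe and quotation mark are only replaced when asked for.
all_five = escape("< > & ' \"", {"'": "&apos;", '"': "&quot;"})

print(document)
print(parsed_text)
print(all_five)
```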
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
White-space is Preserved in XML
HTML collapses multiple white-space characters into one single white-space:
HTML: Hello           my name is Tove
Output: Hello my name is Tove.
With XML, the white-space in a document is not collapsed.
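The difference is easy to observe with a parser. In this sketch (Python's xml.etree.ElementTree, an assumed choice), the element text keeps every space, and the HTML-style result is imitated by collapsing the runs manually.

```python
import xml.etree.ElementTree as ET

# Eleven spaces between "Hello" and "my".
gap = " " * 11
source = f"<greeting>Hello{gap}my name is Tove</greeting>"

# The XML parser returns the text exactly as written.
text = ET.fromstring(source).text

# Imitation of HTML rendering: runs of white-space become a single space.
collapsed = " ".join(text.split())

print(repr(text))
print(repr(collapsed))
```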
XML Stores New Line as LF
In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line
feed (LF). The character pair bears some resemblance to the typewriter actions of setting a new line. In Unix
applications, a new line is normally stored as an LF character. Current Macintosh (Mac OS X) applications also use an LF;
older Macintosh applications used a single CR. An XML parser normalizes all of these to a single LF before passing
the text to the application.
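The conversion of line endings to LF can be seen by parsing the same text with different line endings. A minimal sketch using Python's xml.etree.ElementTree (an assumed choice of parser; the three byte strings are made up for the illustration):

```python
import xml.etree.ElementTree as ET

# The same two-line message stored with three different line endings.
windows_style = b"<message>line one\r\nline two</message>"  # CR + LF
old_mac_style = b"<message>line one\rline two</message>"    # CR only
unix_style = b"<message>line one\nline two</message>"       # LF only

# After parsing, the application sees the same text in all three cases.
texts = [ET.fromstring(doc).text for doc in (windows_style, old_mac_style, unix_style)]

for t in texts:
    print(repr(t))
```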
XML Advanced
XML Namespaces
XML CDATA
XML Encoding
XML Server
XML DOM Advanced
XML Don't
XML Technologies
XML in Real Life
XML Editors
XML Summary
XML Namespaces
XML Namespaces provide a method to avoid element name conflicts.
Name Conflicts
In XML, element names are defined by the developer. This often results in a conflict
when trying to mix XML documents from different XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both
contain a <table> element, but the elements have different content and meaning.
An XML parser will not know how to handle these differences.
Solving the Name Conflict Using a Prefix
Name conflicts in XML can easily be avoided using a name prefix.
This XML carries information about an HTML table, and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the two <table> elements have
different names.
XML Namespaces - The xmlns Attribute
When using prefixes in XML, a so-called namespace for the prefix must be defined.
The namespace is defined by the xmlns attribute in the start tag of an element.
The namespace declaration has the following syntax: xmlns:prefix="URI".
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
In the example above, the xmlns attributes in the <table> tags give the h: and f: prefixes a
qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are
associated with the same namespace.
Namespaces can be declared in the elements where they are used or in the XML root
element:
<root
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
Note: The namespace URI is not used by the parser to look up information.
The purpose is to give the namespace a unique name. However, often companies use the
namespace as a pointer to a web page containing namespace information.
Try to go to http://www.w3.org/TR/html4/.
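To a namespace-aware parser, the prefix is only shorthand for the URI. The sketch below uses Python's xml.etree.ElementTree, which reports each tag as "{URI}localname"; the namespace URIs are assumed to be http://www.w3.org/TR/html4/ and http://www.w3schools.com/furniture, and the prefixes html and furn in the lookup mapping are chosen freely to show that the prefix itself carries no meaning.

```python
import xml.etree.ElementTree as ET

DOC = """<root
  xmlns:h="http://www.w3.org/TR/html4/"
  xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

root = ET.fromstring(DOC)

# Each prefix has been replaced by its namespace URI in braces.
table_tags = [child.tag for child in root]

# Prefixes used in queries are local to this mapping; only the URIs must match.
ns = {
    "html": "http://www.w3.org/TR/html4/",
    "furn": "http://www.w3schools.com/furniture",
}
cells = [td.text for td in root.findall("html:table/html:tr/html:td", ns)]
furniture_name = root.find("furn:table/furn:name", ns).text

print(table_tags)
print(cells, furniture_name)
```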
Uniform Resource Identifier (URI)
A Uniform Resource Identifier (URI) is a string of characters which identifies an
Internet Resource.
The most common URI is the Uniform Resource Locator (URL) which identifies an
Internet domain address. Another, not so common type of URI is the Uniform
Resource Name (URN).
In our examples we will only use URLs.
Default Namespaces
Defining a default namespace for an element saves us from using prefixes in all the
child elements. It has the following syntax:
xmlns="namespaceURI"
This XML carries HTML table information:
<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a piece of furniture:
<table xmlns="http://www.w3schools.com/furniture">
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
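A default namespace applies to the element that declares it and to all unprefixed descendants. A short sketch with Python's xml.etree.ElementTree, assuming the namespace URI http://www.w3.org/TR/html4/ (the constant HTML_NS is an illustrative name):

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/TR/html4/"
doc = f'<table xmlns="{HTML_NS}"><tr><td>Apples</td><td>Bananas</td></tr></table>'

root = ET.fromstring(doc)

# Every element inherits the default namespace, although none has a prefix.
tags = [element.tag for element in root.iter()]

# A lookup by bare name finds nothing; the qualified name is required.
plain_lookup = root.findall("tr")
qualified_lookup = root.findall(f"{{{HTML_NS}}}tr")

print(tags)
print(len(plain_lookup), len(qualified_lookup))
```

This is a common source of confusion: the document looks unprefixed, but its elements are still namespaced.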
Namespaces in Real Use
XSLT is an XML language that can be used to transform XML documents into other
formats, like HTML.
In the XSLT document below, you can see that most of the tags are HTML tags.
The tags that are not HTML tags have the prefix xsl, identified by the namespace
xmlns:xsl="http://www.w3.org/1999/XSL/Transform":
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr>
<th align="left">Title</th>
<th align="left">Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
XML Encoding
XML documents can contain non ASCII characters, like Norwegian æ ø å , or French ê è é.
To avoid errors, specify the XML encoding, or save XML files as Unicode.
XML Encoding Errors
If you load an XML document, you can get two different errors indicating encoding problems:
An invalid character was found in text content.
You get this error if your XML contains non ASCII characters, and the file was saved as single-byte ANSI (or
ASCII) with no encoding specified.
Single byte XML file with encoding attribute.
Same single byte XML file with no encoding attribute.
Switch from current encoding to specified encoding not supported.
You get this error if your XML file was saved as double-byte Unicode (or UTF-16) with a single-byte
encoding (Windows-1252, ISO-8859-1, UTF-8) specified.
You also get this error if your XML file was saved with single-byte ANSI (or ASCII), with double-byte
encoding (UTF-16) specified.
Double byte XML file without encoding.
Same double byte XML file with single byte encoding.
Windows Notepad
Windows Notepad saves files as single-byte ANSI (ASCII) by default.
If you select "Save as...", you can specify double-byte Unicode (UTF-16).
Save the XML file below as Unicode (note that the document does not contain any encoding attribute):
<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>
The file above, note_encode_none_u.xml, will NOT generate an error. But if you specify a single-byte
encoding it will.
The following encoding will give an error message:
<?xml version="1.0" encoding="windows-1252"?>
The following encoding will give an error message:
<?xml version="1.0" encoding="ISO-8859-1"?>
The following encoding will give an error message:
<?xml version="1.0" encoding="UTF-8"?>
The following encoding will NOT give an error:
<?xml version="1.0" encoding="UTF-16"?>
Conclusion
Always use the encoding attribute
Use an editor that supports encoding
Make sure you know what encoding the editor uses
Use the same encoding in your encoding attribute
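The effect of a declared encoding that does not match the bytes on disk can be reproduced without an editor. This sketch uses Python's xml.etree.ElementTree rather than the browser parser the error messages above come from, so the exact error text will differ; build and try_parse are hypothetical helpers written for the illustration.

```python
import xml.etree.ElementTree as ET

BODY = "<note><message>Norwegian: æøå. French: êèé</message></note>"

def build(declared: str, actual: str) -> bytes:
    """Return the note as bytes: 'declared' goes into the XML declaration,
    'actual' is the encoding really used to produce the bytes."""
    text = f'<?xml version="1.0" encoding="{declared}"?>\n{BODY}'
    return text.encode(actual)

def try_parse(data: bytes) -> str:
    """Return the message text, or a string starting with 'error:' on failure."""
    try:
        return ET.fromstring(data).find("message").text
    except ET.ParseError as err:
        return f"error: {err}"

# Declaration and bytes agree: the text comes back intact.
matching = try_parse(build("ISO-8859-1", "iso-8859-1"))
utf8 = try_parse(build("UTF-8", "utf-8"))

# Bytes saved as ISO-8859-1 but declared as UTF-8: the parser rejects them.
mismatch = try_parse(build("UTF-8", "iso-8859-1"))

print(matching)
print(utf8)
print(mismatch)
```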
XML on the Server
XML files are plain text files just like HTML files.
XML can easily be stored and generated by a standard web server.
Storing XML Files on the Server
XML files can be stored on an Internet server exactly the same way as HTML files.
Start Windows Notepad and write the following lines:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Remember me this weekend</message>
</note>
Save the file on your web server with a proper name like "note.xml".
Generating XML with ASP
XML can be generated on a server without any installed XML software.
To generate an XML response from the server - simply write the following code and save it as an ASP file on
the web server:
<%
response.ContentType="text/xml"
response.Write("<?xml version='1.0' encoding='ISO-8859-1'?>")
response.Write("<note>")
response.Write("<from>Jani</from>")
response.Write("<to>Tove</to>")
response.Write("<message>Remember me this weekend</message>")
response.Write("</note>")
%>
Note that the content type of the response must be set to "text/xml".
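The same idea carries over to other server-side languages. As a sketch only, here is a Python function that builds the note and the matching header; build_note_response is a hypothetical helper, and how the headers and body are handed to a web server depends on the framework in use.

```python
import xml.etree.ElementTree as ET

def build_note_response():
    """Return (headers, body) for an XML response carrying the note."""
    note = ET.Element("note")
    ET.SubElement(note, "from").text = "Jani"
    ET.SubElement(note, "to").text = "Tove"
    ET.SubElement(note, "message").text = "Remember me this weekend"

    # Serializing with a non-default encoding also emits the XML declaration.
    body = ET.tostring(note, encoding="ISO-8859-1")

    # The content type tells the client to treat the body as XML.
    headers = {"Content-Type": "text/xml"}
    return headers, body

headers, body = build_note_response()
print(headers)
print(body.decode("iso-8859-1"))
```

Building the document through an API instead of writing strings has the side benefit that special characters in the data are escaped automatically.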
See how the ASP file will be returned from the server.
If you want to study ASP, you will find our ASP tutorial on our homepage.
XML DOM Advanced
The XML DOM (Document Object Model) defines a standard way for accessing and manipulating XML documents.
The XML DOM
The DOM views XML documents as a tree-structure. All elements can be accessed through the DOM tree.
Their content (text and attributes) can be modified or deleted, and new elements can be created. The elements,
their text, and their attributes are all known as nodes.
In an earlier chapter of this tutorial we introduced the XML DOM, and used the XML DOM
getElementsByTagName() method to retrieve data from a DOM tree.
In this chapter we will describe some other commonly used XML DOM methods. In the examples below, we
have used the XML file: books.xml.
Get the Value of an Element
The following code retrieves the text value of the first <title> element:
x=xmlDoc.getElementsByTagName("title")[0].childNodes[0];
txt=x.nodeValue;
Get the Value of an Attribute
The following code retrieves the text value of the "lang" attribute of the first <title> element:
txt=xmlDoc.getElementsByTagName("title")[0].getAttribute("lang");
Change the Value of an Element
The following code changes the text value of the first <title> element:
x=xmlDoc.getElementsByTagName("title")[0].childNodes[0];
x.nodeValue="Easy Cooking";
Change the Value of an Attribute
The setAttribute() method can be used to change the value of an existing attribute, or to create a new attribute.
The following code adds a new attribute called "edition" (with the value "first") to each <book> element:
x=xmlDoc.getElementsByTagName("book");
for(i=0;i<x.length;i++)
{
x[i].setAttribute("edition","first");
}
Create an Element
The createElement() method creates a new element node.
The createTextNode() method creates a new text node.
The appendChild() method adds a child node to a node (after the last child).
To create a new element with text content, it is necessary to create both an element node and a text node.
The following code creates an element (<edition>), and adds it to the first <book> element:
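A minimal sketch of these three calls, written in Python with the standard xml.dom.minidom module, which exposes the same DOM method names as the JavaScript used elsewhere in this chapter. The one-book document is an assumed stand-in for books.xml, and the text value "First" is illustrative.

```python
from xml.dom.minidom import parseString

# Assumed stand-in for books.xml, trimmed to a single book.
xmlDoc = parseString(
    '<bookstore><book category="COOKING">'
    '<title lang="en">Everyday Italian</title>'
    '</book></bookstore>'
)

# Create the element node and the text node, then put the text inside the element.
newel = xmlDoc.createElement("edition")
newtext = xmlDoc.createTextNode("First")
newel.appendChild(newtext)

# Add the new element after the last child of the first <book> element.
book = xmlDoc.getElementsByTagName("book")[0]
book.appendChild(newel)

result = book.toxml()
print(result)
```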
Remove an Element
The removeChild() method removes a specified node (or element).
The following code fragment will remove the first node in the first <book> element:
Example
x=xmlDoc.getElementsByTagName("book")[0];
x.removeChild(x.childNodes[0]);
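One caveat worth illustrating: in a document that is laid out on several lines, the first child node of an element is often a white-space text node rather than an element. The sketch below uses Python's xml.dom.minidom (same DOM method names; the short document is an assumed stand-in for books.xml) and picks the first element child explicitly before removing it.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Assumed stand-in for books.xml, written with line breaks between the tags.
xmlDoc = parseString("""<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
</book>
</bookstore>""")

book = xmlDoc.getElementsByTagName("book")[0]

# childNodes[0] is the line break after the <book> start tag, not <title>.
first_is_text = book.childNodes[0].nodeType == Node.TEXT_NODE

# Select the first child that is an element, then remove it.
first_element = next(n for n in book.childNodes if n.nodeType == Node.ELEMENT_NODE)
removed = book.removeChild(first_element)

remaining = [n.tagName for n in book.childNodes if n.nodeType == Node.ELEMENT_NODE]
print(first_is_text, removed.tagName, remaining)
```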
XML CDATA
All text in an XML document will be parsed by the parser.
But text inside a CDATA section will be ignored by the parser.
PCDATA - Parsed Character Data
XML parsers normally parse all the text in an XML document.
When an XML element is parsed, the text between the XML tags is also parsed:
<message>This text is also parsed</message>
The parser does this because XML elements can contain other elements, as in this example, where the
<name> element contains two other elements (first and last):
<name><first>Bill</first><last>Gates</last></name>
and the parser will break it up into sub-elements like this:
<name>
<first>Bill</first>
<last>Gates</last>
</name>
Parsed Character Data (PCDATA) is a term used about text data that will be parsed by the XML parser.
CDATA - (Unparsed) Character Data
The term CDATA is used about text data that should not be parsed by the XML parser.
Characters like "<" and "&" are illegal in XML elements.
"<" will generate an error because the parser interprets it as the start of a new element.
"&" will generate an error because the parser interprets it as the start of an entity reference.
Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors script code can be
defined as CDATA.
Everything inside a CDATA section is ignored by the parser.
A CDATA section starts with "<![CDATA[" and ends with "]]>":
<script>
<![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0)
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
In the example above, everything inside the CDATA section is ignored by the parser.
Notes on CDATA sections:
A CDATA section cannot contain the string "]]>". Nested CDATA sections are not allowed.
The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
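The effect of a CDATA section can be checked by parsing the same script with and without it. A sketch using Python's xml.etree.ElementTree (an assumed choice of parser); the function body is shortened and parses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

with_cdata = """<script><![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0)
{
return 1;
}
return 0;
}
]]></script>"""

# The same document with the CDATA markers stripped out.
without_cdata = with_cdata.replace("<![CDATA[", "").replace("]]>", "")

def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Inside CDATA, "<" and "&" reach the application unchanged.
code = ET.fromstring(with_cdata).text

print(parses(with_cdata), parses(without_cdata))
print(code)
```

The markers themselves are not part of the text the application receives; only the characters between them are.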
XML Don't
Here are some technologies you should try to avoid when using XML.
Internet Explorer - XML Data Islands
What is it? An XML data island is XML data embedded into an HTML page.
Why avoid it? XML Data Islands only work with Internet Explorer browsers.
What to use instead? You should use JavaScript and XML DOM to parse and display XML in HTML.
For more information about JavaScript and XML DOM, visit our XML DOM tutorial.
XML Data Island Example
This example uses the XML document "cd_catalog.xml".
Bind the XML document to an <xml> tag in the HTML document. The id attribute defines an id for the data
island, and the src attribute points to the XML file:
Example
This example only works in IE
<html>
<body>
<xml id="cdcat" src="cd_catalog.xml"></xml>
<table border="1" datasrc="#cdcat">
<tr>
<td><span datafld="ARTIST"></span></td>
<td><span datafld="TITLE"></span></td>
</tr>
</table>
</body>
</html>
The datasrc attribute of the <table> tag binds the HTML table to the XML data island.
The <span> tags allow the datafld attribute to refer to the XML element to be displayed. In this case,
"ARTIST" and "TITLE". As the XML is read, additional rows are created for each <CD> element.
Internet Explorer - Behaviors
What is it? Internet Explorer 5 introduced behaviors. Behaviors are a way to attach scripted functionality to XML (or
HTML) elements with the use of CSS styles.
Why avoid it? The behavior attribute is only supported by Internet Explorer.
What to use instead? Use JavaScript and XML DOM (or HTML DOM) instead.
Example 1 - Mouseover Highlight
The following HTML file has a <style> element that defines a behavior for the <h1> element:
<html>
<head>
<style type="text/css">
h1 { behavior: url(behave.htc) }
</style>
</head>
<body>
<h1>Mouse over me!!!</h1>
</body>
</html>
The XML document "behave.htc" is shown below (the file contains JavaScript and event handlers for the
element):
<attach for="element" event="onmouseover" handler="hig_lite" />
<attach for="element" event="onmouseout" handler="low_lite" />
<script type="text/javascript">
function hig_lite()
{
element.style.color='red';
}
function low_lite()
{
element.style.color='blue';
}
</script>
Example 2 - Typewriter Simulation
The following HTML file has a <style> element that defines a behavior for elements with an id of "typing":
<html>
<head>
<style type="text/css">
#typing
{
behavior:url(typing.htc);
font-family:'courier new';
}
</style>
</head>
<body>
<span id="typing" speed="100">IE5 introduced DHTML behaviors.
Behaviors are a way to add DHTML functionality to HTML elements
with the ease of CSS.<br /><br />How do behaviors work?<br />
By using XML we can link behaviors to any element in a web page
and manipulate that element.</span>
</body>
</html>
The XML document "typing.htc" is shown below:
<attach for="window" event="onload" handler="beginTyping" />
<method name="type" />
<script type="text/javascript">
var i,text1,text2,textLength,t;
function beginTyping()
{
i=0;
text1=element.innerText;
textLength=text1.length;
element.innerText="";
text2="";
t=window.setInterval(element.id+".type()",speed);
}
function type()
{
text2=text2+text1.substring(i,i+1);
element.innerText=text2;
i=i+1;
if (i==textLength)
{
clearInterval(t);
}
}
</script>