0% found this document useful (0 votes)
228 views32 pages

Introduction to XML Basics

The document provides an introduction to XML basics, including: - The components of an XML document include elements, attributes, processing instructions, comments, and the XML declaration. - Elements can contain child elements and have a beginning and ending tag. Attributes provide additional information about elements. - XML is designed to describe data rather than present it, allowing users to define their own elements and tags to structure information.

Uploaded by

Anonymous Tvpppp
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
228 views32 pages

Introduction to XML Basics

The document provides an introduction to XML basics, including: - The components of an XML document include elements, attributes, processing instructions, comments, and the XML declaration. - Elements can contain child elements and have a beginning and ending tag. Attributes provide additional information about elements. - XML is designed to describe data rather than present it, allowing users to define their own elements and tags to structure information.

Uploaded by

Anonymous Tvpppp
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

XML Basics

With thanks
Slides from Nitin Rastogi

ICS 541 - 01 (062) XML Basics 1


Lecture objectives
• To introduce the basic components of XML.

ICS 541 - 01 (062) XML Basics 2


Lecture Outline
• Introduction
• The anatomy of XML document
• Components of XML document
• XML validation
• Rules for well-formed XML document
• XML DTD
• More XML components
• References
• Reading list

ICS 541 - 01 (062) XML Basics 3


Introduction
• What is XML?

• How can XML be used?

• What does XML look like?

• XML and HTML

• XML is free and extensible

ICS 541 - 01 (062) XML Basics 4


What is XML?
• XML stands for Extensible Markup Language.

• XML developed by the World Wide Web Consortium ([Link])


• Created in 1996. The first specification was published in 1998 by the W3C
• It is specifically designed for delivering information over the internet.

• XML like HTML is a markup language, but unlike HTML it doesn’t have
predefined elements.

• You create your own elements and you assign them any name you like, hence the
term extensible.

• HTML describes the presentation of the content, XML describes the content.

• You can use XML to describe virtually any type of document: Koran, works of
Shakespeare, and others.
• Go to [Link] to download
ICS 541 - 01 (062) XML Basics 5
How can XML be Used?
• XML is used to Exchange Data
• With XML, data can be exchanged between
incompatible systems
• With XML, financial information can be exchanged
over the Internet
• XML can be used to Share Data
• XML can be used to Store Data
• XML can make your Data more Useful
• XML can be used to Create new Languages

ICS 541 - 01 (062) XML Basics 6


What does XML look like?
<Bibliography>

<Book>
<Title> Java </Title>
Book <Author> Mustafa </Author>
Title Author year
<Year> 1995 </year>
</Book>
Java Mustafa 1995 …
Pascal Ahmed 1980 …
Basic Ali 1975 …
<Book>
Oracle Emad 1973
<Title> Oracle </Title>
…. …. <Author> Emad </Author>
<Year> 1973 </Year>
Relation </Book>
….
….
</ Bibliography>

XML document
ICS 541 - 01 (062) XML Basics 7
XML and HTML …

• XML is not a replacement for HTML


• XML was designed to carry data
• XML and HTML were designed with different goals
• XML was designed to describe data and to focus on what
data is
• HTML was designed to display data and to focus on how
data looks.
• HTML is about displaying information, while XML is
about describing information

ICS 541 - 01 (062) XML Basics 8


XML and HTML
• HTML is for humans
• HTML describes web pages
• You don’t want to see error messages about the web pages you visit
• Browsers ignore and/or correct as many HTML errors as they can, so
HTML is often sloppy

• XML is for computers


• XML describes data
• The rules are strict and errors are not allowed
• In this way, XML is like a programming language
• Current versions of most browsers can display XML

ICS 541 - 01 (062) XML Basics 9


XML is free and extensible
• XML tags are not predefined
• You must "invent" your own tags
• The tags used to mark up HTML documents and the
structure of HTML documents are predefined
• The author of HTML documents can only use tags that
are defined in the HTML standard

• XML allows the author to define his own tags and


his own document structure, hence the term
extensible.

ICS 541 - 01 (062) XML Basics 10


The Anatomy of XML Document

<?xml version:”1.0”?>
XML
Processing
Declaration
<?xml-stylesheet type="text/xsl" href=“[Link]"?> instruction

Comments <!-- File name: [Link] -->


Attribute
<Bibliography>
<Book ISBN=“1-111-122”>
<Title> Java </Title>
<Author> Mustafa </Author>
<Year> 1995 </Year>
</Book>
Root or document Elements nested
.
element Within root element
.
<Book>
<Title> Oracle </Title>
<Author> Emad </Author>
<Year> 1973 </Year>
</Book>
</Bibliography>

ICS 541 - 01 (062) XML Basics 11


Components of an XML Document
• Elements
• Each element has a beginning and ending tag
• <TAG_NAME>...</TAG_NAME>
• Elements can be empty (<TAG_NAME />)

• Attributes
• Describes an element; e.g. data type, data range, etc.
• Can only appear on beginning tag
• Example: <Book ISBN = “1-111-123”>

• Processing instructions
• Encoding specification (Unicode by default)
• Namespace declaration
• Schema declaration

ICS 541 - 01 (062) XML Basics 12


XML declaration
• The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8“ standalone="yes"?>

• The XML declaration is not required by browsers, but is required by most


XML processors (so include it!)

• If present, the XML declaration must be first--not even white space should
precede it

• Note that the brackets are <? and ?>

• version="1.0" is required (I am not sure it is the only version so far)

• encoding can be "UTF-8" (ASCII) or "UTF-16" (Unicode), or something


else, or it can be omitted

• standalone tells whether there is a separate DTD

ICS 541 - 01 (062) XML Basics 13


Processing Instructions
• PIs (Processing Instructions) may occur anywhere in the XML document
(but usually in the beginning)

• A PI is a command to the program processing the XML document to


handle it in a certain way

• XML documents are typically processed by more than one program

• Programs that do not recognize a given PI should just ignore it

• General format of a PI: <?target instructions?>

• Example: <?xml-stylesheet type="text/css“


href="[Link]"?>

ICS 541 - 01 (062) XML Basics 14


XML Elements
• An XML element is everything from the element's start
tag to the element's end tag
• XML Elements are extensible and they have
relationships
• XML Elements have simple naming rules
• Names can contain letters, numbers, and other characters
• Names must not start with a number or punctuation character
• Names must not start with the letters xml (or XML or Xml ..)
• Names cannot contain spaces

ICS 541 - 01 (062) XML Basics 15


XML Attributes
• XML elements can have attributes
• Data can be stored in child elements or in attributes
• Should you avoid using attributes?
• Here are some of the problems using attributes:

• attributes cannot contain multiple values (child elements can)

• attributes are not easily expandable (for future changes)

• attributes cannot describe structures (child elements can)

• attributes are more difficult to manipulate by program code

• attribute values are not easy to test against a Document Type Definition
(DTD) - which is used to define the legal elements of an XML document

ICS 541 - 01 (062) XML Basics 16


Distinction between subelement and attribute
• In the context of documents, attributes are part of markup, while subelement
contents are part of the basic document contents

• In the context of data representation, the difference is unclear and may be


confusing

• Same information can be represented in two ways

• <Book … Publisher = “McGraw Hill” … </Book>

• <Book>

<Publisher> McGraw Hill </Publisher>

</Book>

• Suggestion: use attributes for identifiers of elements, and use subelements


for contents

ICS 541 - 01 (062) XML Basics 17


XML Validation

• Well-Formed XML document:


• Is an XML document with the correct basic syntax

• Valid XML document:


• Must be well formed plus
• Conforms to a predefined DTD or XML Schema.

ICS 541 - 01 (062) XML Basics 18


Rules For Well-Formed XML
• Must begin with the XML declaration
• Must have one unique root element
• All start tags must match end-tags
• XML tags are case sensitive
• All elements must be closed
• All elements must be properly nested
• All attribute values must be quoted
• XML entities must be used for special characters
ICS 541 - 01 (062) XML Basics 19
XML DTD
• A DTD defines the legal elements of an XML
document
• defines the document structure with a list of legal elements
and attributes

• XML Schema
• XML Schema is an XML based alternative to DTD

• Errors in XML documents will stop the XML program

• XML Validators

ICS 541 - 01 (062) XML Basics 20


More XML components
• Namespace

• Entities

• CDATA

ICS 541 - 01 (062) XML Basics 21


Namespace
• Overview

• Declaration

• Default namespace

• Scope

• attribute

ICS 541 - 01 (062) XML Basics 22


Namespaces: Overview
• Part of XML’s extensibility

• Allow authors to differentiate between tags of the same


name (using a prefix)
• Frees author to focus on the data and decide how to best
describe it
• Allows multiple XML documents from multiple authors to
be merged

• Identified by a URI (Uniform Resource Identifier)


• When a URL is used, it does NOT have to represent a live
server
ICS 541 - 01 (062) XML Basics 23
Namespaces: Declaration

xmlns: bk = “[Link]

Prefix URI(URL)

Namespace declaration

Example:

<BOOK xmlns:bk=“[Link]
<bk:TITLE> All About XML </bk:TITLE>
<bk:AUTHOR> Joe Developer </bk:AUTHOR>
<bk:PRICE currency=‘US Dollar’> 19.99 </bk:PRICE>
</BOOK>

ICS 541 - 01 (062) XML Basics 24


Namespaces: Default Namespace
• An XML namespace declared without a prefix
becomes the default namespace for all sub-elements

• All elements without a prefix will belong to the


default namespace:

<BOOK
xmlns=“[Link]
<TITLE> All About XML </TITLE>
<AUTHOR> Joe Developer </AUTHOR>
</BOOK>
ICS 541 - 01 (062) XML Basics 25
Namespaces: Scope
• Unqualified elements belong to the inner-most default
namespace.
• BOOK, TITLE, and AUTHOR belong to the default book namespace

• PUBLISHER and NAME belong to the default publisher namespace

<BOOK xmlns=“[Link]/bookinfo”>
<TITLE> All About XML </TITLE>
<AUTHOR> Joe Developer </AUTHOR>
<PUBLISHER xmlns=“urn:publishers:publinfo”>
<NAME> Microsoft Press </NAME>
</PUBLISHER>
</BOOK>

ICS 541 - 01 (062) XML Basics 26


Namespaces: Attributes
• Unqualified attributes do NOT belong to any
namespace
• Even if there is a default namespace

• This differs from elements, which belong to the


default namespace

ICS 541 - 01 (062) XML Basics 27


Entities
• Entities provide a mechanism for textual substitution,

• e.g. Entity Substitution


&lt; <
&amp; &

• You can define your own entities

• Parsed entities can contain text and markup

• Unparsed entities can contain any data


• JPEG photos, GIF files, movies, etc.
ICS 541 - 01 (062) XML Basics 28
CDATA
• By default, all text inside an XML document is parsed
• You can force text to be treated as unparsed character data by
enclosing it in <![CDATA[ ... ]]>
• Any characters, even & and <, can occur inside a CDATA
• White space inside a CDATA is (usually) preserved
• The only real restriction is that the character sequence ]]>
cannot occur inside a CDATA
• CDATA is useful when your text has a lot of illegal
characters (for example, if your XML document contains
some HTML text)

ICS 541 - 01 (062) XML Basics 29


References
• W3 Schools XML Tutorial
• [Link]

• W3C XML page


• [Link]

• XML Tutorials
• [Link]

• Online resource for markup language technologies


• [Link]

• Several Online Presentations


ICS 541 - 01 (062) XML Basics 30
- Reading List
• W3 Schools XML Tutorial
• [Link]

ICS 541 - 01 (062) XML Basics 31


END

ICS 541 - 01 (062) XML Basics 32

You might also like