Introduction To XML
by Nikita Bais
Table Of Contents
Markup Languages
What is XML?
The Difference Between XML and HTML
How Can XML be Used?
XML Structure
XML Syntax
Valid vs Well Formed XML
Document Type Definition (DTD)
Markup Languages
XML Basics
What is XML?
XML stands for Extensible Markup Language.
XML was designed to carry data, not to display data.
XML tags are not predefined. Users can define their own tags.
XML is designed to be self-descriptive.
XML is a W3C Recommendation.
XML documents can be validated using a DTD.
The Difference Between XML and HTML
XML is not a replacement for HTML. XML is a complement to HTML. XML and HTML were designed with different goals:
XML was designed to transport and store data, with a focus on what the data is.
HTML was designed to display data, with a focus on how the data looks.
How Can XML be Used?
XML separates data from HTML.
XML simplifies data sharing.
XML simplifies data transport.
XML is used to create new Internet languages.
XML Structure
An XML document has both a logical and a physical structure.
Logical Structure
Indicates how the document is built, as opposed to what the document contains.
Prolog
Document Element
Follows the prolog.
The heart of the XML document, where the actual content resides.
(An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).)
External Entities
Require separate storage.
The declaration refers to the storage unit by using a SYSTEM or PUBLIC identifier.
Syntax
<!ENTITY entity-name SYSTEM "URI/URL">
Example
<!ENTITY MyImage SYSTEM "http://www.images.com/sunset.gif" NDATA GIF>
Ex: <!ENTITY MyImage PUBLIC "-//Images//Text Standard Images//EN" "http://www.images.com/sunset.gif" NDATA GIF>
Parsed Entity
An entity made up of parsable text (any text data).
The XML processor extracts the content of the entity.
The content of the entity appears at the location of the entity reference in the XML document.
Example: <!ENTITY writer "Donald Duck.">
This declares an entity named writer whose content is "Donald Duck."
<author>&writer;</author>
The reference to the writer entity gets replaced with "Donald Duck."
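The substitution described above can be tried out with a short sketch. Python and its standard-library xml.etree.ElementTree module are assumptions made for illustration (the slides do not name a parser), and the helper name expand_author is hypothetical. The entity is declared in an internal DTD subset so that the document is self-contained.

```python
import xml.etree.ElementTree as ET

# A tiny document: the internal DTD subset declares the parsed entity
# "writer", and the document element refers to it as &writer;.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
]>
<author>&writer;</author>"""


def expand_author(xml_text: str) -> str:
    """Parse the document and return the text of its root element.

    The parser replaces the &writer; reference with the entity's content
    before the text is handed back to the caller.
    """
    root = ET.fromstring(xml_text)
    return root.text or ""


if __name__ == "__main__":
    print(expand_author(DOCUMENT))
```

The caller never sees the reference itself, only the replacement text, which is the behaviour the slide describes for parsed entities.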
Unparsed Entity
An entity that cannot be parsed by the XML processor.
The entity might or might not be text; if it is text, it is not parsable text.
It is sometimes referred to as a binary entity, because its content is often a binary file (e.g. an image).
It requires a notation, which identifies the format, or type, of the resource that the entity refers to.
Example
Entity Declaration:
<!ENTITY MyImage SYSTEM "sunset.gif" NDATA GIF>
Notation Declaration:
<!NOTATION GIF SYSTEM "//Utils/Gifview.exe">
(This specifies that the XML processor should use Gifview.exe to process entities of type GIF.)
XML Syntax
Attributes
Attributes provide a method of associating values with an element.
XML elements can have attributes in name/value pairs, just like in HTML. Unlike in HTML, attribute values must always be quoted.
Example:
<EMAIL DATE="14/02/2011"> </EMAIL>
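As a small illustration of name/value pairs, the sketch below reads the attributes of an EMAIL element. It assumes Python's standard-library xml.etree.ElementTree parser, and read_attributes is a hypothetical helper name. Note that XML requires attribute values to be quoted, so the parser accepts DATE="14/02/2011" but rejects an unquoted DATE=14/02/2011.

```python
import xml.etree.ElementTree as ET


def read_attributes(xml_text: str) -> dict:
    """Return the root element's attributes as a plain name -> value dict.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed,
    for example when an attribute value is not enclosed in quotes.
    """
    return dict(ET.fromstring(xml_text).attrib)


if __name__ == "__main__":
    # Quoted value: parsed into a single name/value pair.
    print(read_attributes('<EMAIL DATE="14/02/2011"> </EMAIL>'))

    # Unquoted value: rejected by the parser.
    try:
        read_attributes("<EMAIL DATE=14/02/2011> </EMAIL>")
    except ET.ParseError as error:
        print("not well-formed:", error)
```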
Valid XML
XML validated against a DTD is "valid" XML.
It obeys all the validity constraints identified in the XML specification.
Example: Validity Constraint: Required Attribute
If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type named in the attribute-list declaration.
<?xml version="1.0" ?>
<EMAIL>
  <TO>Ashish</TO>
  <CC>Rahul</CC>
  <SUBJECT>Meeting Reminder</SUBJECT>
  <BODY>Group Meeting at 4.00 PM</BODY>
</EMAIL>
Benefits of well-formedness
For the client, it saves the time needed to download the DTD when the XML document has already been validated against the DTD by the server.
In cases where validation is not required, the focus is only on the structure of the document.
(Note: a valid document = a well-formed document + satisfying all validity constraints)
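A well-formedness check on its own needs no DTD at all. The sketch below assumes Python's standard-library xml.etree.ElementTree parser, which checks well-formedness only and does not validate against a DTD; the function name is_well_formed and the broken sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Every start tag has a matching end tag and elements nest properly.
WELL_FORMED = """<?xml version="1.0" ?>
<EMAIL>
  <TO>Ashish</TO>
  <CC>Rahul</CC>
  <SUBJECT>Meeting Reminder</SUBJECT>
  <BODY>Group Meeting at 4.00 PM</BODY>
</EMAIL>"""

# The SUBJECT element is never closed, so the nesting is broken.
NOT_WELL_FORMED = """<EMAIL>
  <TO>Ashish</TO>
  <SUBJECT>Meeting Reminder
</EMAIL>"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise.

    This says nothing about validity: no DTD is consulted.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
```

A document that passes this check may still be invalid; validity additionally requires a validating parser and a DTD.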
Document Classes
Part of the background to the design of XML.
Relate to object-oriented programming (OOP).
Make conceptual use of inheritance and polymorphism.
Example: base class Book
Book
NumberOfChapters
CoverLetter
CookBook
NumberOfChapters (value 10)
CoverLetter (value Red)
Recipe
TextBook
NumberOfChapters (value 21)
CoverLetter (value Blue)
Recipe
Polymorphism
Book
ArtBook
NumberOfChapters
DTD
Acts as a rule book that allows an author to create new documents of the same type and with the same characteristics as a base document.
Defines the building blocks of an XML document.
Defines the document structure with a list of elements and attributes.
DTD structure
Internal DTD (subset)
DTD which is declared inside XML document
<!DOCTYPE root-element [element-declarations]>
Interpretation of DTD
!DOCTYPE EMAIL defines that the root element of this document is EMAIL.
!ELEMENT EMAIL defines that the EMAIL element contains five elements: "TO, FROM, CC, SUBJECT, BODY".
!ELEMENT TO defines the TO element to be of type "#PCDATA".
!ELEMENT FROM defines the FROM element to be of type "#PCDATA".
!ELEMENT CC defines the CC element to be of type "#PCDATA".
!ELEMENT SUBJECT defines the SUBJECT element to be of type "#PCDATA".
!ELEMENT BODY defines the BODY element to be of type "#PCDATA".
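The declarations interpreted above can be put together into one document with an internal DTD. The sketch below is a reconstruction under assumptions: the DTD lines are inferred from the interpretation list, the element values (Ashish, Nikita, Rahul and the message text) are placeholders, and the helper name child_names is hypothetical. It uses Python's standard-library xml.etree.ElementTree parser, which reads the DOCTYPE but does not validate the document against it.

```python
import xml.etree.ElementTree as ET

# An EMAIL document carrying its own DTD in the internal subset.
EMAIL_WITH_INTERNAL_DTD = """<?xml version="1.0"?>
<!DOCTYPE EMAIL [
  <!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT FROM (#PCDATA)>
  <!ELEMENT CC (#PCDATA)>
  <!ELEMENT SUBJECT (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<EMAIL>
  <TO>Ashish</TO>
  <FROM>Nikita</FROM>
  <CC>Rahul</CC>
  <SUBJECT>My First DTD</SUBJECT>
  <BODY>Hello World</BODY>
</EMAIL>"""


def child_names(xml_text: str) -> list:
    """Return the tag names of the root element's children, in order."""
    return [child.tag for child in ET.fromstring(xml_text)]


if __name__ == "__main__":
    # The children appear in the order required by the EMAIL content model.
    print(child_names(EMAIL_WITH_INTERNAL_DTD))
```

Because this parser is non-validating, it would also accept a document whose children break the declared order; enforcing the DTD requires a validating parser.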
External DTD
In the following example, the email.dtd file is created separately and referenced in the XML document as "email.dtd".
<?xml version="1.0"?>
<!DOCTYPE EMAIL SYSTEM "email.dtd">
<EMAIL>
  <TO>[email protected]</TO>
  <FROM>[email protected]</FROM>
  <CC>[email protected]</CC>
  <SUBJECT>My First DTD</SUBJECT>
  <BODY>Hello World</BODY>
</EMAIL>
Element Declarations
Syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
Empty Elements
Empty elements are declared with the category keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example:
<!ELEMENT br EMPTY>
XML example:
<br />
Elements with Parsed Character Data
Elements with only parsed character data are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
Example:
<!ELEMENT FROM (#PCDATA)>
Elements with Any Contents
Elements declared with the category keyword ANY can contain any combination of parsable data:
<!ELEMENT element-name ANY>
Example:
<!ELEMENT EMAIL ANY>
Elements with Children (sequences)
Elements with one or more children are declared with the names of the child elements inside parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>
Example:
<!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
(NOTE: When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document.)
Declaring Only One Occurrence of an Element
<!ELEMENT element-name (child-name)>
Example:
<!ELEMENT EMAIL (BODY)>
The example above declares that the child element "BODY" must occur once, and only once, inside the "EMAIL" element.
Declaring Minimum One Occurrence of an Element
<!ELEMENT element-name (child-name+)>
Example:
<!ELEMENT EMAIL (BODY+)>
The + sign in the example above declares that the child element "BODY" must occur one or more times inside the "EMAIL" element.
Declaring Zero or More Occurrences of an Element
<!ELEMENT element-name (child-name*)>
Example:
<!ELEMENT EMAIL (BODY*)>
The * sign in the example above declares that the child element "BODY" can occur zero or more times inside the "EMAIL" element.
Declaring Zero or One Occurrences of an Element
<!ELEMENT element-name (child-name?)>
Example:
<!ELEMENT EMAIL (BODY?)>
The ? sign in the example above declares that the child element "BODY" can occur zero times or one time inside the "EMAIL" element.
Declaring either/or Content
Example:
<!ELEMENT EMAIL (TO,FROM,CC,SUBJECT,(MESSAGE|BODY))>
The example above declares that the "EMAIL" element must contain a "TO" element, a "FROM" element, a "CC" element, a "SUBJECT" element, and either a "MESSAGE" or a "BODY" element.
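The sequence, choice and occurrence operators (comma, |, +, * and ?) behave much like the same operators in regular expressions. The Python sketch below uses that similarity to check a list of child element names against a content model. It is an illustration under assumptions, not how a validating XML parser is implemented: the function names model_to_regex and children_match are hypothetical, and only element-only content models are handled (no #PCDATA, EMPTY or ANY).

```python
import re
from typing import Sequence

# Matches an element name inside a content model, e.g. TO or BODY.
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate an element-only DTD content model into a regular expression.

    Each element name becomes a group that matches the name followed by one
    space; commas (sequence) are dropped, while | + * ? and parentheses keep
    the meaning they already have in regular expressions.
    """
    body = re.sub(r"\s+", "", model)
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", body)
    body = body.replace(",", "")
    return re.compile(body)


def children_match(model: str, children: Sequence[str]) -> bool:
    """Return True if the child names satisfy the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


if __name__ == "__main__":
    email_model = "(TO,FROM,CC,SUBJECT,(MESSAGE|BODY))"
    print(children_match(email_model, ["TO", "FROM", "CC", "SUBJECT", "BODY"]))
    print(children_match(email_model, ["TO", "FROM", "CC", "SUBJECT"]))
    print(children_match("(BODY+)", []))
    print(children_match("(BODY?)", ["BODY"]))
```

Appending a space after every name keeps one element name from matching as a prefix of another.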
Declaring Mixed Content
Example:
<!ELEMENT EMAIL (#PCDATA|TO|FROM|CC|SUBJECT|BODY)*>
The example above declares that the "EMAIL" element can contain zero or more occurrences of parsed character data, "TO", "FROM", "CC", "SUBJECT" or "BODY" elements.
Declaring Attributes
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
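To make the syntax concrete, the sketch below declares two attributes for EMAIL in an internal DTD and then reads the attributes back. Everything beyond the ATTLIST syntax is an assumption for illustration: the PRIORITY attribute and its default value "normal" are invented, root_attributes is a hypothetical helper, and the parser is Python's standard-library xml.etree.ElementTree. That parser does not validate, so it would not report a missing #REQUIRED attribute, but it is expected to supply default values declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# DATE must be given on every EMAIL element (#REQUIRED).
# PRIORITY is a made-up attribute with the default value "normal".
EMAIL_WITH_ATTLIST = """<?xml version="1.0"?>
<!DOCTYPE EMAIL [
  <!ELEMENT EMAIL (#PCDATA)>
  <!ATTLIST EMAIL DATE CDATA #REQUIRED>
  <!ATTLIST EMAIL PRIORITY CDATA "normal">
]>
<EMAIL DATE="14/02/2011">Group Meeting at 4.00 PM</EMAIL>"""


def root_attributes(xml_text: str) -> dict:
    """Return the root element's attributes, including defaulted ones."""
    return dict(ET.fromstring(xml_text).attrib)


if __name__ == "__main__":
    # DATE comes from the start tag; PRIORITY comes from the declaration.
    print(root_attributes(EMAIL_WITH_ATTLIST))
```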
THANK YOU!!!!!!!