XML
XML can be used to simplify data storage and sharing. With XML, data is
kept separate from HTML, so you can create HTML layouts for displaying the
data. When the data changes, you don't have to recreate your HTML file. XML
data can also be exchanged easily between computer and database systems,
even if they are otherwise incompatible. Because XML data is stored in text
format, it is easy to export data from one system to an XML file and then
import it into another system.
What is XML?
XML stands for Extensible Markup Language, which became a W3C
Recommendation on 10 February 1998. XML is a markup language, like HTML.
XML and HTML both use tags, but there are some differences between them:
HTML was designed to describe how to display data, and XML was designed to
describe how to store data.
HTML tags are predefined (for example "<p>", "<table>", etc.), but XML tags
are not predefined. You must define your own tags.
The following example is product information, stored as XML:
<product>
  <name>
    Garmin Oregon 300 3-inch Touch screen Handheld GPS
    Unit with a Built-in Base map and Shaded Relief
  </name>
  <category>GPS</category>
  <brand>Garmin</brand>
  <price>529.99</price>
  <description>
    Handheld GPS navigator for use outdoors, in a car, or on a
    boat 3 inch LCD touch screen display (240 x 400 pixels)
    with built-in picture viewer
  </description>
</product>
As you can see, it is just pure information wrapped in tags. Unlike HTML
tags ("<table>", "<tr>", etc.), these tags are not predefined. You can
create your own tags, like the ones in the above example, for your XML file.
Created By
Mr. Deependra Rastogi, Lecturer
Department of Computer Science, TMU
Before XML, exchanging data between incompatible systems was difficult and
complicated because of the different data formats involved. XML greatly
reduces this complexity, and we can easily expand or upgrade our operating
systems and applications without losing data.
XML Format and Structure
An XML document consists of tags and data. All data in an XML document is
wrapped in tags. Let's look at an example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<customer>
  <firstname>Michael</firstname>
  <lastname>Smith</lastname>
  <gender>male</gender>
  <address>
    <street>197 West Park Ave.</street>
    <city>New York</city>
    <state>NY</state>
    <zip>11375</zip>
    <country>US</country>
  </address>
  <phone>718-235-5670</phone>
  <email>msmith278@yahoo.com</email>
</customer>
The first line is the XML declaration. It defines the XML version (1.0) and
the character set (ISO-8859-1). The next line uses the tag "<customer>". It
is the root element of the document. The lines that follow describe the
child elements of the root. Please note that, except for the first line, all
tags in the document are used in pairs: an opening tag and a closing tag.
From the above example, you can think of a pair of tags as a "container",
with data or child tags stored inside.
You can also think of the structure of an XML document as defined by the
relationship between these "containers": a large container contains smaller
containers. Or you can think of the structure of an XML file as a "tree"
that starts at the root tag "customer" and branches to the child tags
"firstname", "lastname", "gender", "address", "phone" and "email", with
"address" branching further to its own leaf tags:
customer -> firstname
            lastname
            gender
            address -> street
                       city
                       state
                       zip
                       country
            phone
            email
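
The tree view can be reproduced with a short program. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of the original notes); the helper name outline is hypothetical, and the XML declaration is left out of the string for brevity.

```python
import xml.etree.ElementTree as ET

# The customer document from the example above (declaration omitted).
CUSTOMER_XML = """
<customer>
  <firstname>Michael</firstname>
  <lastname>Smith</lastname>
  <gender>male</gender>
  <address>
    <street>197 West Park Ave.</street>
    <city>New York</city>
    <state>NY</state>
    <zip>11375</zip>
    <country>US</country>
  </address>
  <phone>718-235-5670</phone>
  <email>msmith278@yahoo.com</email>
</customer>
"""

def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(CUSTOMER_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one "container" nested inside another.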
Please note that an XML document must contain exactly one pair of root tags
for the "parent" element, which contains all other elements. You cannot have
two roots in a single XML document. All elements can have child elements:
<root>
  <child1>......</child1>
  <child2>
    <subchild1>.....</subchild1>
    <subchild2>.....</subchild2>
  </child2>
</root>
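
The single-root rule is something a parser enforces. As a rough illustration, assuming Python's standard xml.etree.ElementTree module (the helper name is_well_formed and the sample strings are made up for this sketch), a document with two roots is rejected while a document with one root is accepted:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

one_root = "<root><child1>a</child1><child2><subchild1>b</subchild1></child2></root>"
two_roots = "<root><child1>a</child1></root><root><child1>b</child1></root>"
unclosed = "<root><child1>a</root>"  # opening tag without its closing tag

print(is_well_formed(one_root))
print(is_well_formed(two_roots))
print(is_well_formed(unclosed))
```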
XML Namespace
Namespaces have two purposes in XML:
1. To distinguish between elements and attributes from different
vocabularies that have different meanings but happen to share the
same name.
2. To group all the related elements and attributes from a single XML
application together so that software can easily recognize them.
What is an XML namespace?
An XML namespace is a collection of names that can be used as element or
attribute names in an XML document. The namespace qualifies element names
uniquely on the Web in order to avoid conflicts between elements with the
same name. The namespace is identified by a Uniform Resource Identifier
(URI), either a Uniform Resource Locator (URL) or a Uniform Resource Name
(URN), but it doesn't matter what, if anything, it points to. URIs are used
simply because they are globally unique across the Internet.
Namespaces can be declared either explicitly or by default. With an explicit
declaration, you define a shorthand, or prefix, to substitute for the full
name of the namespace. You use this prefix to qualify elements belonging to
that namespace. Explicit declarations are useful when a node contains
elements from different namespaces. A default declaration declares a
namespace to be used for all elements within its scope, and a prefix is not
used.
Why is it necessary?
As an example, let's assume we have an RDB with tables of the following
structure (employeeTable, sectionTable). What kind of SQL statement would
you write to obtain a list of employee ID, employee department name, and
employee name? If we join employeeTable and sectionTable on the secID
column, we can obtain a list of employee IDs, employee departments and
employee names. However, since the name column and the secID column exist in
both tables, we have to put the table name (or a table alias) in front of
the name and secID columns to make clear which table each column belongs to.
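
The join described above can be sketched as follows. This is only an illustration, assuming Python's standard sqlite3 module with an in-memory database; the column layout (empID, secID, name for employeeTable; secID, name for sectionTable) is inferred from the XML documents shown later in these notes, and the function names are hypothetical.

```python
import sqlite3

SETUP_SQL = """
CREATE TABLE employeeTable (empID TEXT, secID TEXT, name TEXT);
CREATE TABLE sectionTable (secID TEXT, name TEXT);
INSERT INTO employeeTable VALUES ('E0000001', 'S001', 'John Smith');
INSERT INTO employeeTable VALUES ('E0000002', 'S002', 'Ichiro Tanaka');
INSERT INTO sectionTable VALUES ('S001', 'Sales');
INSERT INTO sectionTable VALUES ('S002', 'Development');
"""

def open_database():
    """Create a throwaway in-memory database holding both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    return conn

def list_employees():
    """Join the tables on secID, using the aliases e and s to qualify columns."""
    conn = open_database()
    try:
        return conn.execute(
            "SELECT e.empID, s.name, e.name "
            "FROM employeeTable AS e "
            "JOIN sectionTable AS s ON e.secID = s.secID "
            "ORDER BY e.empID"
        ).fetchall()
    finally:
        conn.close()

def unqualified_name_is_ambiguous():
    """Return True if selecting the bare column 'name' from the join is rejected."""
    conn = open_database()
    try:
        conn.execute(
            "SELECT name FROM employeeTable "
            "JOIN sectionTable ON employeeTable.secID = sectionTable.secID"
        )
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        conn.close()

print(list_employees())
print(unqualified_name_is_ambiguous())
```

The aliases e and s play the same role for column names that namespace prefixes play for element names.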
So, what happens when this data is expressed in an XML document? Let's take
two XML documents, and see how we can create a list showing employee ID,
employee department and employee name using XML.
First, we will create a root element (the employeeList element) and an
element (the personList element) to group the information for one
individual. You might think of using an employee ID element (the empID
element in the employee XML document), a department name element (the name
element in the section XML document) and an employee name element (the name
element in the employee XML document) as child elements, but there's a
problem with this method. The problem is an "element name conflict." The
name element in the employee XML document is defined as an element
representing the employee's name, whereas the name element of the section
XML document is defined as an element representing a department name. In
merging these two XML documents, you will get a name element that has two
separate meanings. While a human might be able to tell the difference
between these two "name" elements and what they represent, computer systems
cannot determine what these name elements are supposed to mean. The XML 1.0
specification did not consider the merging of different types of XML
documents (vocabularies) to create a new XML document, which led to this
type of name conflict problem.
employeeXML Document
<employee>
  <personInfo>
    <empID>E0000001</empID>
    <secID>S001</secID>
    <name>John Smith</name>
  </personInfo>
  <personInfo>
    <empID>E0000002</empID>
    <secID>S002</secID>
    <name>Ichiro Tanaka</name>
  </personInfo>
</employee>
sectionXML Document
<section>
  <sectionInfo>
    <secID>S001</secID>
    <name>Sales</name>
  </sectionInfo>
  <sectionInfo>
    <secID>S002</secID>
    <name>Development</name>
  </sectionInfo>
</section>
employeeListXML Document
<employeeList>
  <personList>
    <empID>E0000001</empID>
    <name>Sales</name>
    <name>John Smith</name>
  </personList>
  <personList>
    <empID>E0000002</empID>
    <name>Development</name>
    <name>Ichiro Tanaka</name>
  </personList>
</employeeList>
The name element expressing employee name and the name
element expressing department name conflict!
Avoiding Element Name Conflicts
Perhaps some of you had the idea that element name conflicts could be
avoided by applying to an XML document the same kind of alias used for the
RDB table join explained above. The W3C recommended a specification called
"Namespaces in XML," whereby XML vocabularies are mutually differentiated,
allowing for the re-use of a vocabulary. Using XML namespaces allows us to
avoid element name conflicts.
Here, we will add an XML namespace declaration to both the employee data XML
document and the section data XML document. Next, we will merge these two
XML documents and create a new employeeList data XML document. As a result,
we can differentiate the "emp:name" element of the employeeList data XML
document as an element representing the employee name, and the "sec:name"
element as an element representing the department name. It might be easiest
to think of one namespace as an aggregation of elements and attributes.
Employee XML Document
<emp:employee xmlns:emp="urn:corp:emp">
  <emp:personInfo>
    <emp:empID>E0000001</emp:empID>
    <emp:secID>S001</emp:secID>
    <emp:name>John Smith</emp:name>
  </emp:personInfo>
  <emp:personInfo>
    <emp:empID>E0000002</emp:empID>
    <emp:secID>S002</emp:secID>
    <emp:name>Ichiro Tanaka</emp:name>
  </emp:personInfo>
</emp:employee>
section XML Document
<sec:section xmlns:sec="urn:corp:sec">
  <sec:sectionInfo>
    <sec:secID>S001</sec:secID>
    <sec:name>Sales</sec:name>
  </sec:sectionInfo>
  <sec:sectionInfo>
    <sec:secID>S002</sec:secID>
    <sec:name>Development</sec:name>
  </sec:sectionInfo>
</sec:section>
employeeList XML Document
<list:employeeList
    xmlns:list="urn:corp:list"
    xmlns:emp="urn:corp:emp"
    xmlns:sec="urn:corp:sec">
  <list:personList>
    <emp:empID>E0000001</emp:empID>
    <sec:name>Sales</sec:name>
    <emp:name>John Smith</emp:name>
  </list:personList>
  <list:personList>
    <emp:empID>E0000002</emp:empID>
    <sec:name>Development</sec:name>
    <emp:name>Ichiro Tanaka</emp:name>
  </list:personList>
</list:employeeList>
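
To see how a program tells the two name elements apart, here is a small sketch assuming Python's standard xml.etree.ElementTree module. The prefix-to-URI mapping NS and the function read_people are names chosen for this illustration; the prefixes in NS only need to map to the same URIs as in the document.

```python
import xml.etree.ElementTree as ET

EMPLOYEE_LIST_XML = """
<list:employeeList
    xmlns:list="urn:corp:list"
    xmlns:emp="urn:corp:emp"
    xmlns:sec="urn:corp:sec">
  <list:personList>
    <emp:empID>E0000001</emp:empID>
    <sec:name>Sales</sec:name>
    <emp:name>John Smith</emp:name>
  </list:personList>
  <list:personList>
    <emp:empID>E0000002</emp:empID>
    <sec:name>Development</sec:name>
    <emp:name>Ichiro Tanaka</emp:name>
  </list:personList>
</list:employeeList>
"""

# Prefix-to-URI mapping used only for the lookups below.
NS = {"list": "urn:corp:list", "emp": "urn:corp:emp", "sec": "urn:corp:sec"}

def read_people(xml_text):
    """Return (employee ID, department name, employee name) for each person."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("list:personList", NS):
        people.append((
            person.findtext("emp:empID", namespaces=NS),
            person.findtext("sec:name", namespaces=NS),   # department name
            person.findtext("emp:name", namespaces=NS),   # employee name
        ))
    return people

print(read_people(EMPLOYEE_LIST_XML))
```

Because each lookup names the namespace as well as the local name, "sec:name" and "emp:name" can no longer be confused.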
Namespace Declaration
A namespace declaration is written in the start tag of an element, using the
following notation:
<prefix:elementname xmlns:prefix="namespace identifier (URI)">
If the element and/or attribute belongs to a namespace, a colon (":") is
placed between the namespace prefix and the element name or attribute name.
As a test, let's take the previous employee XML document as an example, and
provide a namespace declaration and an element belonging to the namespace.
<emp:employee xmlns:emp="urn:corp:emp">
  <emp:personInfo>
    ...omitted
  </emp:personInfo>
</emp:employee>
In this example, we have declared the namespace prefix as "emp", and the
namespace identifier (URI) as "urn:corp:emp". This means that element names
and attribute names with the "emp" prefix (including the employee element)
all belong to the urn:corp:emp namespace.
If no namespace prefix is provided for an element and/or attribute name,
then, except where a default namespace is declared (to be discussed later),
the element name and/or attribute name does not belong to a namespace. While
the namespace declaration is written like an attribute of the start tag, it
is different from a regular attribute. An element whose start tag carries
only an "xmlns:..." declaration has a namespace declaration, but no
attributes.
Almost any name can be used as a namespace prefix (it must be a valid XML
name without a colon, and prefixes beginning with "xml" are reserved); since
the prefix has no special meaning, any such name will do. However, the URI
must be universally unique. Quite often a URL beginning with "http://..." is
used in practice. Since the URL is not actually accessed, it is not a
problem if the file, etc. does not really exist. Understand that a URI
represents nothing more than a logical namespace name.
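
That the prefix is only shorthand while the URI carries the identity can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each element name in the form {URI}localname; the three sample documents, including the URI urn:corp:other, are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Same URI, different prefixes.
doc_a = '<emp:employee xmlns:emp="urn:corp:emp"><emp:name>John Smith</emp:name></emp:employee>'
doc_b = '<e:employee xmlns:e="urn:corp:emp"><e:name>John Smith</e:name></e:employee>'
# Same prefix as doc_a, different URI.
doc_c = '<emp:employee xmlns:emp="urn:corp:other"><emp:name>John Smith</emp:name></emp:employee>'

def qualified_tags(xml_text):
    """Return every element name as the parser reports it: {URI}localname."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

print(qualified_tags(doc_a))
print(qualified_tags(doc_a) == qualified_tags(doc_b))
print(qualified_tags(doc_a) == qualified_tags(doc_c))
```

The first two documents yield identical names even though their prefixes differ; the third does not, even though it reuses the "emp" prefix.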
Default Namespaces
A "default namespace" is a namespace declaration that does not use a
namespace prefix (see Figure for the notation). The scope of the default
namespace is the element on which the namespace was declared and its
content, just as with the namespace scope discussed earlier. The benefit of
using a default namespace is that the namespace prefix can be omitted.
For example, when adding a new namespace to an existing XML document,
writing a namespace prefix for each element to which the new namespace will
be applied involves a tremendous amount of tedious work. The larger the XML
document, the greater the labor involved, and the greater the likelihood of
notation errors. In this type of situation, adding only a default namespace
declaration to the XML document in question eliminates the need to write a
namespace prefix for each and every element, saving a lot of time.
On the other hand, there are drawbacks. One drawback is that omitting the
namespace prefix makes it more difficult to understand which element belongs
to which namespace, and which namespace is applicable. In addition,
programmers should remember that when a default namespace is declared, the
namespace is applied only to elements, and not to any attributes.
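
The point about attributes can be observed directly. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the "id" attribute and the function name names_seen_by_parser are hypothetical and do not come from the employee document.

```python
import xml.etree.ElementTree as ET

# A default namespace is declared on the root element.
# The "id" attribute is a made-up example attribute.
XML_TEXT = (
    '<employee xmlns="urn:corp:emp" id="E0000001">'
    '<name>John Smith</name>'
    '</employee>'
)

def names_seen_by_parser(xml_text):
    """Return (element names, root attribute names) as ElementTree reports them."""
    root = ET.fromstring(xml_text)
    tags = [element.tag for element in root.iter()]
    attribute_names = sorted(root.attrib)
    return tags, attribute_names

tags, attribute_names = names_seen_by_parser(XML_TEXT)
print(tags)             # element names carry the namespace URI
print(attribute_names)  # the unprefixed attribute name does not
```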
A default namespace can be partially overridden by declaring a different
default namespace within the scope of the original default namespace. A
default namespace can be canceled using the following notation:
Notation to cancel a default namespace
<elementname xmlns="">
The xmlns="" declaration frees an element within the namespace scope from
belonging to any namespace. A namespace that uses a namespace prefix can
also be declared within the scope of a default namespace.
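
Both behaviours can be seen in one small document. The sketch assumes Python's standard xml.etree.ElementTree module; the "memo" and "text" elements are invented for this illustration, and element_tags is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SCOPED_XML = """
<employee xmlns="urn:corp:emp">
  <name>John Smith</name>
  <memo xmlns="">
    <text>temporary note</text>
  </memo>
  <sec:name xmlns:sec="urn:corp:sec">Sales</sec:name>
</employee>
"""

def element_tags(xml_text):
    """Return every element name in document order as {URI}localname."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

for tag in element_tags(SCOPED_XML):
    print(tag)
```

The "memo" element and its child are reported without any namespace, while the prefixed "sec:name" element belongs to urn:corp:sec even though it sits inside the scope of the default namespace.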
The Need for XML Document Schema
XML documents are used for many different purposes today. Order processing,
invoices, estimates, travel expense reports, meeting minutes, accounting
forms, manuals, and other data used on a daily basis at work are only the
tip of the iceberg. Now, we see XML used for personal data such as journals,
household finances, and other applications. Virtually any data can be
created using XML format, since XML allows a user to freely define element
names and hierarchical structure.
In this volume, we will be looking both at writing XML documents simply
according to XML syntax (well-formed XML documents), as well as writing XML
documents to be used as business-to-business data, or in other words, a data
format to be shared between and among different companies.
For example, if requested to create an XML document to serve as a purchase
order to be sent to Mr. Y at Company X, what kind of XML document would you
create? The following are three XML document examples, created by three
different individuals. Ms. A has created very semantic element names. Mr. B
has opted for rather abbreviated element names, and Ms. C has created
elements having a hierarchical structure.
Even a simple request to create an XML document to serve as a purchase order
to be sent to Mr. Y at Company X can take on a number of different XML
document patterns. Any of the three examples above can be considered to be
proper XML documents. As long as the information required for a purchase
order is included, likely any XML document you could create would be an
acceptable XML document for the purpose.
But what would change if you approached the task from Mr. Y's perspective?
Assume that Mr. Y uses the three XML documents above, or your XML document,
for order processing. Having received XML documents with different element
names and hierarchical structures, Mr. Y would have to open each XML
document in an editor, confirm whether all of the required information was
present, and then process the purchase order. In this case, every purchase
order would have to be processed by hand, and the system could never be
automated.
But what would happen if all XML documents sent had the same element names
and hierarchical structure? With standard element names and structures, a
system could be created to handle all incoming XML documents, and order
processing could be automated, without Mr. Y having to verify the content of
each individual document.
A "schema" is what is required to allow the acceptance (or creation) of XML
documents with standardized element names and hierarchical structure. We
know that in the RDB world, a schema is defined when designing tables to
stipulate column data types and data sizes, set the primary key, associate
tables with other tables, etc. In an XML schema, a user specifies element
names, orders of occurrence, and numbers of occurrences. When XML is used
for specific purposes, a schema will first be defined, and then XML
documents will be created in accordance with that schema. In doing so,
anyone can create an XML document having exactly the same element names and
hierarchical structure.
Let's take another look at the task of creating an XML document to be used
for a purchase order. Assume that Mr. Y sends Ms. A, Mr. B, and Ms. C a
schema document for purchase orders (XML document). Ms. A, Mr. B, and Ms. C
then each create an XML document based on Mr. Y's schema. The element names
and hierarchical structure of the XML documents they send to Mr. Y are
completely identical.
Mr. Y can now use an XML parser to verify whether the documents have been
created according to the schema, so there is no need to open each file and
check element names and hierarchical structures. This reduces Mr. Y's
workload significantly.
Types of XML Document Schema
There are many different types of XML document schema. While the following
type of narrative format can be considered a type of schema, there is a
chance that different people will interpret the narrative differently. This
is why, in general, an XML document schema is written in a schema definition
language. A schema definition language is a specialized language for
describing schemas, and leaves no room for differences of interpretation.
Purchase Order XML Schema
[1] The root element is "orderform"
[2] The content of "orderform" is a "customer" element and a "product"
element in that order. "customer" occurs once, and "product" may occur
zero or more times.
[3] The content of "customer" is the "name", "address", and "tel"
elements, each occurring once in order
[4] The content of "name" and "address" is a text string
[5] The content of "tel" is the "portable" and "home" elements, with either
one or the other occurring
[6] The content of "portable" and "home" is a text string
[7] The content of "product" is the "product_name" and "num" elements,
each occurring once in order
[8] The content of "product_name" is a text string
[9] The content of "num" is a numeric value
There is more than one schema definition language. The schema definition
language defined in the XML 1.0 specification is the "DTD (Document Type
Definition)." A schema definition language that allows stricter definitions
is "XML Schema," specified by the W3C. Various vendors have also defined
their own schema definition languages.
DTD Schema Definition
Under DTD, the main components of the XML document are declared.
Declarations come under one of the following four categories:
Element Type Declaration
Attribute List Declaration
Entity Declaration
Notation Declaration
Here, we will discuss the most important of these, the "Element Type
Declaration."
Element Type Declarations declare the elements contained within an XML
document. The following shows the syntax for an Element Type Declaration.
<!ELEMENT element_name content_model>
The Content Model is very important in the Element Type Declaration. It
defines whether the element content is a text string or numeric value
(character data), whether only child elements occur (element content), etc.
When content is text string or numeric value
When the element content is a text string or numeric value, the Content
Model is designated as #PCDATA. Under DTD, there is no difference between
numeric type data and text type data. For example, the following describes
the Element Type Declaration that designates the content of "product_name"
as a text string:
<!ELEMENT product_name (#PCDATA)>
A correct element that conforms to this Element Type Declaration is
<product_name>television</product_name>. Writing a child element, such as
<product_name><abc/></product_name>, will cause an error.
The following describes the Element Type Declaration that designates the
content of "num" as a numeric value:
<!ELEMENT num (#PCDATA)>
A correct element for this definition is <num>10</num>. As discussed
earlier, both text strings and numeric values for element content are
designated as #PCDATA under DTD, so <num>Jenny</num> is also accepted. The
application must perform a check to see whether the content of an element is
actually a number.
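
The application-side check mentioned above might look like the following. This is only a sketch, assuming Python's standard xml.etree.ElementTree module and assuming that "num" should hold a non-negative whole number (the notes say only "numeric value"); num_value is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

def num_value(xml_text):
    """Return the content of a <num> element as an int, or None if it is not
    a run of ASCII digits (assumed rule: non-negative whole numbers only)."""
    text = (ET.fromstring(xml_text).text or "").strip()
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    return None

print(num_value("<num>10</num>"))
print(num_value("<num>Jenny</num>"))
```

Both inputs are equally acceptable to a DTD that declares num as (#PCDATA); only the application-level check separates them.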
When content is a child element
When child elements occur as the content of an element, the names of those
child elements are given in the Content Model. The order of occurrence and
the number of occurrences of the child elements must also be defined.
Defining the order of occurrence
When there are multiple child elements, you must designate the order of
occurrence. There are two ways to write the order of occurrence. Using a
comma (,) between one child element name and the next indicates that the
child elements will occur in the order given. Using a vertical bar (|) means
that either one or the other child element will occur.
","    Occurs in the order given
"|"    Either one or the other child element occurs
In the following example, the content of "product" is the "product_name" and
"num" elements, occurring once each in that order.
<!ELEMENT product (product_name,num)>
The following is a valid element for this Element Type Declaration:
<product>
  <product_name>television</product_name>
  <num>10</num>
</product>
Because "," defines the order of occurrence as the order in which the child
elements were written, a "product" element in which "num" comes before
"product_name", or in which one of the two is missing, would be invalid.
To describe an Element Type Declaration where either the "portable" or
"home" element (child elements of "tel") occurs:
<!ELEMENT tel (portable|home)>
In this case, writing both the portable and home elements, as in the
following, would be an error:
<tel>
  <portable>...</portable>
  <home>...</home>
</tel>
Defining the number of occurrences
In addition to the order of occurrence of child elements, the number of
occurrences is also defined in the Content Model. The number of occurrences
is designated with one of three symbols: "*", "+" or "?". The "*" symbol
means "may occur zero or more times." The "+" symbol means "may occur one or
more times." The "?" symbol means "may occur zero times or one time."
As with the Element Type Declaration <!ELEMENT product (product_name,num)>
shown earlier, not providing a symbol for the number of occurrences means
"must occur once."
"*"               May occur zero or more times
"+"               May occur one or more times
"?"               May occur zero times or once
No designation    One time
Under DTD, a programmer may not designate a specific number of occurrences
(e.g. three times, between two and five times, etc.).
For example, suppose the "customer" and "product" elements (content of
"orderform") occur in that order. To describe an Element Type Declaration
designating one occurrence for "customer" and zero or more occurrences for
"product", use the following notation:
<!ELEMENT orderform (customer,product*)>
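
The symbols of the Content Model behave much like a regular expression over the sequence of child element names. The sketch below makes that analogy concrete in Python. It is not a DTD validator: it handles only element content models (not #PCDATA, EMPTY or ANY), and the function names model_to_regex and children_match are made up for this illustration.

```python
import re

def _translate(match):
    """Turn one token of the content model into its regex equivalent."""
    token = match.group(0)
    if token == ",":
        return ""  # a sequence is plain concatenation in a regex
    # A child name becomes a group matching the name plus a trailing comma.
    return "(?:" + re.escape(token) + ",)"

def model_to_regex(content_model):
    """Translate a model such as "(customer,product*)" into a compiled regex.

    Parentheses, "|", "*", "+" and "?" are kept as they are, because they
    mean the same thing in a regular expression.
    """
    model = content_model.replace(" ", "")
    return re.compile(re.sub(r"[A-Za-z_][\w.\-]*|,", _translate, model))

def children_match(content_model, child_names):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + "," for name in child_names)
    return model_to_regex(content_model).fullmatch(sequence) is not None

print(children_match("(customer,product*)", ["customer", "product", "product"]))
print(children_match("(customer,product*)", ["product", "customer"]))
print(children_match("(portable|home)", ["portable", "home"]))
```

A validating parser performs this kind of check for every element in the document, using the declarations found in the DTD.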
Now, let's describe all of the elements, referencing the notation examples
above.
Use the "dtd" extension when actually creating the file. The following shows
a file named "order.dtd", containing the DTD for the Purchase Order XML
schema:
order.dtd
<!ELEMENT orderform (customer,product*)>    [1][2]
<!ELEMENT customer (name,address,tel)>      [3]
<!ELEMENT name (#PCDATA)>                   [4]
<!ELEMENT address (#PCDATA)>                [4]
<!ELEMENT tel (portable | home)>            [5]
<!ELEMENT portable (#PCDATA)>               [6]
<!ELEMENT home (#PCDATA)>                   [6]
<!ELEMENT product (product_name,num)>       [7]
<!ELEMENT product_name (#PCDATA)>           [8]
<!ELEMENT num (#PCDATA)>                    [9]
LIST1 Valid XML Document for DTD
orderform.xml
<!DOCTYPE orderform SYSTEM "order.dtd">
<orderform>
  <customer>
    <name>Jenny</name>
    <address>Tokyo</address>
    <tel>
      <portable>555-5555-5555</portable>
    </tel>
  </customer>
  <product>
    <product_name>washing machine</product_name>
    <num>1</num>
  </product>
  <product>
    <product_name>television</product_name>
    <num>2</num>
  </product>
</orderform>
Declaration to Associate an XML Document and Schema Document
The <!DOCTYPE> at the beginning of LIST1 is called the "Document Type
Declaration," and designates the DTD that defines the structure of the XML
document. There are two ways to write it: an "internal subset," in which the
Element Type Declarations and other declarations are written inside the
Document Type Declaration itself, and an "external subset" (used here), in
which the Element Type Declarations and other declarations are placed in an
external file. In this volume, we will discuss the notation for an external
subset.
The location of the Document Type Declaration is fixed: it comes before the
start tag of the root element. The Document Type Declaration is written with
the syntax shown below, designating the root element name and the file name:
<!DOCTYPE root_element_name SYSTEM "file name">
Validating the XML Document
Once the schema document and XML document have been created, we can verify
whether the XML document has been created in accordance with the schema
document. This validation can be performed using an XML parser, eliminating
the need for manual verification or for writing a separate validation
program.
In the prior volume, we explained how to use Internet Explorer ("IE") to
verify whether an XML document has been correctly written. However, the XML
parser incorporated within IE cannot verify whether an XML document has been
created in accordance with a particular schema document. Accordingly, we
will use a validating XML processor.
Let's verify the XML document we created against the schema document.
Next, create the XML document shown in LIST2, and conduct the same operation
as before. An error message should result.
LIST2 Invalid XML Document with respect to a DTD
orderform_err.xml
<!DOCTYPE orderform SYSTEM "order.dtd">
<orderform>
  <customer>
    <name>Jenny</name>
    <address>Tokyo</address>
  </customer>
  <product>
    <product_name>washing machine</product_name>
    <num>1</num>
  </product>
  <product>
    <product_name>television</product_name>
    <num>2</num>
  </product>
</orderform>
This error occurs because the tel element does not appear in the XML
document in LIST2, while the schema document requires that the "name",
"address" and "tel" elements (content of "customer") occur once each in that
order. Use the error message in the dialog box as a clue to check the lines
before and after the error, and make the necessary edits.
Entity Declaration
In the prior volume, we discussed using predefined entity references, since
the "<" and "&" characters cannot be used directly as the content of an
element. Since the statement <calculation>a1<b2</calculation> causes an
error, we rewrote it to read <calculation>a1&lt;b2</calculation>. There are
five predefined entity references provided under the XML 1.0 specification.
<Table> Predefined Entity References
Character    Entity Name    Entity Reference
<            lt             &lt;
>            gt             &gt;
&            amp            &amp;
"            quot           &quot;
'            apos           &apos;
When using a DTD entity declaration, you can define your own entity
references in addition to the five above.
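
The effect of the predefined entity references can be tried out with a parser. The sketch below assumes Python's standard library: escape from xml.sax.saxutils replaces "&", "<" and ">" with their entity references, and xml.etree.ElementTree turns the references back into characters when parsing. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def calculation_element(expression):
    """Build a <calculation> element as text, escaping markup characters."""
    return "<calculation>" + escape(expression) + "</calculation>"

def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw = "<calculation>a1<b2</calculation>"   # "<" used directly: an error
safe = calculation_element("a1<b2")        # "<" written as an entity reference

print(parses(raw))
print(safe)
print(ET.fromstring(safe).text)

# All five predefined entity references are resolved by the parser.
all_five = ET.fromstring("<t>&lt;&gt;&amp;&quot;&apos;</t>").text
print(all_five)
```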
What is XML Schema?
Previously, we discussed well-formed XML documents, valid XML documents
using DTDs, and XML parsers. DTD has a characteristically simple syntax for
its functions and content definitions. We see, however, that DTD functions
and definitions have limitations when it comes to using XML for a variety of
complex purposes.
Traditionally, DTD has been the standard for XML schema definition; however,
XML usage has expanded dramatically in core application systems, being
tailored for a wide range of purposes that DTD cannot fully support. Given
this development, the W3C recommended "XML Schema" as a schema definition
language to replace DTD. The recommendation of XML Schema has spurred its
adoption as a standard schema definition language.
Differences between XML Schema and DTD Definitions
What differences are there between XML Schema and DTD definitions? We will
explain these differences using an XML document related to employee
information as an example.
When defining an XML Schema, the content you wish to put into an XML
document must first be summarized. The next step is to create a tree
structure.
Content to put into the XML document:
1. The root element is "Employee_Info"
2. As the content of "Employee_Info," "Employee" occurs 0 or more times
3. As the content of "Employee," the "Name," "Department," "Telephone," and
"Email" elements occur once each, in that order
4. The content of "Name," "Department," "Telephone," and "Email" is a text
string
5. "Employee" has an attribute called "Employee_Number"
6. The value of "Employee_Number" must be of int type
This provides us with an understanding of the hierarchical structure of the
XML document. Now, we can provide a schema definition using an actual schema
definition language.
LIST1 is an example using DTD to provide a schema definition for the content
above, while LIST2 is an example using XML Schema to provide a schema
definition (employee.xs).
LIST1: Employee Information DTD
<!ELEMENT Employee_Info (Employee)*>
<!ELEMENT Employee (Name, Department, Telephone, Email)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Department (#PCDATA)>
<!ELEMENT Telephone (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
<!ATTLIST Employee Employee_Number CDATA #REQUIRED>
LIST2: Employee Information XML Schema (employee.xs)
01 <?xml version="1.0"?>
02 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
03
04   <xs:element name="Employee_Info" type="EmployeeInfoType" />
05   <xs:complexType name="EmployeeInfoType">
06     <xs:sequence>
07       <xs:element ref="Employee" minOccurs="0" maxOccurs="unbounded" />
08     </xs:sequence>
09   </xs:complexType>
10
11   <xs:element name="Employee" type="EmployeeType" />
12   <xs:complexType name="EmployeeType">
13     <xs:sequence>
14       <xs:element ref="Name" />
15       <xs:element ref="Department" />
16       <xs:element ref="Telephone" />
17       <xs:element ref="Email" />
18     </xs:sequence>
19     <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
20   </xs:complexType>
21
22   <xs:element name="Name" type="xs:string" />
23   <xs:element name="Department" type="xs:string" />
24   <xs:element name="Telephone" type="xs:string" />
25   <xs:element name="Email" type="xs:string" />
26
27 </xs:schema>
(Line numbers have been added for reference, and are not necessary in
the actual code.)
As you see, the syntax is completely different between the two. For the DTD,
a unique syntax is written, whereas the XML Schema is written in XML format
conforming to XML 1.0 syntax. LIST3 is an example of a valid XML document
for the LIST2 XML Schema (employee.xml).
LIST3: Valid XML Document for XML Schema (employee.xml)
<?xml version="1.0"?>
<Employee_Info
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="employee.xs">
  <Employee Employee_Number="105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>okamura@xmltr.co.jp</Email>
  </Employee>
  <Employee Employee_Number="109">
    <Name>Aiko Tanaka</Name>
    <Department>Sales Department</Department>
    <Telephone>03-6459-98764</Telephone>
    <Email>tanaka@xmltr.co.jp</Email>
  </Employee>
</Employee_Info>
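For contrast, the following is a sketch of a document that would not be
valid against the LIST2 schema. It is a hypothetical counter-example, not
part of the original listings; the comments mark which rule from the
content summary each record breaks.

```xml
<?xml version="1.0"?>
<!-- Hypothetical counter-example: not valid against the LIST2 schema. -->
<Employee_Info
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="employee.xs">
  <!-- Invalid: "A105" is not an xs:int value. -->
  <Employee Employee_Number="A105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>okamura@xmltr.co.jp</Email>
  </Employee>
  <!-- Invalid: Department appears before Name, and Email is missing. -->
  <Employee Employee_Number="109">
    <Department>Sales Department</Department>
    <Name>Aiko Tanaka</Name>
    <Telephone>03-6459-98764</Telephone>
  </Employee>
</Employee_Info>
```

The document is still well-formed XML; it fails only at validation time,
because the schema requires the child elements in sequence order and an
int-typed attribute.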
For DTD, a DOCTYPE declaration is used to associate the DTD with the XML
document. In the case of XML Schema, the specification does not mandate how
a schema is associated with an XML document, so the association depends on
the validation tool actually used. However, the XML Schema specification
does define a method for writing a hint that points the XML document to its
schema. The following content is inserted into the root element of the XML
document.
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="employee.xs"
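For comparison, a DTD association might look like the sketch below. The
file name "employee.dtd" is an assumption made for illustration (the
source does not name the DTD file); it is taken to contain the LIST1
declarations.

```xml
<?xml version="1.0"?>
<!-- "employee.dtd" is an assumed file name holding the LIST1 declarations. -->
<!DOCTYPE Employee_Info SYSTEM "employee.dtd">
<Employee_Info>
  <Employee Employee_Number="105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>okamura@xmltr.co.jp</Email>
  </Employee>
</Employee_Info>
```

Here the association is part of the document itself through the DOCTYPE
declaration, whereas the XML Schema attributes shown above are only a hint
that a validation tool may or may not follow.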
XML Schema Structure
From here, using the LIST2 employee.xs file as an example, we will explain
the method for writing an XML schema.
XML Schema Root Element
The schema element is used as the root element, and the XML Schema
"Namespace" is declared. Namespace is a specification used to avoid the
duplication of attribute and element names defined under XML, and is
normally designated using URL format. Under LIST2, the
xmlns:xs="http://www.w3.org/2001/XMLSchema" section at Line 2 is a
Namespace declaration. The "xs" designation is called the "Namespace
Prefix," and can be used with an element and a child element. Generally, the
"xs" prefix is used most often.
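The prefix itself is only a local name bound to the namespace. As a
minimal sketch (the choice of "xsd" here is an illustration, not something
the source uses), the same namespace can be declared with a different
prefix and the schema means the same thing:

```xml
<?xml version="1.0"?>
<!-- Same namespace URI as LIST2, bound to the prefix "xsd" instead of "xs". -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Name" type="xsd:string" />
</xsd:schema>
```

What identifies the XML Schema vocabulary is the namespace name
http://www.w3.org/2001/XMLSchema, not the prefix written in front of each
element.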
Element Declaration
When declaring an element, an ELEMENT keyword is used under DTD;
however, under XML Schema, the element element is used. The declaration
method is different depending on whether the element element has a child
element or not. When no child element is present, the element name is
designated with the name attribute, and the data type is designated using
the type attribute.
Under DTD, little more was possible than declaring the element content as
an arbitrary text string, called #PCDATA; however, under XML Schema, a
variety of data types can be defined. Data types can be designated using
the pre-defined built-in simple types (Note), including the string type,
int type and date type shown in the table, as well as the ID type and
NMTOKEN type that are compatible with DTD. These can be combined and
extended or restricted to create new, unique data types.
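As a minimal sketch of restricting a built-in type, the schema below
derives a new simple type from xs:int. The names EmployeeNumberType,
Hire_Date and Badge_Number, and the range 1 through 9999, are hypothetical
and chosen only for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical new simple type: an xs:int restricted to 1 through 9999. -->
  <xs:simpleType name="EmployeeNumberType">
    <xs:restriction base="xs:int">
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="9999" />
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical elements that use a built-in type and the new type. -->
  <xs:element name="Hire_Date" type="xs:date" />
  <xs:element name="Badge_Number" type="EmployeeNumberType" />

</xs:schema>
```

With this schema, a Badge_Number of 105 would be accepted and a value of
0 or 10000 would be rejected, a constraint that #PCDATA in a DTD cannot
express.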
Table: Main XML Schema Data Types
• General Data Types
Name                    Explanation
xs:integer              Integers (infinite precision)
xs:positiveInteger      Positive integers (infinite precision)
xs:negativeInteger      Negative integers (infinite precision)
xs:nonPositiveInteger   Negative integers including 0 (infinite precision)
xs:nonNegativeInteger   Positive integers including 0 (infinite precision)
xs:byte                 Integer represented by 8 bits
xs:unsignedByte         Integer represented by 8 bits (unsigned)
xs:short                Integer represented by 16 bits
xs:unsignedShort        Integer represented by 16 bits (unsigned)
xs:int                  Integer represented by 32 bits
xs:unsignedInt          Integer represented by 32 bits (unsigned)
xs:long                 Integer represented by 64 bits
xs:unsignedLong         Integer represented by 64 bits (unsigned)
xs:decimal              Decimal number (infinite precision)
xs:float                Single-precision floating-point number (32-bit)
xs:double               Double-precision floating-point number (64-bit)
xs:boolean              Boolean value
xs:string               Arbitrary text string
• Types Representing Dates and Times
Name                    Explanation
xs:time                 Time of day
xs:dateTime             Date and time of day
xs:date                 Date
xs:gYear                Year
xs:gYearMonth           Year and month
xs:gMonth               Month
xs:gMonthDay            Month and day
xs:gDay                 Day
• DTD-Compatible Types
Name                    Explanation
xs:ID                   XML 1.0 Specification ID type
xs:IDREF                XML 1.0 Specification IDREF type
xs:IDREFS               XML 1.0 Specification IDREFS type
xs:ENTITY               XML 1.0 Specification ENTITY type
xs:ENTITIES             XML 1.0 Specification ENTITIES type
xs:NOTATION             XML 1.0 Specification NOTATION type
xs:NMTOKEN              XML 1.0 Specification NMTOKEN type
xs:NMTOKENS             XML 1.0 Specification NMTOKENS type
Meanwhile, if the element has a child element, a new data type must first be
designated for the element (Line 11):
<xs:element name="Employee" type="EmployeeType" />
The "EmployeeType" type designated by the type attribute is a Complex
Data Type. Lines 11 through 20 declare the element and its Complex Type.
In the Complex Type declaration, the type name EmployeeType is designated
with the name attribute of the complexType element, and the Model Group
(which sets the occurrence order of the child elements) is designated as
its child element. In the Model Group, use the sequence element to require
the child elements to occur in the order written (equivalent to the "," in
DTD), and use the choice element to require exactly one of the listed
elements to occur (equivalent to the "|" in DTD).
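LIST2 only uses the sequence element, so here is a minimal sketch of the
choice element. The Contact element and ContactType type are hypothetical
names introduced for illustration; Telephone and Email are declared as in
LIST2.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: DTD equivalent is <!ELEMENT Contact (Telephone | Email)> -->
  <xs:element name="Contact" type="ContactType" />
  <xs:complexType name="ContactType">
    <xs:choice>
      <xs:element ref="Telephone" />
      <xs:element ref="Email" />
    </xs:choice>
  </xs:complexType>

  <xs:element name="Telephone" type="xs:string" />
  <xs:element name="Email" type="xs:string" />

</xs:schema>
```

Under this sketch a Contact element must contain either one Telephone or
one Email child, but not both.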
Meaning of the Model Group
XML Schema
DTD
Output the element in the written order in the exact number of
occurrences designated
sequence element