Web Technologies
Unit3 : XML
Date & Time :08-3-2021 &11:00AM-11.50AM
Mrs G Karthika
Asst.Professor
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: [email protected]
Department of CSE EID 302 & Web Technologies 1
Introduction
• eXtensible Markup Language
• Developed from SGML
• A meta-markup language
• HTML is a markup language, XML is used to define markup
languages
• Markup languages defined in XML are known as applications
• XML can be written by hand or generated by computer
– Useful for data exchange
04/21/25 Department of CSE EID 302 & Web Technologies 2
The Syntax of XML
• Levels of syntax
– Well-formed documents conform to basic XML rules
– Valid documents are well-formed and also conform to a schema which
defines details of the allowed content
• Well-formed XML documents
– All begin tags have a matching end tag
• Empty tags
– If a begin tag is inside an element, the matching end tag is also
– There is one root tag that contains all the other tags in a document
– Attributes must have a value assigned, the value must be quoted
– The characters <, >, & can only appear with their special meaning
• Validity is tested against a schema, discussed later
04/21/25 Department of CSE EID 302 & Web Technologies 3
XML Document Structure
• Auxiliary files
– Schema file: defines its tag set and structural syntactic rules
• DTD or XML Schema or one of several other
– Style file: contains a style sheet to describe how the content of the
document is to be printed or displayed
• Cascading Style Sheets
• XSLT
• Breaking file up
– Document entities
– Entity syntax
• Character data
– <![CDATA ….. ]]>
04/21/25 Department of CSE EID 302 & Web Technologies 4
Document Type Definitions
• A set of declarations
• Define tags, attributes, entities
• Specify the order and nesting of tags
• Specify which attributes can be used with
which tags
• General syntax
– <!keyword …. >
– Note, not XML!
04/21/25 Department of CSE EID 302 & Web Technologies 5
Declaring Elements
• General syntax
– <!ELEMENT element-name content-description)>
– Content description specifies what tags may appear inside the named
element and whether there may be any plain text in the content
• Sequence of tags
• Alternate tags
• Multiplicity
– + : one or more number of occurrences
– * : zero or more occurrences
– ? : zero or one occurrence
• #PCDATA
04/21/25 Department of CSE EID 302 & Web Technologies 6
Declaring Attributes
• General syntax
– <!ATTLIST element-name
(attribute-name attribute-type default-value?)+ >
• Default values
– A value
– #FIXED value
– #REQUIRED
– #IMPLIED (default, if not specified)
Department of CSE EID
04/21/25 7
302 & Web Technologies
Declaring Entities
• General Syntax
– <!ENTITY [%] entity-name “entity-value”>
– With %: a parameter entity
– Without %: a general entity
• Parameter entities may only be referenced in the DTD
• Remote form
– <!ENTITY entity-name SYSTEM “file-location”>
– The replacement for the entity is the content of the file
04/21/25 Department of CSE EID 302 & Web Technologies 8
7.4 Sample DTD
Department of CSE EID
04/21/25 9
302 & Web Technologies
Internal and External DTDs
• A document type declaration can either contain
declarations directly or can refer to another file
• Internal
– <!DOCTYPE root-element [
declarations
]>
• External file
– <!DOCTYPE root-name SYSTEM “file-name”>
• A public identifier can also be specified, that
would be mapped to a system identifier by the
processing system
Department of CSE EID
04/21/25 10
302 & Web Technologies
Namespaces
• “XML namespaces provide a simple method for qualifying element
and attribute names used in Extensible Markup Language
documents by associating them with namespaces identified by URI
references.”
– From the specification
http://www.w3.org/TR/2006/REC-xml-names-20060816/
• A namespace can be declared for an element and its descendants
by
– <element xmlns[:prefix]=“URI”>
– The prefix is used to qualify elements that belong to the namespace
– Multiple namespaces can be used in a single document
– Default namespace
• DTDs do not support namespaces very well
Department of CSE EID
04/21/25 11
302 & Web Technologies
7.6 XML Schemas
• Schema is a generic term for any description
of an XML content model
• DTDs have several deficits
– They do not use XML syntax
– They do not support namespaces
– Data types cannot be strictly specified
• Example date vs. string
Department of CSE EID
04/21/25 12
302 & Web Technologies
7.6 Schema Fundamentals
• Documents that conform to a schema’s rules are
considered instances of that schema
• Schema purposes
– Structure of instances
– Data types of elements and attributes
• XML Schemas support namespaces
– The XML Schema language itself is a set of XML tags
– The application being described is another set of tags
Department of CSE EID
04/21/25 13
302 & Web Technologies
7.6 Defining a Schema
• The root of an XML Schema document is the
schema tag
• Attributes
– xmlns attributes for the schema namespace and for
the namespace being defined
– A targetNamespace attribute declaring the
namespace being defined
– An elementFormDefault attribute with the
value qualified to indicate that all elements defined in
the target namespace must be namespace qualified
(either with a prefix or default) when used
Department of CSE EID
04/21/25 14
302 & Web Technologies
7.6 Defining a Schema Instance
• The xmlns attribute declares a namespace for an
element and its descendants
– <element xmlns[:prefix]=“URI”>
– The element itself may not be in the namespace
– Multiple elements may be defined
• The
http://www.w3.org/2001/XMLSchema-instance
namespace includes one attribute,
schemaLocation
– That attribute value is pairs, separated by spaces
– Each pair consists of a namespace and the location of
a file that defines that namespace
Department of CSE EID
04/21/25 15
302 & Web Technologies
7.6 An Overview of Data Types
• Data types are of two kinds
– Simple data types with string content
– Complex data types with elements, attributes and string
content
• Predefined types
– Primitive
– Derived
• Restrictions
– Facets
• Anonymous and named types
Department of CSE EID
04/21/25 16
302 & Web Technologies
7.6 Simple Types
• Named types can be used to give the type of
– an attribute (which must be simple) or
– an element (which may be simple or complex)
• Elements or attributes with simple type may
have default values specified
• New simple types can be defined by restriction
of base types
– Facet maxLength
– Facet precision
Department of CSE EID
04/21/25 17
302 & Web Technologies
7.6 Complex Types
• Definition of a complex type can specify
– Elements in content (either sequence or choice)
• Individual elements may specify a multiplicity
– Attributes that can appear for an element of that type
– Whether plain text is allowed in the content, a mixed
type
• An element definition can be associated with a
type by
– Referring to a named type directly in the type attribute
– Including an anonymous type definition
Department of CSE EID
04/21/25 18
302 & Web Technologies
7.6 Validating Instances of Schemas
• Various systems for validating instances
against schemas
– Online http://www.w3.org/2001/03/webdata/xsv
– XML support libraries include validation: Xerces
from Apache, Saxon, Altova XML tools
– Some IDE’s have automatic validation: Altova Spy,
Eclipse with Oxygen, Eclipse with XML Buddy Pro
• Certain IDE’s will use schemas to provide
support for XML file creation
Department of CSE EID
04/21/25 19
302 & Web Technologies
7.7 Displaying Raw XML Documents
• Plain XML documents are generally displayed
literally by browsers
– Firefox notes that there is no style information
Department of CSE EID
04/21/25 20
302 & Web Technologies
7.8 Displaying XML Documents with CSS
• An xml-stylesheet processing instruction can
be used to associate a general XML document
with a style sheet
– <?xml-stylesheet type=“text/css”
href=“planes.css”>
• The style sheet selectors will specify tags that
appear in a particular document
Department of CSE EID
04/21/25 21
302 & Web Technologies
7.9 XSLT Style Sheets
• A family of specifications for transforming XML
documents
– XSLT: specifies how to transform documents
– XPath: specifies how to select parts of a document and
compute values
– XSL-FO: specifies a target XML language describing the
printed page
• XSLT describes how to transform XML documents into
other XML documents such as XHTML
– XSLT can be used to transform to non-XML documents as
well
Department of CSE EID
04/21/25 22
302 & Web Technologies
7.9 Overview of XSLT
• A functional style programming language
• Basic syntax is XML
– There is some similarity to LISP and Scheme
• An XSLT processor takes an XML document as
input and produces output based on the
specifications of an XSLT document
Department of CSE EID
04/21/25 23
302 & Web Technologies
7.9 XSLT Processing
XSLT
Document
XSLT XSL
Processor Document
XML
Document
Department of CSE EID
04/21/25 24
302 & Web Technologies
7.9 XSLT Structure
• An XSLT document contains templates
• XPath is used to specify patterns of elements to which the
templates should apply
• The content of a template specifies how the matched element
should be processed
• The XSLT processor will look for parts of the input document
that match a template and apply the content of the template
when a match is found
• Two models
– Template-driven works with highly regular data
– Data-driven works with more loosely structured data with a recursive
structure (like XHTML documents)
Department of CSE EID
04/21/25 25
302 & Web Technologies
7.9 XSL Transformations for Presentation
• One of the most common applications of XSLT is to
transform an XML document into an XHTML
document for display
• A XSLT style sheet can be associated with an XML
document by using a processor instruction
• <?xml-stylesheet type=“text/xsl” href=“stylesheet-
ref”?>
• The example xslplane.xml is an xml file with data
about a single plane
– The file is linkded to the stylesheet xslplane.xsl
Department of CSE EID
04/21/25 26
302 & Web Technologies
7.9 XSLT Organization
• Root element stylesheet
– Specifies namespaces for XSL and for non-XSLT elements
included in the stylesheet
<xsl:stylesheet xmlns:xsl =
"http://www.w3.org/1999/XSL/Format"
xmlns =
"http://www.w3.org/1999/xhtml">
• Elements in XSLT itself will have the prefix xsl:
• Elements from XHTML will have no prefix (default
namespace)
Department of CSE EID
04/21/25 27
302 & Web Technologies
8.9 XSLT Templates
• There must be at least one template element in an style
sheet
• The value of the match attribute is an XPath expression
which specifies to which nodes the template applies
• Two standard choices for the match expression of the
first template
– ‘/’ to match the root node of the entire document structure
– ‘root-tag’ to match the root element of the document
• The first template is applied automatically
• All other templates are applied only in response to
apply-template elements
Department of CSE EID
04/21/25 28
302 & Web Technologies
7.9 XPath Basics and Node Selection
• An XPath expression beginning with a / specifies nodes in
an absolute position relative to the document root node
• Otherwise, the expression specifies nodes relative to the
current node, that is the node being processed before
the matched node
• The expression ‘.’ refers to the current node
• The apply-templates tag uses the select attribute to
choose which nodes should be matched to templates
• There is a default template applied if one is not provided
that matches a selected node
Department of CSE EID
04/21/25 29
302 & Web Technologies
7.9 Producing Transformation Output
• Elements not belonging to XSLT and other text will be copied
to the output when the containing template is applied
• The value-of tag causes the select attribute value to be
evaluated and the result is put into the output
– The value of an element is the text contained in it and in sub-
elements
– The value of an attribute is the value
• Example xslplane1.xsl transforms the xslplane.xml file into
XHTML for display purposes
– If the style sheet is in the same directory as the XML file, some
browsers will pick up the transformation and apply it
– This works with Firefox and Internet Explorer but not Opera
Department of CSE EID
04/21/25 30
302 & Web Technologies
7.9 Processing Repeated Elements
• File xslplanes.xml contains data about multiple
airplanes
• The style sheet xslplanes.xsl uses a for-each element
to process each plane element in the source
document
• A sort element could be included to sort output
– The element
<xsl:sort select=“year” data-
type=“number”/>
– Specifies sorting by year
Department of CSE EID
04/21/25 31
302 & Web Technologies
7.10 XML Processors
• XML processors provide tools in programming
languages to read in XML documents,
manipulate them and to write them out
Department of CSE EID
04/21/25 32
302 & Web Technologies
7.10 Purposes of XML Processors
• Four purposes
– Check the basic syntax of the input document
– Replace entities
– Insert default values specified by schemas or DTD’s
– If the parser is able and it is requested, validate the input document against the
specified schemas or DTD’s
• The basic structure of XML is simple and repetitive, so providing library
support is reasonable
• Examples
– Xerces-J from the Apache foundation provides library support for Java
– Command line utilities are provided for checking well-formedness and validity
• Two different standards/models for processing
– SAX
– DOM
Department of CSE EID
04/21/25 33
302 & Web Technologies
7.10 Parsing
• The process of reading in a document and
analyzing its structure is called parsing
• The parser provides as output a structured
view of the input document
Department of CSE EID
04/21/25 34
302 & Web Technologies
7.10 The SAX Approach
• In the SAX approach, an XML document is
read in serially
• As certain conditions, called events, are
recognized, event handlers are called
• The program using this approach only sees
part of the document at a time
Department of CSE EID
04/21/25 35
302 & Web Technologies
7.10 The DOM Approach
• In the DOM approach, the parser produces an in-memory
representation of the input document
– Because of the well-formedness rules of XML, the structure is a tree
• Advantages over SAX
– Parts of the document can be accessed more than once
– The document can be restructured
– Access can be made to any part of the document at any time
– Processing is delayed until the entire document is checked for
proper structure and, perhaps, validity
• One major disadvantage is that a very large document may
not fit in memory entirely
Department of CSE EID
04/21/25 36
302 & Web Technologies
7.11 Web Services
• Allow interoperation of software components
on different systems written in different
languages
• Servers that provide software services rather
than documents
• Remote Procedure Call
– DCOM and CORBA provide impllementations
– DCOM is Microsoft specific
– CORBA is cross-platrom
Department of CSE EID
04/21/25 37
302 & Web Technologies
7.11 Web Service Protocols
• Three roles in web services
– Service providers
– Service requestors
– Service registry
• The Web Services Definition Language provides a
standard way to describe services
• The Universal Description, Discovery and Integration
service provides a standard way to provide information
about services in response to a query
• SOAP is used to specify requests and responses
Department of CSE EID
04/21/25 38
302 & Web Technologies