Xerces C
Table of Contents
1. Xerces C++ Parser
Xerces-C++ Version 1.7.0
Applications of the Xerces Parser
Features
Platforms with Binaries
Other ports...
2. Installation
Windows NT/2000
UNIX
3. Build Instructions
Building on Windows and UNIX
Building on Other Platforms
Other Build Instructions
4. Building on Windows and UNIX
Building Xerces-C++ on Windows NT/2000
Building Xerces-C++ on Windows using Visual Age C++
Building Xerces-C++ on Windows using Borland C++Builder
Building Xerces-C++ on UNIX platforms
Building Xerces-C++ as a single-threaded library on Unix platforms
5. Building on Other Platforms
Building Xerces-C++ on iSeries (AS/400)
Building Xerces-C++ on OS/2 using Visual Age C++
Building Xerces-C++ on Macintosh
6. Other Build Instructions
Building Xerces-C++ with ICU using bundled Perl scripts on Windows
Building Xerces-C++ COM Wrapper on Windows
Building User Documentation
I wish to port Xerces to my favourite platform. Do you have any suggestions?
What should I define XMLCh to be?
Where can I look for more help?
7. API Documentation
API Docs for SAX and DOM
8. Xerces-C++ Samples
Introduction
Building the Samples
Running the Samples
9. Xerces-C++ Sample 1: SAXCount
SAXCount
10. Xerces-C++ Sample 2: SAXPrint
SAXPrint
11. Xerces-C++ Sample 3: DOMCount
DOMCount
12. Xerces-C++ Sample 4: DOMPrint
DOMPrint
13. Xerces-C++ Sample 5: MemParse
MemParse
14. Xerces-C++ Sample 6: Redirect
Redirect
15. Xerces-C++ Sample 7: PParse
PParse
16. Xerces-C++ Sample 8: StdInParse
StdInParse
17. Xerces-C++ Sample 9: EnumVal
EnumVal
18. Xerces-C++ Sample 10: CreateDOMDocument
CreateDOMDocument
19. Xerces-C++ Sample 11: SAX2Count
SAX2Count
20. Xerces-C++ Sample 12: SAX2Print
SAX2Print
21. Xerces-C++ Sample 13: IDOMCount
IDOMCount
22. Xerces-C++ Sample 14: IDOMPrint
IDOMPrint
23. Xerces-C++ Sample 15: SEnumVal
SEnumVal
24. Schema
Introduction
Limitations
Interpretation of Areas that are Unclear or Implementation-Dependent
Usage
Associating Schema Grammar with instance document
25. FAQs
Distributing Xerces-C++
Building / Running FAQs
Programming/Parsing FAQs
Other Xerces-C++ Questions
26. Programming Guide
SAX Programming Guide
SAX2 Programming Guide
DOM Programming Guide
Experimental IDOM Programming Guide
Xerces-C++ Documentation
Features
· Conforms to XML Spec 1.0 [2]
· Tracks the latest DOM (Level 1.0) [3], DOM (Level 2.0) [4], SAX/SAX2 [6], Namespace [7], and
W3C's XML Schema recommendation version 1.0 [8] specifications.
· Source code, samples, and documentation are provided.
· Programmatic generation and validation of XML
· Pluggable catalogs, validators and encodings
· High performance
· Customizable error handling
Other ports...
· OS/390
· AS/400
· FreeBSD
· SGI IRIX
· Macintosh
· OS/2
· PTX
· UnixWare
· and more!
2. Installation
Windows NT/2000
Install the binary Xerces-C++ release by unzipping the xerces-c1_7_0-win32.zip archive in the Windows
environment. You can use WinZip or any other unzip utility.
unzip xerces-c1_7_0-win32.zip
UNIX
To install the binary release, extract the files from the compressed .tar archive using gunzip and tar:
cd $HOME
gunzip xerces-c1_7_0-linux.tar.gz
tar -xvf xerces-c1_7_0-linux.tar
This will create an 'xerces-c1_7_0-linux' sub-directory (in the home directory) which contains the
Xerces-C++ distribution. You will need to add the xerces-c1_7_0-linux/bin directory to your PATH
environment variable:
Note: On Solaris, you may need to use gtar instead of tar. See FAQ for more
information.
For Bourne Shell, K Shell or Bash, type:
export PATH="$PATH:$HOME/xerces-c1_7_0-linux/bin"
To make this setting permanent, add the same line to your shell setup file, which can be either .profile
or .kshrc.
In addition, you will also need to set the environment variables XERCESCROOT, ICUROOT and the
library search path. (LIBPATH on AIX, LD_LIBRARY_PATH on Solaris and Linux, SHLIB_PATH on
HP-UX).
Note: XERCESCROOT and ICUROOT are needed only if you intend to recompile the
samples or build your own applications. The library path is needed so that the shared
libraries can be found at run time.
For Bourne Shell, K Shell or Bash, type:
export XERCESCROOT=<wherever you installed Xerces-C++>
export ICUROOT=<wherever you installed ICU>
export LIBPATH=$XERCESCROOT/lib:$LIBPATH (on AIX)
export LD_LIBRARY_PATH=$XERCESCROOT/lib:$LD_LIBRARY_PATH (on Solaris, Linux)
export SHLIB_PATH=$XERCESCROOT/lib:$SHLIB_PATH (on HP-UX)
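As a sketch of the rule above, the hypothetical C++ helper below (not part of Xerces-C++) picks the library-path variable for each UNIX flavour listed and builds the value that the export line produces. The platform strings such as "aix" and "linux" are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of Xerces-C++: maps a UNIX flavour to the
// shared-library search-path variable named in the installation notes.
std::string libraryPathVariable(const std::string& platform) {
    if (platform == "aix") return "LIBPATH";
    if (platform == "solaris" || platform == "linux") return "LD_LIBRARY_PATH";
    if (platform == "hp-ux") return "SHLIB_PATH";
    return "";  // platform not covered by the notes
}

// Builds the new value of that variable: $XERCESCROOT/lib in front of the
// current value. Unlike the plain shell export, no trailing ':' is left
// behind when the variable was previously unset or empty.
std::string prependLibDir(const std::string& xercesRoot, const char* current) {
    std::string value = xercesRoot + "/lib";
    if (current != nullptr && *current != '\0') {
        value += ':';
        value += current;
    }
    return value;
}

// Convenience wrapper that reads the current value with std::getenv.
std::string newLibraryPath(const std::string& platform, const std::string& xercesRoot) {
    const std::string var = libraryPathVariable(platform);
    const char* current = var.empty() ? nullptr : std::getenv(var.c_str());
    return prependLibDir(xercesRoot, current);
}
```

For example, prependLibDir("/opt/xerces", "/usr/lib") yields "/opt/xerces/lib:/usr/lib", which is what the Solaris/Linux export line gives when LD_LIBRARY_PATH was /usr/lib.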
Note: If you need to build the samples after installation, make sure you read and follow
the Build Instructions.
3. Build Instructions
4. Building on Windows and UNIX
Once you are inside MSVC, you need to build the project marked XercesLib.
If you want to include the Xerces-C++ project separately in your own workspace, you need to add:
xerces-c-src1_7_0\Projects\Win32\VC6\xerces-all\XercesLib\XercesLib.dsp
You must make sure that your application links with the xerces-c_1.lib library and that the associated
DLL is somewhere in your PATH.
Note: If you are working on the AlphaWorks version, which uses ICU, the ICU data DLL
named icudata.dll must be available on your PATH. For where to get ICU and how to
build it, see How to Build ICU.
Building samples
If you are using the source package, inside the same workspace (xerces-all.dsw), you'll find several other
projects. These are for the samples. Select all the samples and right click on the selection. Then choose
"Build (selection only)" to build all the samples in one shot.
If you are using the binary package, load the
xerces-c1_7_0-win32\samples\Projects\Win32\VC6\samples.dsw Microsoft Visual C++ workspace
inside your MSVC IDE. Then select all the samples and right click on the selection. Then choose "Build
(selection only)" to build all the samples in one shot.
This should be the full path of the directory where you extracted Xerces-C++.
This generates a shell script called configure. Normally you would run this script directly, but wait a
minute. If you are using the default compilers, gcc [16] and g++ [16], there is no problem. If you are not
using the standard GNU compilers, however, you need to export a few more environment variables
before you can invoke configure.
Rather than making you figure out which obscure environment variables to set, we have provided a
wrapper script that does the job for you. All you need to tell the script is which compiler you use and
which options you want in your build; the script does the rest. Here is what the script takes as input:
creating util/NetAccessors/MacOSURLAccess/Makefile
creating util/regx/Makefile
creating validators/Makefile
creating validators/common/Makefile
creating validators/datatype/Makefile
creating validators/DTD/Makefile
creating validators/schema/Makefile
creating framework/Makefile
creating dom/Makefile
creating idom/Makefile
creating parsers/Makefile
creating internal/Makefile
creating sax/Makefile
creating sax2/Makefile
creating ../obj/Makefile
In the future, you can also type the following commands directly to create the
Makefiles.
export TRANSCODER="NATIVE"
export MESSAGELOADER="INMEM"
export NETACCESSOR="Socket"
export THREADS="pthread"
export CC="gcc"
export CXX="g++"
export CXXFLAGS=" -O -DXML_USE_NATIVE_TRANSCODER -DXML_USE_INMEM_MESSAGELOADER -DXML_USE_PTHREADS -DXML_USE_NETACCESSOR_SOCKET"
export CFLAGS=" -O -DXML_USE_NATIVE_TRANSCODER -DXML_USE_INMEM_MESSAGELOADER -DXML_USE_PTHREADS -DXML_USE_NETACCESSOR_SOCKET"
export LDFLAGS=""
export LIBS=" -lpthread "
configure
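To make the correspondence visible, here is a hypothetical sketch that derives the CXXFLAGS and LIBS values above from the four option names. It is not how runConfigure is implemented; it only encodes the single combination shown in this section, and the BuildOptions type is invented for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical option set mirroring TRANSCODER, MESSAGELOADER, THREADS and
// NETACCESSOR above. Only the values used in this section are recognised.
struct BuildOptions {
    std::string transcoder;     // e.g. "NATIVE"
    std::string messageLoader;  // e.g. "INMEM"
    std::string threads;        // e.g. "pthread"
    std::string netAccessor;    // e.g. "Socket"
};

// Each option selects one preprocessor define, as in the CXXFLAGS line.
std::string compilerFlags(const BuildOptions& o) {
    std::string flags = " -O";
    if (o.transcoder == "NATIVE") flags += " -DXML_USE_NATIVE_TRANSCODER";
    if (o.messageLoader == "INMEM") flags += " -DXML_USE_INMEM_MESSAGELOADER";
    if (o.threads == "pthread") flags += " -DXML_USE_PTHREADS";
    if (o.netAccessor == "Socket") flags += " -DXML_USE_NETACCESSOR_SOCKET";
    return flags;
}

// pthread builds also need the pthread library at link time (the LIBS line).
std::string linkLibs(const BuildOptions& o) {
    return o.threads == "pthread" ? " -lpthread " : "";
}
```

With BuildOptions{"NATIVE", "INMEM", "pthread", "Socket"}, compilerFlags returns exactly the CXXFLAGS string exported above.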
Note: The error message concerning conf.h is NOT an indication of a problem. The code
that produces it was inserted to make the build work on AS/400; on other platforms it prints
this message, which merely looks like an error. This will be fixed in a future release.
So now you see what the wrapper script has actually been doing! It has invoked configure to create
the Makefiles in the individual sub-directories, but in addition to that, it has set a few environment
variables to correctly configure your compiler and compiler flags too.
Now that the Makefiles are all created, you are ready to do the actual build.
gmake
Building samples
The process for building the samples is the same on all UNIX platforms.
cd xerces-c1_7_0-linux/samples
./runConfigure -p<platform> -c<C_compiler> -x<C++_compiler>
gmake
This will create the object files in each sample directory and the executables in the
'xerces-c1_7_0-linux/bin' directory.
Note that runConfigure is just a helper script and you are free to use ./configure with the correct
parameters to make it work on any platform-compiler combination of your choice. The script needs the
following parameters:
Note: The code samples in this section assume that you are working on the Linux
binary drop. If you are using some other UNIX flavor, replace '-linux' with the
appropriate platform name in the code samples.
To delete all the generated object files and executables, type:
gmake clean
5. Building on Other Platforms
CL commands):
XERCESCROOT - <the full path up to the Xerces-C++ src directory, but not
including 'src'>
MAKE - '/usr/bin/gmake'
OUTPUTDIR - <identifies target iSeries library for *module, *pgm and *srvpgm
objects>
ICUROOT - (optional if using ICU) <the path of your ICU includes>
· For v4r5m0 systems, add QCXXN to your build process library list. This allows CRTCPPMOD, which
the icc compiler uses, to be resolved.
You may want to put the environment variables and library list setup instructions in a CL program so you
will not forget these steps during your build.
Configure
To configure the make files for an iSeries build do the following under Qsh:
qsh:
cd <full path to Xerces-C++>/src/xercesc
runConfigure -p os400 -x icc -c icc -m inmem -t Iconv400
Troubleshooting:
error: configure: error: installation or configuration problem:
C compiler cannot create executables.
If you see the above error message during runConfigure, it can mean one of three things: QCXXN is not
on your library list; runConfigure cannot create the temporary modules (CONFTest1, etc.) it uses to
test the compiler options; or PASE is not installed. The second case happens when the test modules
already exist from a previous run of runConfigure. To correct the problem, do the following:
CL:
DLTMOD <OUTPUTDIR library>/CONFT* and
DLTPGM <OUTPUTDIR library>/CONFT*
Build
qsh:
cd <full path to Xerces-C++>/src/xercesc
gmake
The above gmake should create a service program in your specified library and place a symbolic link to
that service program in <path to Xerces-C++/lib>. However, because of the number of modules and the
length of the path names, it is quite possible that the service program will not be created; see
Troubleshooting for the workaround.
After the service program has been created successfully and a link established, you can either bind your
XML application programs directly to the parser's service program via the BNDSRVPGM option on the
CRTPGM or CRTSRVPGM command, or specify a binding directory on your icc command. To
specify an archive file to bind to, use the -L and -l binding options on icc. An archive file on iSeries is a
binding directory. To create an archive file, use the qar command (see the iSeries Tools for Developers
write-up).
After building the Xerces-C service program, create a binding directory as follows. (This binding
directory is used when building the samples. Also note that the .a file below may have a different name
depending on the parser version, following Apache Xerces versioning.)
qsh:
cd <full path to Xerces-C++>/lib
qar -cuv libxerces-c1_7_0.a *.o
This results in:
command = CRTBNDDIR BNDDIR(yourlib/libxercesc) TEXT('/yourlib/Xerces-C++/lib/libxerces-c1_7_0.a')
command = ADDBNDDIRE BNDDIR(yourlib/libxercesc) OBJ((yourlib/LIBXERCESC *SRVPGM))
If the service program was not created, you can create it manually as follows:
CL:
CRTSRVPGM (<OUTPUTDIR-library>/libxercesc) MODULE(<OUTPUTDIR-library>/*ALL)
EXPORT(*ALL) TEXT('XML4C parser version xxx')
OPTION(*DUPPROC *DUPVAR)
Note that if you create the service program manually, make sure that the OUTPUTDIR library does not
contain any CONFT* modules or sample modules. After the service program has been created manually,
you can add a symbolic link to it in the appropriate /lib directory as follows:
qsh:
cd <full path to Xerces-C++>/lib
ln -s /qsys.lib/<outputdir>.lib/libxercesc.srvpgm libxerces-c1_7_0.o
qar -cuv libxerces-c1_7_0.a *.o
If you are on a v4 system using the ILE C++ PRPQ compiler (which is referred to as the 'old' compiler)
you will get compiler errors requiring a few manual changes to the source:
· src/xercesc/dom/DocumentImpl.cpp
· src/xercesc/dom/DocumentImpl.hpp
· src/xercesc/idom/IDDocumentImpl.cpp
· src/xercesc/idom/IDDocumentImpl.hpp
· src/xercesc/validators/common/ContentSpecNode.hpp
Update the following routines in src/xercesc/dom/DocumentImpl.cpp as follows:
void* DocumentImpl::getUserData(NodeImpl* n)
{
if (userData)
#ifdef __OS400__
return (void*)userData->get((void*)n);
#else
return userData->get((void*)n);
#endif
else
return null;
}
#ifdef __OS400__
RefHashTableOf<char> *userData;
#else
RefHashTableOf<void> *userData;
#endif
else
return 0;
}
To update src/xercesc/idom/IDDocumentImpl.hpp:
#ifdef __OS400__
RefHashTableOf<char> *fUserData;
#else
RefHashTableOf<void> *fUserData;
#endif
#ifndef __OS400__
inline
#endif
ContentSpecNode::~ContentSpecNode()
Note that the sample programs are bound to the binding directory object created by qar, as described
above.
qsh
cd <full path to Xerces-C++>/samples
runConfigure -p os400 -x icc -c icc
gmake
8. You should now have xerces-c.dll and xerces-c.lib. (The library file is an import
library for the DLL.)
Packaging the Binaries
There is an Object Rexx program that will package the binaries and headers. (See step 1 of the "From
scratch" method on how to switch to Object Rexx.) The packageBinaries.cmd file is in the
xerces-c-src1_7_0\Projects\OS2\VACPP40 directory. Run packageBinaries, giving
the source and target directories like this:
packageBinaries -s x:\xerces-c-src1_7_0 -o x:\temp\xerces-c1_7_0-os2
(Match the source directory to your system; the target directory can be anything you want.)
Note: If you don't want to use the Object Rexx program, you'll need to manually copy
the "*.hpp" and "*.c" files to an include directory. (Be sure to maintain the same directory
structure that you find under xerces-c-src1_7_0.)
Building Samples
Building the Xerces-C++ samples using IBM Visual Age C++ Professional 4.0 for OS/2 (VAC++).
· In the XercesCSrcInstallDir\samples\Projects\OS2\VACPP40 directory, find and
edit the VAC++ configuration file basedir.icc.
· All of the directories used to build the samples are defined in basedir.icc. You need to edit the
directories to match your system. Here are the directories you need to assign: SRC_DIR --
XercesCSrcInstallDir. This is where VAC++ looks for the samples directories containing the
source files. BASE_DIR -- the install directory, XercesCSrcInstallDir. VAC++ will store the
compiled samples in the bin directory under BASE_DIR. It will also look for the xerces-c.lib
file in the lib directory under BASE_DIR. Other directories are set based on these two; you can
override them if you wish.
· Save basedir.icc.
· Start the Command Line in the VAC++ folder.
· Navigate to the XercesCSrcInstallDir\samples\Projects\OS2\VACPP40 directory.
· Run bldsamples.cmd.
· When bldsamples.cmd finishes, review the file compiler.errors. This file should contain only
informational messages, almost all complaining about constant values in comparisons.
· You should now have several executable files.
Rebuilding the Configuration Files
Although it shouldn't be necessary, if you want to rebuild the VAC++ configuration files, you'll need to
have Object Rexx running on your system:
· If you are not currently running Object Rexx, run the SWITCHRX command from a command line,
answer "yes" to switching to Object Rexx, and follow the instructions to reboot. (Note: You can
switch back to "Classic Rexx" by running SWITCHRX again. But you probably won't need to switch
back since Object Rexx runs almost 100% of Classic Rexx programs.)
· In the Projects\OS2\VACPP40 directory, run genICC.cmd. This builds the VAC++ configuration files
for the samples you have on your system.
· Go to the first step above in the "Building Samples" section.
2. Has a Mac OS native transcoder that utilizes the built-in Mac OS Unicode converter
[MacOSUnicodeConverter].
3. Has two Mac OS native netaccessor classes. The first is based on URLAccess, which is supported under
both Carbon and classic Mac OS, and may be used in the broadest variety of configurations
[MacOSURLAccess]. The second [MacOSURLAccessCF] is based on CFURLAccess, which requires
either Carbon or Mac OS X CoreServices.framework. This second NetAccessor is useful in Mac OS X
configurations where reliance on the full Carbon.framework would prevent the Xerces code from running
in a remote context that has no access to the GUI.
4. Supports builds from Metrowerks CodeWarrior, Apple Project Builder, and the Mac OS X shell.
· The Xerces-C++ sample programs are written to assume a command line interface. To avoid making
Macintosh-specific changes to these command line programs, we have opted to instead require that
you make a small extension to your CodeWarrior runtime that supports such command line programs.
Please read and follow the usage notes in XercesSampleSupport/XercesSampleStartupFragment.c.
Building Xerces-C++ with Project Builder
Projects are included to build the Xerces-C++ library and DOMPrint sample under Apple's Project
Builder for Mac OS X. The following notes apply:
· Since you are running under Mac OS X, and if you are not also performing CodeWarrior builds, it is
not necessary to shorten file names or set the type/creator codes as required for CodeWarrior.
· The Project Builder project builds XercesLib as the framework Xerces.framework. This framework,
however, does not currently include a correct set of public headers. Any referencing code must have
an include path directive that points into the Xerces-C++ src directory.
· The DOMPrint project illustrates one such usage of the Xerces.framework.
entities. If URLAccess is not installed, any such references will fail; the absence of URLAccess,
however, will not in itself prevent Xerces-C++ from running. If Xerces-C++ is configured to use
MacOSURLAccessCF, then URLAccess (and thus Carbon) is not required, but
CoreServices.framework is required for Mac OS X.
· Multiprocessing library. Provides mutual exclusion support. Once again, the routines will back down
gracefully if Multiprocessing support is not available.
· HFS+ APIs. If HFS+ APIs are available, all file access is performed using the HFS+ fork APIs to
support long file access, and to support long unicode compliant file names. In the absence of HFS+
APIs, classic HFS APIs are used instead.
6. Other Build Instructions
(Match the source directory to your system; the target directory can be anything you want.)
If everything is setup right and works right, then you should see a binary drop created in the target
directory specified above. This script will build both ICU and Xerces-C++, copy the files (relevant to the
The user documentation (the very page you are reading in your browser right now) was generated
using an XML application called StyleBook. This application uses Xerces-J and Xalan to create
the HTML files from the XML source files. The XML source files for the documentation are part of the
Xerces-C++ module and reside in the doc directory.
Pre-requisites for building the user documentation are:
· JDK 1.2.2 (or later).
· Xerces-J 1.0.1 (bundled)
· Xalan-J 0.19.2 (bundled)
· Stylebook 1.0-b2 (bundled)
· The Apache Style files (DTDs and .xsl files) (bundled)
Open a command window and set PATH to include the JDK 1.2.2 bin directory.
Next, cd to the Xerces-C++ source drop root directory, and enter
· Under Windows:
createDocs
· Under UNIX:
sh createDocs.bat
This should generate the .html files in the 'doc/html' directory.
4. Once this is done, you will then need to implement a version of the platform utilities for your
platform. Each operating system has a file which implements some methods of the
XMLPlatformUtils class, specific to that operating system. These are not terribly complex, so it
should not be a lot of work. The Win32 version is called Win32PlatformUtils.cpp, the AIX
version is AIXPlatformUtils.cpp and so on. Create one for your platform, with the correct
name, and empty out all of the implementation so that just the empty shells of the methods are there
(with dummy returns where needed to make the compiler happy.) Once you've done that, you can
start to get it to build without any real implementation.
5. Once you have the system building, then start implementing your own platform utilities methods.
Follow the comments in the Win32 version as to what they do, the comments will be improved in
subsequent versions, but they should be fairly obvious now. Once you have these implementations
done, you should be able to start debugging the system using the demo programs.
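The "empty shell" stage of step 4 might look like the hypothetical class below: every method compiles and returns a dummy value, so the rest of the system can be built before anything real is implemented. MyPlatformUtils and its method names are illustrative assumptions, not the real XMLPlatformUtils interface; copy the actual method list from Win32PlatformUtils.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "empty shell" platform utilities. The names are illustrative
// only; the real ones come from XMLPlatformUtils / Win32PlatformUtils.cpp.
class MyPlatformUtils {
public:
    // File access: dummy returns so the compiler is satisfied (step 4).
    static void* openFile(const char* /*fileName*/) { return nullptr; }
    static std::size_t fileSize(void* /*handle*/) { return 0; }
    static std::size_t readFileBuffer(void* /*handle*/, std::size_t /*toRead*/,
                                      unsigned char* /*buffer*/) { return 0; }
    static void closeFile(void* /*handle*/) {}

    // Mutex support: no-ops until real locking is implemented (step 5).
    static void* makeMutex() { return nullptr; }
    static void lockMutex(void* /*mutex*/) {}
    static void unlockMutex(void* /*mutex*/) {}
};
```

In step 5 each dummy body is replaced by a call to the platform's own services, following the comments in the Win32 version.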
Other concerns are:
· Does ICU compile on your platform? If not, then you'll need to create a transcoder implementation
that uses your local transcoding services. The Iconv transcoder should work for you, though perhaps
with some modifications.
· What message loader will you use? To get started, you can use the "in memory" one, which is very
simple and easy. Then, once you get going, you may want to adapt the message catalog message
loader, or write one of your own that uses local services.
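The idea behind an "in memory" message loader is that message texts are compiled into the program, so nothing has to be read from disk or from a message catalog at run time. A minimal sketch, assuming a plain ID-to-text table; the class name, message IDs and texts are made up and do not match the Xerces-C++ loader.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory message loader: the texts live in a static table
// built into the executable, so no platform message service is needed.
class InMemoryMessageLoader {
public:
    // Returns the text for msgId, or a generic fallback for unknown IDs.
    std::string loadMsg(unsigned int msgId) const {
        static const std::map<unsigned int, std::string> table = {
            {1, "Expected end of tag"},   // illustrative text only
            {2, "Unterminated comment"},  // illustrative text only
        };
        const auto it = table.find(msgId);
        return it != table.end() ? it->second
                                 : "Unknown message " + std::to_string(msgId);
    }
};
```

Swapping this for a catalog-based loader later only changes where loadMsg gets its text, not its callers.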
That is the work required in a nutshell!
7. API Documentation
8. Xerces-C++ Samples
Introduction
Xerces-C++ comes packaged with 15 sample applications that demonstrate salient features of the parser
using simple applications written on top of the SAX and DOM APIs provided by the parser. Sample
XML data files are provided in the samples/data directory.
Once you have set up your PATH variable, you can run the samples by opening a command window (or
your shell prompt for UNIX environments).
Xerces-C++ Samples
· SAXCount
SAXCount counts the elements, attributes, spaces and characters in an XML file.
· SAXPrint
SAXPrint parses an XML file and prints it out.
· DOMCount
DOMCount counts the elements in an XML file.
· DOMPrint
DOMPrint parses an XML file and prints it out.
· MemParse
MemParse parses XML in a memory buffer, outputting the number of elements and attributes.
· Redirect
9. Xerces-C++ Sample 1: SAXCount
SAXCount
SAXCount is the simplest application that counts the elements and characters of a given XML file using
the (event based) SAX API.
Running SAXCount
The SAXCount sample parses an XML file and prints out a count of the number of elements in the file.
To run SAXCount, enter the following
SAXCount <XML File>
Usage:
SAXCount [options] <XML file | List file>
This program invokes the SAX Parser, and then prints the
number of elements, attributes, spaces and characters found
in each XML file, using SAX API.
Options:
-l Indicate the input file is a List File that has a list of xml
files.
Default to off (Input file is an XML file).
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Defaults to off.
-s Enable schema processing. Defaults to off.
-f Enable full schema constraint checking. Defaults to off.
-? Show this help.
cd xerces-c1_7_0-linux/samples/data
SAXCount -v=always personal.xml
personal.xml: 60 ms (37 elems, 12 attrs, 134 spaces, 134 chars)
Running SAXCount with the validating parser gives a different result because ignorable white-space is
counted separately from regular characters.
SAXCount -v=never personal.xml
personal.xml: 10 ms (37 elems, 12 attrs, 0 spaces, 268 chars)
Note that the sum of spaces and characters in both versions is the same.
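The bookkeeping behind these numbers can be sketched with a stand-alone counter whose methods are named after the SAX callbacks. It is a hypothetical struct, not a Xerces-C++ HandlerBase subclass, and the replayed document is invented; it shows why whitespace moves from chars to spaces when validating while the sum stays the same.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical counter; method names mirror SAX events.
struct CountHandler {
    std::size_t elements = 0, attributes = 0, spaces = 0, chars = 0;

    void startElement(std::size_t attrCount) {
        ++elements;
        attributes += attrCount;
    }
    // Ordinary character data.
    void characters(const std::string& text) { chars += text.size(); }
    // Whitespace a validating parser reports as ignorable, because the DTD
    // says the enclosing element has element-only content.
    void ignorableWhitespace(const std::string& text) { spaces += text.size(); }
};

// Replays the events for <a x="1">\n  <b>hi</b>\n</a>. With validation on,
// the whitespace between elements arrives as ignorable whitespace.
void replay(CountHandler& h, bool validating) {
    auto whitespace = [&](const std::string& ws) {
        if (validating) h.ignorableWhitespace(ws); else h.characters(ws);
    };
    h.startElement(1);   // <a x="1">
    whitespace("\n  ");
    h.startElement(0);   // <b>
    h.characters("hi");
    whitespace("\n");
}
```

Replaying with validation gives 4 spaces and 2 chars; without it, 0 spaces and 6 chars, so the sum is 6 either way.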
Note: The time reported by the program may be different depending on your machine
processor.
10. Xerces-C++ Sample 2: SAXPrint
SAXPrint
SAXPrint uses the SAX APIs to parse an XML file and print it back. Note that the output of this sample is
not exactly the same as the input (it differs in whitespace and in the first line), but it has the same
information content as the input.
Running SAXPrint
The SAXPrint sample parses an XML file and prints out the contents again in XML (some loss occurs).
To run SAXPrint, enter the following
SAXPrint <XML file>
Usage:
SAXPrint [options] <XML file>
This program invokes the SAX Parser, and then prints the
data returned by the various SAX handlers for the specified
XML file.
Options:
-u=xxx Handle unrepresentable chars [fail | rep | ref*].
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing.
-s Enable schema processing.
-f Enable full schema constraint checking.
-x=XXX Use a particular encoding for output (LATIN1*).
-? Show this help.
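The three -u= policies can be illustrated with a small hypothetical transcoder to LATIN1 (ISO-8859-1). The '?' replacement character and the hexadecimal form of the character reference are assumptions of this sketch; the real behaviour depends on the Xerces-C++ formatter and transcoder.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical policies matching -u=fail, -u=rep and -u=ref.
enum class UnRep { Fail, Replace, CharRef };

// Input is UTF-32, so each char32_t is one code point.
std::string toLatin1(const std::u32string& in, UnRep policy) {
    std::string out;
    for (char32_t c : in) {
        if (c <= 0xFF) {  // representable in ISO-8859-1
            out += static_cast<char>(c);
            continue;
        }
        switch (policy) {
        case UnRep::Fail:     // stop the output with an error
            throw std::runtime_error("unrepresentable character");
        case UnRep::Replace:  // substitute a replacement character
            out += '?';
            break;
        case UnRep::CharRef: {  // write a numeric character reference
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(c));
            out += buf;
            break;
        }
        }
    }
    return out;
}
```

Because a character reference is still well-formed XML, ref (the default) keeps the information content even when the output encoding cannot hold the character.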
<person id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>chief@foo.com</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"></link>
</person>
<person id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>one@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>two@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>three@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>four@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>five@foo.com</email>
<link manager="Big.Boss"></link>
</person>
</personnel>
Note: SAXPrint does not reproduce the original XML file. SAXPrint and DOMPrint
produce different results because of the way the two APIs store data and capture
events.
11. Xerces-C++ Sample 3: DOMCount
DOMCount
DOMCount uses the provided DOM API to parse an XML file, constructs the DOM tree and walks
through the tree counting the elements (using just one API call).
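Once the tree exists, counting elements amounts to visiting every node and counting the element nodes. The sketch below uses a minimal, hypothetical Node type rather than the Xerces DOM classes. As far as we recall, the DOMCount sample itself gets the same number with one call, getElementsByTagName("*") on the document, and takes the length of the returned list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node (not the Xerces DOM classes).
struct Node {
    enum Kind { Element, Text };
    Kind kind = Element;
    std::string name;  // tag name, or the character data of a text node
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(Node::Kind kind, std::string name) {
    auto node = std::make_unique<Node>();
    node->kind = kind;
    node->name = std::move(name);
    return node;
}

// Depth-first walk: counts this node if it is an element, then its subtree.
std::size_t countElements(const Node& n) {
    std::size_t count = (n.kind == Node::Element) ? 1 : 0;
    for (const auto& child : n.children) count += countElements(*child);
    return count;
}
```

A root with one child element, which in turn holds a text node and another element, counts as 3; the text node is skipped.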
Running DOMCount
The DOMCount sample parses an XML file and prints out a count of the number of elements in the file.
To run DOMCount, enter the following
DOMCount <XML file>
Usage:
DOMCount [options] <XML file | List file>
This program invokes the DOM parser, builds the DOM tree,
and then prints the number of elements found in each XML file.
Options:
-l Indicate the input file is a List File that has a list of xml
files.
Default to off (Input file is an XML file).
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Defaults to off.
-s Enable schema processing. Defaults to off.
-f Enable full schema constraint checking. Defaults to off.
-? Show this help.
Note: The time reported by the system may be different, depending on your processor
type.
12. Xerces-C++ Sample 4: DOMPrint
DOMPrint
DOMPrint parses an XML file, constructs the DOM tree, and walks through the tree printing each
element. It thus dumps the XML back out, much as SAXPrint does (though, as noted below, the two
outputs are not identical).
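The tree walk can be sketched as a recursive printer over a minimal, hypothetical Node type (not the Xerces DOM). It escapes &, < and > in text, also escapes double quotes in attribute values, and always writes an explicit end tag, as in the <link ...></link> lines of the sample output.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree; not the Xerces DOM classes.
struct Node {
    bool isText = false;
    std::string name;  // tag name, or the character data of a text node
    std::vector<std::pair<std::string, std::string>> attrs;
    std::vector<Node> children;
};

// Escapes characters that may not appear literally in XML output.
std::string escape(const std::string& s, bool inAttr) {
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += inAttr ? "&quot;" : "\""; break;
        default:  out += c;
        }
    }
    return out;
}

// Depth-first walk that writes each node back out as XML.
void print(const Node& n, std::string& out) {
    if (n.isText) { out += escape(n.name, false); return; }
    out += "<" + n.name;
    for (const auto& a : n.attrs)
        out += " " + a.first + "=\"" + escape(a.second, true) + "\"";
    out += ">";
    for (const auto& c : n.children) print(c, out);
    out += "</" + n.name + ">";  // explicit end tag even for empty elements
}
```

Printing a <person id="one.worker"> node holding an <email> with text "a<b" and an empty <link manager="Big.Boss"> yields `<person id="one.worker"><email>a&lt;b</email><link manager="Big.Boss"></link></person>`.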
Running DOMPrint
The DOMPrint sample parses an XML file, using either a validating or non-validating DOM parser
configuration, builds a DOM tree, and then walks the tree and outputs the contents of the nodes in a
'canonical' format. To run DOMPrint, enter the following:
DOMPrint <XML file>
This program invokes the DOM parser, and builds the DOM tree.
It then traverses the DOM tree and prints the contents of the
tree for the specified XML file.
Options:
-e create entity reference nodes. Default is no expansion.
-u=xxx Handle unrepresentable chars [fail | rep | ref*].
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Default is off.
-s Enable schema processing. Default is off.
-f Enable full schema constraint checking. Defaults to off.
-x=XXX Use a particular encoding for output. Default is
the same encoding as the input XML file. UTF-8 if
input XML file has not XML declaration.
-? Show this help.
<person id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>chief@foo.com</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"></link>
</person>
<person id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>one@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>two@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>three@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>four@foo.com</email>
<link manager="Big.Boss"></link>
</person>
<person id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>five@foo.com</email>
<link manager="Big.Boss"></link>
</person>
</personnel>
Note that DOMPrint does not reproduce the original XML file. DOMPrint and SAXPrint produce
different results because of the way the two APIs store data and capture events.
13. Xerces-C++ Sample 5: MemParse
MemParse
MemParse uses the Validating SAX Parser to parse a memory buffer containing XML statements, and
reports the number of elements and attributes found.
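The sketch below is a toy scanner, not a conforming XML parser (it ignores comments, CDATA sections, DOCTYPE declarations and '>' inside attribute values). It counts start tags and attributes in an in-memory buffer to show the kind of result MemParse reports. The real sample hands its buffer to the Xerces SAX parser, wrapped in an input source such as MemBufInputSource; the Counts type and scanBuffer function here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Counts { std::size_t elements = 0, attributes = 0; };

// Toy scan of an in-memory XML buffer: one element per start tag, one
// attribute per '=' found outside quotes inside that tag.
Counts scanBuffer(const std::string& buf) {
    Counts c;
    for (std::size_t i = 0; i + 1 < buf.size(); ++i) {
        if (buf[i] != '<') continue;
        const char next = buf[i + 1];
        if (next == '/' || next == '?' || next == '!') continue;  // end tag, PI, decl
        const std::size_t end = buf.find('>', i);
        if (end == std::string::npos) break;  // unterminated tag: stop
        ++c.elements;
        char quote = 0;
        for (std::size_t j = i + 1; j < end; ++j) {
            const char ch = buf[j];
            if (quote != 0) {
                if (ch == quote) quote = 0;  // leaving an attribute value
            } else if (ch == '\'' || ch == '"') {
                quote = ch;                  // entering an attribute value
            } else if (ch == '=') {
                ++c.attributes;
            }
        }
        i = end;
    }
    return c;
}
```

On the <company> document shown below, this toy scan finds 4 elements and 1 attribute.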
Running MemParse
This program uses the SAX Parser to parse a memory buffer containing XML statements, and reports the
number of elements and attributes found.
The following parameters may be set from the command line
Usage:
MemParse [options]
Options:
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Defaults to off.
-s Enable schema processing. Defaults to off.
-f Enable full schema constraint checking. Defaults to off.
-? Show this help.
<company>
<product>XML4C</product>
<category idea='great'>XML Parsing Tools</category>
<developedAt>
IBM Center for Java Technology, Silicon Valley, Cupertino, CA
</developedAt>
</company>
Running MemParse without validation (-v=never) gives a different result from the validating run because,
when validating, ignorable white space is counted separately from regular characters.
MemParse -v=never
<company>
<product>XML4C</product>
<category idea='great'>XML Parsing Tools</category>
<developedAt>
IBM Center for Java Technology, Silicon Valley, Cupertino, CA
</developedAt>
</company>
Note that the sum of spaces and characters in both versions is the same.
Note: The time reported by the system may be different, depending on your processor
type.
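The buffer above is what MemParse hands to the parser in place of a file. As a rough, standard-C++ illustration of the element and attribute figures it reports, the sketch below scans such an in-memory string and counts start tags and attributes. It is not the Xerces-C++ API: `MarkupCounts` and `countMarkup` are hypothetical names, and the scanner assumes simple, well-formed markup (no comments or CDATA sections containing `<` or `>`, no entity expansion, no validation).

```cpp
#include <cstddef>
#include <string>

// Element and attribute totals, like the figures MemParse reports.
struct MarkupCounts {
    std::size_t elements = 0;
    std::size_t attributes = 0;
};

// Hypothetical toy scanner, not part of Xerces-C++. It counts start tags
// (including empty-element tags) and the '=' signs inside them that lie
// outside quoted values. End tags, "<?...?>" and "<!...>" constructs are
// skipped up to the next '>'.
MarkupCounts countMarkup(const std::string& buf) {
    MarkupCounts counts;
    std::size_t i = 0;
    while ((i = buf.find('<', i)) != std::string::npos) {
        if (++i >= buf.size()) break;
        const char c = buf[i];
        if (c == '/' || c == '?' || c == '!') {
            i = buf.find('>', i);          // skip this construct
            if (i == std::string::npos) break;
            continue;
        }
        ++counts.elements;                 // a start tag
        char quote = '\0';
        for (; i < buf.size(); ++i) {      // walk to the tag's closing '>'
            const char ch = buf[i];
            if (quote != '\0') {
                if (ch == quote) quote = '\0';
            } else if (ch == '"' || ch == '\'') {
                quote = ch;
            } else if (ch == '=') {
                ++counts.attributes;       // one '=' per attribute
            } else if (ch == '>') {
                break;
            }
        }
    }
    return counts;
}
```

Applied to the company buffer shown above, this toy reports 4 elements and 1 attribute; a real parser additionally reports the character and white-space counts discussed in this chapter.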
14
Xerces-C++ Sample 6: Redirect
Redirect
Redirect uses the SAX EntityResolver handler to redirect the input stream for external entities. It installs
an entity resolver, traps the call to the external DTD file and redirects it to another specific file which
contains the actual DTD.
Running Redirect
This program illustrates how an XML application can use the SAX EntityResolver handler to redirect the
input stream for external entities. It installs an entity resolver, traps the call to the external DTD file and
redirects it to another specific file which contains the actual DTD.
The program then counts and reports the number of elements and attributes in the given XML file.
Redirect <XML file>
External files required to run this sample are 'personal.xml', 'personal.dtd' and 'redirect.dtd', which are all
present in the 'samples/data' directory. Make sure that you run Redirect from the samples/data directory.
The 'resolveEntity' callback in this sample looks for an external entity whose system ID is 'personal.dtd'.
When it is asked to resolve this particular external entity, it creates and returns a new InputSource for the
file 'redirect.dtd'.
A real-world XML application can similarly do application-specific processing when it encounters
external entities. For example, an application might want to redirect all references to entities outside its
domain to locally cached copies.
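The decision the resolveEntity callback makes can be modelled, outside Xerces-C++, as a lookup from system ID to a local file. The sketch below is hypothetical: `RedirectTable` is not a Xerces class, and where the real callback returns an InputSource (or a null pointer, which in SAX asks the parser to open the system ID itself), this model returns an optional file name. It needs C++17 for `std::optional`.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for an entity resolver: maps the system IDs an
// application wants to intercept onto local replacement files.
class RedirectTable {
public:
    void add(const std::string& systemId, const std::string& localFile) {
        table_[systemId] = localFile;
    }

    // Returns the replacement file for a trapped system ID, or std::nullopt
    // to mean "no redirection; let the parser resolve the ID as usual".
    std::optional<std::string> resolve(const std::string& systemId) const {
        auto it = table_.find(systemId);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> table_;
};
```

Filling the same table with remote system IDs mapped to local paths gives the caching behaviour described above.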
Note: The time reported by the program may differ depending on your machine's
processor.
15
Xerces-C++ Sample 7: PParse
PParse
PParse demonstrates progressive parsing.
In this example, the programmer doesn't have to depend upon throwing an exception to terminate the
parsing operation. Calling parseFirst() causes the DTD (both internal and external subsets) and any
pre-content, i.e. everything up to but not including the root element, to be parsed. Each subsequent call to
parseNext() parses one more piece of markup and passes it from the core scanning code to the parser. You
can quit the parse at any time simply by not calling parseNext() again and breaking out of the loop. When
you call parseNext() and the end of the root element is the next piece of markup, the parser continues to
the end of the file and returns false, to let you know that the parse is done.
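The control flow described above can be sketched without Xerces-C++ as follows. `ToyProgressiveParser` is a hypothetical stand-in whose `parseFirst()` and `parseNext()` only mimic the calling pattern of the methods named in the text (the real Xerces-C++ methods take a file name and a scan token). Markup is pre-split into one string per piece, and the prolog is assumed to have been consumed by `parseFirst()`. `countUpTo` shows the early exit: it simply stops calling `parseNext()`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of progressive parsing, not the Xerces-C++ classes.
// 'body' holds the root element's markup, one piece per entry, ending with
// the root's end tag; the prolog is assumed to be handled by parseFirst().
class ToyProgressiveParser {
public:
    explicit ToyProgressiveParser(std::vector<std::string> body)
        : body_(std::move(body)) {}

    // Mimics parseFirst(): returns false if there is nothing to parse.
    bool parseFirst() {
        started_ = true;
        return !body_.empty();
    }

    // Mimics parseNext(): hands over exactly one more piece of markup and
    // returns false once the root element's end tag has been delivered.
    bool parseNext(std::string& piece) {
        if (!started_ || next_ >= body_.size()) return false;
        piece = body_[next_++];
        return next_ < body_.size();
    }

private:
    std::vector<std::string> body_;
    std::size_t next_ = 0;
    bool started_ = false;
};

// Counts start tags but quits after 'limit' of them, simply by no longer
// calling parseNext(); no exception is needed to stop the parse.
std::size_t countUpTo(ToyProgressiveParser& parser, std::size_t limit) {
    std::size_t elems = 0;
    if (!parser.parseFirst()) return 0;
    std::string piece;
    bool more = true;
    while (more && elems < limit) {
        more = parser.parseNext(piece);
        if (piece.size() > 1 && piece[0] == '<' && piece[1] != '/') ++elems;
    }
    return elems;
}
```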
Running PParse
PParse parses an XML file and prints out a count of the number of elements in the file.
Usage:
PParse [options] <XML file>
Options:
-v=xxx - Validation scheme [always | never | auto*].
-n - Enable namespace processing [default is off].
-s - Enable schema processing [default is off].
-f - Enable full schema constraint checking [default is off].
-? - Show this help.
Running PParse without validation (-v=never) gives a different result from the validating run because,
when validating, ignorable white space is counted separately from regular characters.
PParse -v=never personal.xml
personal.xml: 10 ms (37 elems, 12 attrs, 0 spaces, 268 chars)
Note that the sum of spaces and characters in both versions is the same.
Note: The time reported by the program may differ depending on your machine's
processor.
16
Xerces-C++ Sample 8: StdInParse
StdInParse
StdInParse demonstrates streaming XML data from standard input.
Running StdInParse
The StdInParse sample parses an XML file from standard input and prints out a count of the number of
elements in the file. To run StdInParse, enter the following:
StdInParse < <XML file>
Usage:
StdInParse [options] < <XML file>
Options:
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Defaults to off.
-s Enable schema processing. Defaults to off.
-f Enable full schema constraint checking. Defaults to off.
-? Show this help.
Running StdInParse without validation (-v=never) gives a different result from the validating run because,
when validating, ignorable white space is counted separately from regular characters.
StdInParse -v=never < personal.xml
stdin: 10 ms (37 elems, 12 attrs, 0 spaces, 268 chars)
Note that the sum of spaces and characters in both versions is the same.
Note: The time reported by the program may differ depending on your machine's
processor.
17
Xerces-C++ Sample 9: EnumVal
EnumVal
EnumVal shows how to enumerate the markup declarations in a DTD grammar.
Running EnumVal
This program parses the specified XML file, then shows how to enumerate the contents of the DTD
Grammar.
Usage:
EnumVal <XML file>
This program parses the specified XML file, then shows how to
enumerate the contents of the DTD Grammar. Essentially,
shows how one can access the DTD information stored in internal
data structures.
ELEMENTS:
----------------------------
Name: personnel
Content Model: (person)+
Name: person
Content Model: (name,email*,url*,link?)
Attributes:
Name:id, Type: ID
Name: name
Content Model: (#PCDATA|family|given)*
Name: email
Content Model: (#PCDATA)*
Name: url
Name: link
Content Model: EMPTY
Attributes:
Name:subordinates, Type: IDREF(S)
Name:manager, Type: IDREF(S)
Name: family
Content Model: (#PCDATA)*
Name: given
Content Model: (#PCDATA)*
18
Xerces-C++ Sample 10: CreateDOMDocument
CreateDOMDocument
CreateDOMDocument illustrates how you can create a DOM tree in memory from scratch. It then reports
the elements in the tree that was just created.
Running CreateDOMDocument
The CreateDOMDocument sample illustrates how you can create a DOM tree in memory from scratch.
To run CreateDOMDocument, enter the following:
CreateDOMDocument
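To illustrate the idea of building a tree in memory and then reporting on it, here is a standard-C++ sketch with a hypothetical `Node` type in place of the Xerces-C++ DOM classes. The element names and text values are illustrative only, attributes are omitted, and the count covers element nodes only.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree node, not the Xerces-C++ DOM classes.
struct Node {
    bool isElement = true;      // element node, or text node if false
    std::string value;          // tag name or text content
    std::vector<std::unique_ptr<Node>> children;

    // Appends a child node and returns a pointer to it.
    Node* append(bool element, const std::string& v) {
        auto child = std::make_unique<Node>();
        child->isElement = element;
        child->value = v;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Builds a small document tree from scratch (illustrative names only).
std::unique_ptr<Node> buildCompanyTree() {
    auto root = std::make_unique<Node>();
    root->value = "company";
    root->append(true, "product")->append(false, "Xerces-C++");
    root->append(true, "category")->append(false, "XML Parsing Tools");
    root->append(true, "developedBy")->append(false, "Apache Software Foundation");
    return root;
}

// Walks the tree recursively and counts element nodes.
std::size_t countElements(const Node& n) {
    std::size_t total = n.isElement ? 1 : 0;
    for (const auto& child : n.children) total += countElements(*child);
    return total;
}
```

The recursive walk is the same shape of traversal a DOM-based counter performs: visit a node, then each of its children in turn.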
19
Xerces-C++ Sample 11: SAX2Count
SAX2Count
SAX2Count is the simplest application that counts the elements and characters of a given XML file using
the (event-based) SAX2 API.
Running SAX2Count
The SAX2Count sample parses an XML file and prints out a count of the number of elements in the file.
To run SAX2Count, enter the following:
SAX2Count <XML File>
Usage:
SAX2Count [options] <XML file | List file>
Options:
-l Indicate the input file is a List File that has a list of xml
files.
Default to off (Input file is an XML file).
-v=xxx Validation scheme [always | never | auto*].
-f Enable full schema constraint checking processing. Defaults to
off.
-n Disable namespace processing. Defaults to on.
NOTE: THIS IS OPPOSITE FROM OTHER SAMPLES.
-s Disable schema processing. Defaults to on.
NOTE: THIS IS OPPOSITE FROM OTHER SAMPLES.
-? Show this help.
Here is a sample output from SAX2Count:
cd xerces-c1_7_0-linux/samples/data
SAX2Count -v=always personal.xml
personal.xml: 60 ms (37 elems, 12 attrs, 134 spaces, 134 chars)
Running SAX2Count without validation (-v=never) gives a different result from the validating run because,
when validating, ignorable white space is counted separately from regular characters.
SAX2Count -v=never personal.xml
personal.xml: 10 ms (37 elems, 12 attrs, 0 spaces, 268 chars)
Note that the sum of spaces and characters in both versions is the same.
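Why the two totals agree can be shown with a simplified model. In the sketch below (hypothetical types, not Xerces-C++ code), each run of character data records whether it lies in element-only content. When validating, white-space-only runs in such content are tallied as ignorable spaces; without validation, every run is tallied as characters, matching the 0 spaces reported by the -v=never run above. Either way each character is counted exactly once, so spaces + chars is unchanged (134 + 134 = 0 + 268 above).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One run of character data, as a parser might report it. 'inElementOnly'
// is true when the enclosing element's DTD content model allows only child
// elements (no #PCDATA).
struct TextRun {
    std::string text;
    bool inElementOnly;
};

struct CharCounts {
    std::size_t spaces = 0;  // ignorable white space
    std::size_t chars = 0;   // ordinary character data
};

bool isAllWhitespace(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Simplified model: a validating run reports white-space-only runs in
// element-only content as ignorable; a non-validating run (as in the
// -v=never output above) reports everything as characters.
CharCounts tally(const std::vector<TextRun>& runs, bool validating) {
    CharCounts c;
    for (const TextRun& r : runs) {
        if (validating && r.inElementOnly && isAllWhitespace(r.text))
            c.spaces += r.text.size();
        else
            c.chars += r.text.size();
    }
    return c;
}
```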
Note: The time reported by the program may differ depending on your machine's
processor.
20
Xerces-C++ Sample 12: SAX2Print
SAX2Print
SAX2Print uses the SAX2 APIs to parse an XML file and print it back. Note that the output of this sample
is not exactly the same as the input (white space and the first line may differ), but the output has the
same information content as the input.
Running SAX2Print
The SAX2Print sample parses an XML file and prints out the contents again in XML (some loss occurs).
To run SAX2Print, enter the following:
SAX2Print <XML file>
Usage:
SAX2Print [options] <XML file>
Options:
-u=xxx Handle unrepresentable chars [fail | rep | ref*].
-v=xxx Validation scheme [always | never | auto*].
-e Expand Namespace Alias with URI's.
-x=XXX Use a particular encoding for output (LATIN1*).
-f Enable full schema constraint checking processing. Defaults to
off.
-s Disable schema processing. Defaults to on.
NOTE: THIS IS OPPOSITE FROM OTHER SAMPLES.
-? Show this help.
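The -u option decides what happens when a character cannot be represented in the output encoding. The sketch below models the three choices (fail, replace, character reference) for a Latin-1 target in standard C++. It is an illustration, not the Xerces-C++ formatter: `UnRepPolicy` and `toLatin1` are hypothetical, and the '?' used for the replace policy is an assumption (the replacement character Xerces-C++ actually writes may differ). Character references such as &#x20AC; are only legal in content and attribute values, not in names.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// The three -u=xxx choices, modelled on the option names.
enum class UnRepPolicy { Fail, Replace, CharRef };

// Hypothetical sketch: encodes text as Latin-1. Code points above U+00FF
// cannot be represented; the policy decides what happens to them.
std::string toLatin1(const std::u32string& text, UnRepPolicy policy) {
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out += static_cast<char>(static_cast<unsigned char>(cp));
            continue;
        }
        switch (policy) {
        case UnRepPolicy::Fail:
            throw std::runtime_error("unrepresentable character");
        case UnRepPolicy::Replace:
            out += '?';  // assumed replacement character
            break;
        case UnRepPolicy::CharRef: {
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(cp));
            out += buf;  // e.g. U+20AC becomes "&#x20AC;"
            break;
        }
        }
    }
    return out;
}
```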
<person id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>[email protected]</email>
<link subordinates="one.worker two.worker three.worker
four.worker five.worker"></link>
</person>
<person id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
</personnel>
Note: SAX2Print does not reproduce the original XML file. SAX2Print and DOMPrint
produce different results because of the way the two APIs store data and capture
events.
21
Xerces-C++ Sample 13: IDOMCount
IDOMCount
IDOMCount uses the provided IDOM API to parse an XML file, constructs the DOM tree and walks
through the tree counting the elements (using just one API call).
Running IDOMCount
The IDOMCount sample parses an XML file and prints out a count of the number of elements in the file.
To run IDOMCount, enter the following:
IDOMCount <XML file>
Usage:
IDOMCount [options] <XML file | List file>
This program invokes the IDOM parser, builds the DOM tree,
and then prints the number of elements found in each XML file.
Options:
-l Indicate the input file is a List File that has a list of xml
files.
Default to off (Input file is an XML file).
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Defaults to off.
-s Enable schema processing. Defaults to off.
-f Enable full schema constraint checking. Defaults to off.
-? Show this help.
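With -l, the input file is a list of XML files rather than an XML file itself. A minimal sketch of reading such a list follows, assuming one file name per line, with surrounding white space trimmed and blank lines skipped (the samples' exact handling may differ). `readListFile` is a hypothetical helper.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of reading a "-l" list file: one XML file name per
// line; leading/trailing white space is trimmed and blank lines skipped.
std::vector<std::string> readListFile(std::istream& in) {
    std::vector<std::string> files;
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const auto last = line.find_last_not_of(" \t\r");
        files.push_back(line.substr(first, last - first + 1));
    }
    return files;
}
```

Each returned name would then be parsed in turn and its element count reported separately.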
Note: The time reported by the system may be different, depending on your processor
type.
22
Xerces-C++ Sample 14: IDOMPrint
IDOMPrint
IDOMPrint parses an XML file, constructs the DOM tree, and walks through the tree printing each
element. It thus writes the XML back out, as SAXPrint does (the two outputs are not identical; see the
note at the end of this section).
Running IDOMPrint
The IDOMPrint sample parses an XML file, using either a validating or non-validating IDOM parser
configuration, builds a DOM tree, and then walks the tree and outputs the contents of the nodes in a
'canonical' format. To run IDOMPrint, enter the following:
IDOMPrint <XML file>
Usage:
IDOMPrint [options] <XML file>
This program invokes the IDOM parser, and builds the DOM tree.
It then traverses the DOM tree and prints the contents of the
tree for the specified XML file.
Options:
-e create entity reference nodes. Default is no expansion.
-u=xxx Handle unrepresentable chars [fail | rep | ref*].
-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing. Default is off.
-s Enable schema processing. Default is off.
-f Enable full schema constraint checking. Defaults is off.
-x=XXX Use a particular encoding for output. Default is
the same encoding as the input XML file. UTF-8 if
input XML file has not XML declaration.
-? Show this help.
<person id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>[email protected]</email>
<link subordinates="one.worker two.worker three.worker
four.worker five.worker"></link>
</person>
<person id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
</personnel>
Note that IDOMPrint does not reproduce the original XML file. IDOMPrint and SAXPrint produce
different results because of the way the two APIs store data and capture events.
23
Xerces-C++ Sample 15: SEnumVal
SEnumVal
SEnumVal shows how to enumerate the markup declarations in a schema grammar.
Running SEnumVal
This program parses the specified XML file, then shows how to enumerate the contents of the Schema
Grammar.
Usage:
SEnumVal <XML file>
Name: personnel
Model Type: Children
Create Reason: Declared
ContentType: OneOrMore
Content Model: (person)+
ComplexType:
TypeName: ,C0
ContentType: OneOrMore
--------------------------------------------
Name: person
Model Type: Children
Create Reason: Declared
ContentType: Sequence
Content Model: (name,email*,url*,link?)
ComplexType:
TypeName: ,C1
ContentType: Sequence
Attributes:
Name: salary
Type: CDATA
Default Type: #IMPLIED
Base Datatype: Decimal
Facets:
fractionDigits=0
Name: id
Type: ID
Default Type: #REQUIRED
Base Datatype: ID
Name: contr
Type: CDATA
Default Type: #DEFAULT
Value: false
Base Datatype: string
Enumeration:
true
false
Name: note
Type: CDATA
Default Type: #IMPLIED
Base Datatype: string
--------------------------------------------
Name: name
Model Type: Children
Create Reason: Declared
ContentType: All
Content Model: All(family,given)
ComplexType:
TypeName: ,C3
ContentType: All
--------------------------------------------
Name: family
Model Type: Simple
Create Reason: Declared
Base Datatype: string
--------------------------------------------
Name: given
Model Type: Simple
Create Reason: Declared
Base Datatype: string
--------------------------------------------
Name: email
Model Type: Simple
Create Reason: Declared
Base Datatype: string
--------------------------------------------
Name: url
Model Type: Empty
Create Reason: Declared
Content Model: EMPTY
ComplexType:
TypeName: ,C4
Attributes:
Name: href
Type: CDATA
Default Type: #DEFAULT
Value: http://
Base Datatype: string
--------------------------------------------
Name: link
Model Type: Empty
Create Reason: Declared
Content Model: EMPTY
ComplexType:
TypeName: ,C5
Attributes:
Name: subordinates
Type: IDREFS
Default Type: #IMPLIED
Base Datatype: List
Name: manager
Type: IDREF
Default Type: #IMPLIED
Base Datatype: IDREF
--------------------------------------------
24
Schema
Introduction
This package contains an implementation of the W3C XML Schema Language, a recommendation of the
World Wide Web Consortium available in three parts: XML Schema: Primer [23], XML Schema:
Structures [24] and XML Schema: Datatypes [25]. We consider this implementation complete except for
the limitations cited below.
We would very much appreciate feedback on the package via the Xerces-C++ mailing list
[email protected] [15], and we encourage the submission of bugs as described on the
Bug-Reporting page. Please read this document before using this package.
Limitations
· No interface is provided for exposing the post-schema-validation infoset, beyond that provided by
DOM or SAX;
· Due to the way in which the parser constructs content models for elements with complex content,
specifying large values for the minOccurs or maxOccurs attributes may cause a stack overflow or
very poor performance in the parser. Large values for minOccurs should be avoided, and
unbounded should be used instead of a large value for maxOccurs.
Usage
Here is an example of how to turn on schema processing in DOMParser (default is off). Note that you must
also turn on namespace support (default is off) for schema processing.
// Instantiate the DOM parser.
DOMParser parser;
parser.setDoNamespaces(true);
parser.setDoSchema(true);
parser.parse(xmlFile);
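// 'parser' is assumed to be a SAX2XMLReader*, and 'propertyValue' an XMLCh
// string naming the schema file.
parser->setProperty(
    XMLUni::fgSAX2XercesSchemaExternalNoNameSpaceSchemaLocation,
    propertyValue);
parser->parse("test.xml");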
Here is an example with a target namespace. Note that it is an error to specify, in
setExternalSchemaLocation, a namespace different from the target namespace defined in the schema.
parser->setProperty(
    XMLUni::fgSAX2XercesSchemaExternalSchemaLocation,
    propertyValue);
parser->parse("test.xml");
Here is an example with a target namespace. Note that it is an error to specify, in the
xsi:schemaLocation attribute, a namespace different from the target namespace defined in the schema.
<?xml version="1.0" encoding="UTF-8"?>
<personnel xmlns="http://my.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://my.com personal.xsd http://my2.com
                        test2.xsd">
...
</personnel>
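The xsi:schemaLocation value is a white-space-separated list of pairs: a namespace URI followed by the location of the schema document for that namespace. The hypothetical helper below splits such a value into pairs and rejects an odd number of tokens. Assuming the string given to setExternalSchemaLocation uses the same pair format (as the namespace rule above suggests), the same check applies there.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits a schemaLocation-style value into (namespace URI, location) pairs.
// Any white space, including line breaks, separates tokens.
std::vector<std::pair<std::string, std::string>>
splitSchemaLocation(const std::string& value) {
    std::istringstream in(value);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    if (tokens.size() % 2 != 0)
        throw std::invalid_argument("schemaLocation needs URI/location pairs");
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t i = 0; i < tokens.size(); i += 2)
        pairs.emplace_back(tokens[i], tokens[i + 1]);
    return pairs;
}
```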
25
FAQs
Distributing Xerces-C++
What compilers are being used on the supported platforms?
Xerces-C++ binaries have been built on the following platforms with these compilers:
Operating System          Compiler
Windows NT 4.0 SP5/2000   MSVC 6.0 SP3
Redhat Linux 7.2          g++ 2.96
AIX 4.3                   xlC_r 5.0.2
Solaris 2.6               Forte C++ Version 6 Update 2
HP-UX 11.0                aCC A.03.13 with pthreads
However, if you are using the XML4C binaries then in addition to the library file mentioned above, you
also need to ship:
1. ICU shared library file:
icuuc.dll for Windows NT/2000, or
libicuuc.a for AIX, or
libicuuc.so for Solaris/Linux, or
libicuuc.sl for HP-UX.
2. ICU converter data shared library file:
icudata.dll for Windows NT/2000, or
libicudata.a for AIX, or
libicudata.so for Solaris/Linux, or
libicudata.sl for HP-UX.
How do I package the sources to create a binary drop?
You first have to compile the sources inside your IDE to create the required DLLs and EXEs. Then you
need to copy the binaries to another directory for the binary drop. A Perl script has been provided to
give you a jump start; you need to install Perl on your machine for the script to work. If you have
changed your source tree, you have to modify the script to suit your current directory structure. To invoke
the script, go to the \<Xerces>\scripts directory, and type:
perl packageBinaries.pl
You will get a message that looks somewhat like this (changes always happen; we are evolving, you see!):
Usage is: packageBinaries <options>
options are: -s <source_directory>
-o <target_directory>
-c <C compiler name> (e.g. gcc or xlc_r)
-x <C++ compiler name> (e.g. g++ or xlC_r)
-m <message loader> can be 'inmem', 'icu' or 'iconv'
-n <net accessor> can be 'fileonly' or 'libwww'
-t <transcoder> can be 'icu' or 'native'
-r <thread option> can be 'pthread' or 'dce' (only used on HP-11)
-h to get help on these commands
Example: perl packageBinaries.pl -s$HOME/xerces-c-src1_7_0
-o$HOME/xerces-c1_7_0
-cgcc -xg++ -minmem
-nfileonly -tnative
Make sure that your compiler can be invoked from the command line and follow the instructions to
produce a binary drop.
I do not see binaries for my platform. When will they be available?
We publish binaries only for the specific platforms for which we have had the most requests. Moreover,
we have limited resources and hence cannot publish binaries for every platform. If you wish to contribute
your time and effort in building binaries for a specific platform/environment, please send mail to the
Xerces-C++ mailing list [15]. We can definitely use any extra help in this open source project.
When will a port to my platform be available?
We would like to see Xerces ported to as many platforms as possible. Again, due to limited resources we
cannot do all the ports ourselves, but we will help you make this port happen. Here are some Porting
Guidelines.
We strongly encourage you to submit the changes that are required to make it work on another platform.
We will incorporate these changes in the source code base and make them available in the future releases.
All porting changes may be sent to the Xerces-C++ mailing list [15] .
How can I port Xerces to my favourite platform?
Some porting information is mentioned on the build page.
What application do you use to create the documentation?
We have used an internal XML-based application to create the documentation. The documentation files
are all written in XML, and the application, internally codenamed StyleBook, uses XSL to transform
them into the HTML document that you are seeing right now. It is currently available on the
Apache [30] open source website as Cocoon [31].
The API documentation is automatically generated using Doxygen [21] and GraphViz [22] .
Can I use Xerces in my product?
Yes! Read the license agreement first and if you still have further questions, then please address them to
the Xerces-C++ mailing list [15] .
How do I uninstall Xerces-C++?
Xerces-C++ only installs itself in a single directory and does not set any registry entries. Thus, to
uninstall, you only need to remove the directory where you installed it, and all Xerces-C++ related files
will be removed.
I am getting a tar checksum error on Solaris. What's the problem?
The problem is caused by a limitation in the original tar specification, which prevented it from archiving
files with long path names. Unfortunately, current versions of tar use different, mutually incompatible
extensions to eliminate this restriction (or do not remove the restriction at all).
Rather than altering the path names in the Xerces-C++ package, which would make them compatible
with the original tar specification but make it more difficult to know what was where, we decided to use
GNU tar (gtar), which handles arbitrarily long path names and is freely available on every platform on
which Xerces-C++ is supported. If you don't already have GNU tar installed on your system, you can
obtain it from the Free Software Foundation at http://www.gnu.org/software/tar/tar.html [32]. For
additional background information on this problem, see GNU tar and POSIX tar [33] in the online
manual for the utility.
If you are using the enhanced version of this parser from IBM, you will need to put in two additional
DLLs. In the Windows build these are icuuc.dll and icudata.dll which must be available from
your PATH settings. On UNIX, these libraries are called libicuuc.so and libicudata.so (or
.sl for HP-UX or .a for AIX) which must be available from your library search path.
Why does my application crash on AIX when I run it under a multi-threaded environment?
AIX maintains two kinds of libraries on the system, thread-safe and non-thread-safe. Multi-threaded
libraries on AIX follow a different naming convention: usually the multi-threaded library names end
with "_r". For example, libc.a is single-threaded whereas libc_r.a is multi-threaded.
To make your multi-threaded application run on AIX, you must ensure that you do not have a "system
library path" in your LIBPATH environment variable when you run the application. The appropriate
libraries (threaded or non-threaded) are automatically picked up at runtime. An application usually
crashes when you build your application for multi-threaded operation but don't point to the thread-safe
version of the system libraries. For example, LIBPATH can be simply set as:
LIBPATH=$HOME/<Xerces>/lib
Where <Xerces> points to the directory where the Xerces application resides.
If, for any reason unrelated to Xerces, you need to keep a "system library path" in your LIBPATH
environment variable, you must make sure that you place the thread-safe path before the normal system
path. For example, you must place /usr/lib/threads before /usr/lib in your LIBPATH variable.
That is to say, your LIBPATH may look like this:
export LIBPATH=$HOME/<Xerces>/lib:/usr/lib/threads:/usr/lib
Why does deleting a transcoded string result in an assertion on Windows?
Both your application program and the Xerces DLL must use the same *DLL* version of the runtime
library. If either one statically links to the runtime library, the problem will still occur. For example, for a
Win32/VC6 build, the runtime library build setting MUST be "Multithreaded DLL" for release builds and
"Debug Multithreaded DLL" for debug builds.
The libs/dll's I downloaded keep me from using the debugger in VC6.0. I am using the 'D', debug
versions of them. "no symbolic information found" is what it says. Do I have to compile everything
from source to make it work?
Unless you have the .pdb files, all you get from the debug library is that it uses the debug heap
manager, so that you can compile your own code in debug mode safely. If you want full symbolic
information for the Xerces-C++ library, you need the .pdb files, and to get those you need to
rebuild the Xerces-C++ library.
"First-chance exception in DOMPrint.exe (KERNEL32.DLL): 0xE06D7363: Microsoft C++
Exception." I am always getting this message when I am using the parser. My programs are
terminating abnormally. Even the samples are giving this exception. I am using Visual C++ 6.0 with
latest service pack installed.
Xerces-C++ uses C++ exceptions internally, as part of its normal operation. By default, the MSVC
debugger will stop on each of these with the "First-chance exception ..." message.
To stop this from happening do this:
· start debugging (so the debug menu appears)
· from the debug menu select "Exceptions"
· from the box that opens select "Microsoft C++ Exception" and set it to "Stop if not handled" instead
of "Stop always".
You'll still land in the debugger if your program is terminating abnormally, but it will be at your problem,
not from the internal Xerces-C++ exceptions.
"Fatal Error: Cannot open include file: XXX: No such file or directory"?
Because of the recent change to the directory structure, you may need to update your project file,
makefile, or source/header files. For details, please refer to Directory Change.
Programming/Parsing FAQs
Does Xerces-C++ support Schema?
Yes. Xerces-C++ 1.7.0 contains an implementation of the W3C XML Schema Language, a
recommendation of the World Wide Web Consortium available in three parts: XML Schema: Primer [23],
XML Schema: Structures [24] and XML Schema: Datatypes [25]. We consider this implementation
complete. See the Schema page for limitations.
Why does Xerces-C++ not support this particular Schema feature?
Xerces-C++ 1.7.0 contains an implementation of the W3C XML Schema Language, a
recommendation of the World Wide Web Consortium available in three parts: XML Schema: Primer [23],
XML Schema: Structures [24] and XML Schema: Datatypes [25]. We consider this implementation
complete. See the Schema page for limitations.
If you find that a Schema feature specified in the W3C XML Schema Language Recommendation
does not work with Xerces-C++ 1.7.0, we encourage you to submit a bug as described on the
Bug-Reporting page.
Why does my application crash when instantiating the parser?
In order to work with the Xerces-C++ parser, you have to first initialize the XML subsystem. The most
common mistake is to forget this initialization. Before you make any calls to Xerces-C++ APIs, you must
call XMLPlatformUtils::Initialize():
try {
XMLPlatformUtils::Initialize();
}
catch (const XMLException& toCatch) {
// Do your failure processing here
}
This initializes the Xerces system and sets its internal variables. Note that you must include the
xercesc/util/PlatformUtils.hpp file for this to work.
Is it OK to call the XMLPlatformUtils::Initialize/Terminate pair of routines multiple times in one
program?
Yes. Since Xerces-C++ version 1.5.2, the code has been enhanced so that calling the
XMLPlatformUtils::Initialize/Terminate pair of routines multiple times in one process is allowed.
However, the application needs to guarantee that only one thread has entered either the method
XMLPlatformUtils::Initialize() or the method XMLPlatformUtils::Terminate() at any one time.
If you are calling XMLPlatformUtils::Initialize() a number of times, and then follow with
XMLPlatformUtils::Terminate() the same number of times, only the first XMLPlatformUtils::Initialize()
will do the initialization, and only the last XMLPlatformUtils::Terminate() will clean up the memory. The
other calls are ignored.
To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate()
calls should match the number of XMLPlatformUtils::Initialize() calls.
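The counting rule above can be modelled in a few lines. `PlatformRefCount` is a hypothetical class, not part of Xerces-C++, and the real XMLPlatformUtils implementation may differ. As the text says, the application itself must ensure that only one thread is inside Initialize() or Terminate() at a time, so the model does no locking of its own.

```cpp
// Hypothetical model of the Initialize/Terminate counting rule; not the
// real XMLPlatformUtils code. Callers must serialize these calls, as the
// text requires, so no locking is done here.
class PlatformRefCount {
public:
    // Returns true only for the outermost call, which does the real setup.
    bool initialize() { return count_++ == 0; }

    // Returns true only for the call that balances the outermost
    // initialize(), which does the real cleanup. In this model, a
    // terminate() with no matching initialize() is simply ignored.
    bool terminate() {
        if (count_ == 0) return false;
        return --count_ == 0;
    }

    int depth() const { return count_; }

private:
    int count_ = 0;
};
```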
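Consider the following code snippets (for simplicity, the sample code is not wrapped in try/catch
blocks):
XMLPlatformUtils::Initialize();
{ SAXParser parser; parser.parse(xmlFile); }  // parser destroyed before Terminate()
XMLPlatformUtils::Terminate();
XMLPlatformUtils::Initialize();
{ SAXParser parser; parser.parse(xmlFile); }
XMLPlatformUtils::Terminate();
XMLPlatformUtils::Initialize();
{ SAXParser parser; parser.parse(xmlFile); }
XMLPlatformUtils::Terminate();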
1: {
2: XMLPlatformUtils::Initialize();
3: DOMString c("hello");
4: XMLPlatformUtils::Terminate();
5: }
The DOMString object "c" is destroyed when it goes out of scope at the closing brace on line 5. As a
result, the DOMString destructor is called at line 5, after XMLPlatformUtils::Terminate(), which is wrong.
1: {
2: XMLPlatformUtils::Initialize();
2a: {
3: DOMString c("hello");
3a: }
4: XMLPlatformUtils::Terminate();
5: }
The extra pair of braces (lines 2a and 3a) ensures that all implicit destructors are called before terminating
Xerces-C++.
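An alternative to the extra braces, shown here only as a sketch, is a small guard object whose constructor and destructor stand in for the Initialize() and Terminate() calls. Because C++ destroys local objects in reverse order of construction, any library object declared after the guard is destroyed before Terminate() runs. `PlatformScope`, `LibraryObject` and `scopedUse` are hypothetical names; the event log simply records the order in which things happen.

```cpp
#include <string>
#include <vector>

// Records the order of events so the example can be checked.
std::vector<std::string> eventLog;

// Hypothetical RAII guard. In real code the constructor would call
// XMLPlatformUtils::Initialize() and the destructor Terminate().
struct PlatformScope {
    PlatformScope() { eventLog.push_back("Initialize"); }
    ~PlatformScope() { eventLog.push_back("Terminate"); }
    PlatformScope(const PlatformScope&) = delete;
    PlatformScope& operator=(const PlatformScope&) = delete;
};

// Stand-in for an object such as DOMString whose destructor needs the
// library to still be initialized.
struct LibraryObject {
    ~LibraryObject() { eventLog.push_back("~LibraryObject"); }
};

void scopedUse() {
    PlatformScope platform;  // constructed first, destroyed last
    LibraryObject c;         // destroyed before 'platform'
}
```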
In addition the application also needs to guarantee that only one thread has entered either the method
XMLPlatformUtils::Initialize() or the method XMLPlatformUtils::Terminate() at any one time.
Is Xerces-C++ thread-safe?
This is not a question that has a simple yes/no answer. Here are the rules for using Xerces-C++ in a
multi-threaded environment:
Within an address space, an instance of the parser may be used without restriction from a single thread, or
an instance of the parser can be accessed from multiple threads, provided the application guarantees that
only one thread has entered a method of the parser at any one time.
When two or more parser instances exist in a process, the instances can be used concurrently, without
external synchronization. That is, in an application containing two parsers and two threads, one parser can
be running within the first thread concurrently with the second parser running within the second thread.
The same rules apply to Xerces-C++ DOM documents. Multiple document instances may be
concurrently accessed from different threads, but any given document instance can only be accessed by
one thread at a time.
DOMStrings allow multiple concurrent readers. All DOMString const methods are thread safe, and can
be concurrently entered by multiple threads. Non-const DOMString methods, such as appendData(),
are not thread safe and the application must guarantee that no other methods (including const methods)
are executed concurrently with them.
The application also needs to guarantee that only one thread has entered either the method
XMLPlatformUtils::Initialize() or the method XMLPlatformUtils::Terminate() at any one time.
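The DOMString rule (any number of concurrent callers of const methods, exclusive access for modifying methods) is the classic readers-writer pattern. If an application must share one string between threads, it could guard it as in this C++17 sketch; `SharedText` is a hypothetical wrapper around `std::u16string`, not a Xerces-C++ class.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical readers-writer wrapper: any number of threads may call the
// const readers at once, but append() takes an exclusive lock, so no reader
// runs while the text is being modified.
class SharedText {
public:
    explicit SharedText(std::u16string text) : text_(std::move(text)) {}

    std::size_t length() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: reader
        return text_.size();
    }
    std::u16string copy() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: reader
        return text_;
    }
    void append(const std::u16string& more) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: writer
        text_ += more;
    }

private:
    mutable std::shared_mutex mutex_;
    std::u16string text_;
};
```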
I am seeing memory leaks in Xerces-C++. Are they real?
The Xerces-C++ library allocates and caches some commonly reused items. The storage for these may be
reported as memory leaks by some heap analysis tools; to avoid the problem, call the function
XMLPlatformUtils::Terminate() before your application exits. This will free all memory that
was being held by the library.
For most applications, the use of Terminate() is optional. The system will recover all memory when
the application process shuts down. The exception is the use of Xerces-C++ from DLLs that are
repeatedly loaded and unloaded within the same process. To avoid memory leaks in this kind of use,
Terminate() must be called before the Xerces-C++ library is unloaded.
To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate()
calls should match the number of XMLPlatformUtils::Initialize() calls.
I find memory leaks in Xerces-C++. How do I eliminate them?
The "leaks" reported by leak detectors or heap-analysis tools aren't really leaks in most applications,
in that the memory usage does not grow over time as the XML parser is used and reused.
What you are seeing as leaks is actually lazily evaluated data allocated into static variables. This data
gets released when the application ends. You can call XMLPlatformUtils::Terminate() to release all
the lazily allocated variables before your program exits.
To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate()
calls should match the number of XMLPlatformUtils::Initialize() calls.
Can I use Xerces to perform "write validation" (which is having an appropriate DTD and being able to
add elements to the DOM whilst validating against the DTD)? Is there a function that I have totally
missed that creates an XML file from a DTD, (obviously with the values missing, a skeleton, as it
were.)
The answers are: "No" and "No." Write Validation is a commonly requested feature, but Xerces-C++
does not have it yet.
The best you can do for now is to create the DOM document, write it back as XML and re-parse it.
Is there a facility in Xerces-C++ to validate the data contained in a DOM tree? That is, without saving
and re-parsing the source document?
No. This is a frequently requested feature, but at this time it is not possible to feed XML data from the
DOM directly back to the DTD validator. The best option for now is to generate XML source from the
DOM and feed that back into the parser.
How do I write a DOM tree out to an XML file?
This feature is not yet available in the parser. Take a look at the DOMPrint sample for an example of
parsing an XML file and then writing it back out to the screen. You can reuse that code.
Why does DOM_Node::cloneNode() not clone the pointer assigned to a DOM_Node via
DOM_Node::setUserData()?
There are several possible options for how cloneNode should handle userData:
· 1) Copy the pointer. May be a Very Bad Idea if you really wanted the data associated with a particular
node object.
· 2) Clone the object being pointed at. May be a Very Bad Idea if that object, in turn, wasn't designed to
be cloned at this time.
· 3) A complex call-back API has been proposed which would allow the userData object to tell the
DOM which of these three options should be taken, but that would require that only objects
implementing that API be registered as userData. That doesn't seem to be a good option.
· 4) Do nothing. This is by far the lowest-overhead and safest choice. And since cloneNode is a DOM
operation, and userData is _not_ defined by the standard DOM API, one can make a very strong case
for this being the "most correct" option.
We chose (4), very deliberately. If you want one of the others, you can implement it by creating your own
wrapper operation for cloneNode() and calling that.
NOTE that userData should be considered a non-portable, experimental feature of the Xerces DOM. It
may evaporate entirely in favor of a scheme based on the DOM Level 3 "node key" mechanism, when
that becomes officially available.
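A wrapper of the kind suggested above might look like the following sketch. To stay self-contained it uses a hypothetical `Node` struct in place of DOM_Node; with the real classes, the wrapper would call cloneNode() and then pass the original's getUserData() value to setUserData() on the clone. It implements option (1), so the original and the clone share one user-data object, with the risk described for that option.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a DOM node: clone() copies the node's own
// content but, like DOM_Node::cloneNode(), leaves userData empty.
struct Node {
    std::string name;
    void* userData = nullptr;

    std::unique_ptr<Node> clone() const {
        auto copy = std::make_unique<Node>();
        copy->name = name;              // userData deliberately not copied
        return copy;
    }
};

// Application-level wrapper implementing option (1): copy the pointer.
// Both nodes now refer to the same user-data object, so the application
// must keep that object alive for as long as either node uses it.
std::unique_ptr<Node> cloneWithUserData(const Node& original) {
    std::unique_ptr<Node> copy = original.clone();
    copy->userData = original.userData;
    return copy;
}
```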
How are entity reference nodes handled in DOM?
If you are using the native DOM classes, the function setExpandEntityReferences controls how
entities appear in the DOM tree. When setExpandEntityReferences is set to false (the default), an
occurrence of an entity reference in the XML document will be represented by a subtree with an
EntityReference node at the root whose children represent the entity expansion. Entity expansion will be a
DOM tree representing the structure of the entity expansion, not a text node containing the entity
expansion as text.
If setExpandEntityReferences is true, an entity reference in the XML document is represented by only the
nodes that represent the entity expansion. The DOM tree will not contain any EntityReference nodes.
What kinds of URLs are currently supported in Xerces-C++?
The XMLURL class provides limited URL support. It understands the file://, http://, and
ftp:// URL types, and is capable of parsing them into their constituent components and normalizing
them. It also supports the commonly required action of combining a base URL and a relative URL into a
single URL. In other words, it performs the limited set of functions required by an XML parser.
Another thing that URLs are commonly used for is creating an input stream that provides access to the
referenced entity. The parser, as shipped, only supports this functionality on URLs in the form file:/// and
file://localhost/, i.e. only when the URL refers to a local file.
You may enable support for HTTP and FTP URLs by implementing and installing a NetAccessor object.
When a NetAccessor object is installed, the URL class will use it to create input streams for the remote
entities referred to by such URLs.
How can I add support for URLs with HTTP/FTP protocols?
Support for the http: protocol is now included by default on all platforms.
To address the need to make remote connections to resources specified using additional protocols, ftp for
example, Xerces-C++ provides the NetAccessor interface. The header file is
src/xercesc/util/XMLNetAccessor.hpp. This interface allows you to plug in your own
implementation of URL networking code into the Xerces-C++ parser.
Can I use Xerces-C++ to parse HTML?
Yes, but only if the HTML follows the rules given in the XML specification [2] . Most HTML, however,
does not follow the XML rules, and will generate XML well-formedness errors.
I keep getting an error: "invalid UTF-8 character". What's wrong?
Most commonly, the XML encoding= declaration is either incorrect or missing. Without a
declaration, XML defaults to the utf-8 character encoding, which is not compatible with the default
text file encoding on most systems.
The XML declaration should look something like this:
<?xml version="1.0" encoding="iso-8859-1"?>
Make sure to specify the encoding that is actually used by the file. The encoding for "plain" text files depends
on both the operating system and the locale (country and language) in use.
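To see why an undeclared iso-8859-1 file fails, note that a byte such as 0xE9 ("é" in iso-8859-1) is not valid UTF-8 on its own; in UTF-8 the same character is the two bytes 0xC3 0xA9. The hypothetical helper below, plain C++ rather than any Xerces-C++ API, reports the offset of the first byte that breaks UTF-8 well-formedness.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the offset of the first byte that does not begin a well-formed
// UTF-8 sequence, or std::string::npos if the whole buffer is valid.
// Overlong forms, UTF-16 surrogates and values above U+10FFFF are rejected.
std::size_t firstInvalidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) { ++i; continue; }            // ASCII
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;         // range for the 2nd byte
        if (b >= 0xC2 && b <= 0xDF)      len = 2;
        else if (b == 0xE0)            { len = 3; lo = 0xA0; }  // no overlongs
        else if (b >= 0xE1 && b <= 0xEC) len = 3;
        else if (b == 0xED)            { len = 3; hi = 0x9F; }  // no surrogates
        else if (b >= 0xEE && b <= 0xEF) len = 3;
        else if (b == 0xF0)            { len = 4; lo = 0x90; }  // no overlongs
        else if (b >= 0xF1 && b <= 0xF3) len = 4;
        else if (b == 0xF4)            { len = 4; hi = 0x8F; }  // max U+10FFFF
        else return i;                  // 0x80-0xC1 or 0xF5-0xFF: never valid
        if (i + len > n) return i;      // sequence cut off by end of buffer
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char minByte = (k == 1) ? lo : 0x80;
            const unsigned char maxByte = (k == 1) ? hi : 0xBF;
            if (c < minByte || c > maxByte) return i;
        }
        i += len;
    }
    return std::string::npos;
}
```

For example, firstInvalidUtf8("Caf\xE9") returns 3, the offset of the iso-8859-1 "é", while the UTF-8 spelling "Caf\xC3\xA9" is accepted.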
Another common source of problems is that some characters are not allowed in XML documents,
according to the XML spec. Typical disallowed characters are control characters, even if you escape them
using the Character Reference form. See the XML spec [35] , sections 2.2 and 4.1 for details. If the parser
is generating an Invalid character (Unicode: 0x???) error, it is very likely that there's a
character in there that you can't see. You can generally use a UNIX command like "od -hc" to find it.
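As a complement to "od -hc", a small scanner can list the offending positions. The hypothetical helper below flags bytes below 0x20 other than tab, line feed and carriage return, the only such control characters XML 1.0 allows. It assumes an ASCII-compatible encoding such as utf-8 or iso-8859-1, in which every byte below 0x20 is a complete character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the offsets of all C0 control bytes that XML 1.0 does not allow.
// Only tab (0x09), line feed (0x0A) and carriage return (0x0D) are legal
// below 0x20; writing the others as character references (e.g. &#x1;)
// does not make them legal either.
std::vector<std::size_t> disallowedControlBytes(const std::string& bytes) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D)
            offsets.push_back(i);
    }
    return offsets;
}
```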
What encodings are supported by Xerces-C / XML4C?
Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 (Big/Small Endian), UCS4 (Big/Small
Endian), EBCDIC code pages IBM037 and IBM1140 encodings, ISO-8859-1 (aka Latin1) and
Windows-1252. This means that it can parse input XML files in these above mentioned encodings.
XML4C -- the version of Xerces-C available from IBM -- combines Xerces-C and International
Components for Unicode (ICU) [11] and extends the encoding support to over 100 different encodings
that are allowed by ICU. In particular, all the encodings registered with the Internet Assigned Numbers
Authority (IANA) [36] are supported in XML4C.
Some implementations or ports of Xerces-C provide support for additional encodings. The exact set will
depend on the supplier of the parser and on the character set transcoding services in use.
What character encoding should I use when creating XML documents?
The best choice in most cases is either utf-8 or utf-16. Advantages of these encodings include:
· The best portability. These encodings are more widely supported by XML processors than any others,
meaning that your documents will have the best possible chance of being read correctly, no matter
where they end up.
· Full international character support. Both utf-8 and utf-16 cover the full Unicode character set, which
includes all of the characters from all major national, international and industry character sets.
· Efficient. utf-8 has the smaller storage requirements for documents that are primarily composed of
characters from the Latin alphabet. utf-16 is more efficient for encoding Asian languages. But both
encodings cover all languages without loss.
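The storage trade-off follows from the number of bytes each encoding needs per code point: utf-8 uses 1 byte below U+0080, 2 below U+0800, 3 below U+10000 and 4 above; utf-16 uses 2 bytes within the Basic Multilingual Plane and 4 (a surrogate pair) beyond it. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes needed to encode one Unicode code point (assumed valid and not a
// surrogate) in utf-8.
std::size_t utf8Bytes(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed in utf-16: one 16-bit unit in the BMP, otherwise two.
std::size_t utf16Bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Total encoded size of a sequence of code points.
std::size_t totalBytes(const std::vector<char32_t>& text,
                       std::size_t (*perCodePoint)(char32_t)) {
    std::size_t total = 0;
    for (char32_t cp : text) total += perCodePoint(cp);
    return total;
}
```

For the three Latin letters X, M and L, utf-8 needs 3 bytes and utf-16 needs 6; for the three ideographs U+65E5, U+672C and U+8A9E the ratio reverses, 9 bytes against 6.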
The only drawback of utf-8 or utf-16 is that they are not the native text file format on most systems,
meaning that common text file editors and viewers cannot be used on them directly.
A second choice of encoding would be any of the others listed in the table above. This works best when
the xml encoding is the same as the default system encoding on the machine where the XML document is
being prepared, because the document will then display correctly as a plain text file. For UNIX systems in
countries speaking Western European languages, the encoding will usually be iso-8859-1.
The versions of Xerces distributed by IBM, both C and Java (known respectively as XML4C and
XML4J), include all of the encodings listed in the above table, on all platforms.
A word of caution for Windows users: The default character set on Windows systems is windows-1252,
not iso-8859-1. While Xerces-C++ does recognize this Windows encoding, it is a poor choice for
portable XML data because it is not widely recognized by other XML processing tools. If you are using a
Windows-based editing tool to generate XML, check which character set it generates, and make sure that
the resulting XML specifies the correct name in the encoding="..." declaration.
Is EBCDIC supported?
Yes, Xerces-C++ supports EBCDIC. When creating EBCDIC encoded XML data, the preferred
encoding is ibm1140. Also supported is ibm037 (and its alternate name, ebcdic-cp-us); this encoding is
almost the same as ibm1140, but it lacks the Euro symbol.
These two encodings, ibm1140 and ibm037, are available on both Xerces-C and IBM XML4C, on all
platforms.
On IBM System 390, XML4C also supports two alternative forms, ibm037-s390 and ibm1140-s390.
These are similar to the base ibm037 and ibm1140 encodings, but with alternate mappings of the
EBCDIC new-line character, which allows them to appear as normal text files on System 390s. These
encodings are not supported on other platforms, and should not be used for portable data.
XML4C on System 390 and AS/400 also provides additional EBCDIC encodings, including those for the
character sets of different countries. The exact set supported will be platform dependent, and these
encodings are not recommended for portable XML data.
Why does deleting a transcoded string result in an assertion on Windows?
Both your application program and the Xerces DLL must use the same *DLL* version of the runtime
library. If either statically links to the runtime library, the problem will still occur. For example, for a
Win32/VC6 build, the runtime library build setting MUST be "Multithreaded DLL" for release builds and
"Debug Multithreaded DLL" for debug builds.
How do I transcode to/from something besides the local code page?
XMLString::transcode() will transcode from XMLCh to the local code page, and other APIs which take a
char* assume that the source text is in the local code page. If this is not true, you must transcode the text
yourself. You can do this using local transcoding support on your OS, such as iconv on Unix, or IBM's
ICU package. However, if your transcoding needs are simple, you can achieve somewhat better portability by
using the Xerces parser's transcoder wrappers. You get a transcoder like this:
· 1. Call XMLPlatformUtils::fgTransService->makeNewTranscoderFor() and provide the name of the
encoding you wish to create a transcoder for. This will return a transcoder to you, which you own and
must delete when you are through with it. NOTE: You must provide a maximum block size that you
will pass to the transcoder at one time, and you must pass blocks of characters of this count or smaller
when you do your transcoding. The reason for this is that this is really an internal API and is used by
the parser itself to do transcoding. The parser always does transcoding in known block sizes, and this
allows transcoders to be much more efficient for internal use, since a transcoder knows the maximum size it will
ever have to deal with and can set itself up for that internally. In general, you should stick to block sizes in
the 4 to 64K range.
· 2. The returned transcoder is something derived from XMLTranscoder, so they are all returned to you
via that interface.
· 3. This object is really just a wrapper around the underlying transcoding system actually in use by
your version of Xerces, and does whatever is necessary to handle differences between the XMLCh
representation and the representation used by that underlying transcoding system.
· 4. The transcoder object has two primary APIs, transcodeFrom() and transcodeTo(). These transcode
between the XMLCh format and the encoding you indicated.
· 5. These APIs will transcode as much of the source data as will fit into the outgoing buffer you
provide. They will tell you how much of the source they consumed and how much of the target they filled.
You can use this information to continue the process until all of the source is consumed.
· 6. char* data is always dealt with in terms of bytes, and XMLCh data is always dealt with in terms of
characters. Don't mix up which you are dealing with or you will not get the correct results, since many
encodings don't have a one to one relationship of characters to bytes.
· 7. When transcoding from XMLCh to the target encoding, the transcodeTo() method provides an
'unrepresentable flag' parameter, which tells the transcoder how to deal with an XMLCh code point
that cannot be converted legally to the target encoding, which can easily happen since XMLCh is
Unicode and can represent thousands of code points. The options are to use a default replacement
character (which the underlying transcoding service will choose, and which is guaranteed to be legal
for the target encoding), or to throw an exception.
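The consume-and-continue protocol of steps 5 to 7 can be modeled without Xerces-C++. In the sketch below, `transcodeBlockToLatin1()` is a hypothetical stand-in for transcodeTo(): it converts UTF-16 code units (`char16_t`, standing in for XMLCh) into iso-8859-1 bytes, reports how many source characters it consumed and how many bytes it produced, and either substitutes a replacement or throws when a character cannot be represented. The '?' replacement is this sketch's choice; a real transcoding service picks its own. The caller loop never passes more than a fixed block size, as the NOTE in step 1 requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// What to do with a character the target encoding cannot represent.
enum class Unrepresentable { Replace, Throw };

// Hypothetical transcodeTo()-style call: converts as much of src as fits
// in dst, reports the characters consumed through charsEaten, and returns
// the number of bytes written.
// Simplification: each UTF-16 unit is handled on its own, so a surrogate
// pair becomes two replacement characters.
std::size_t transcodeBlockToLatin1(const char16_t* src, std::size_t srcCount,
                                   char* dst, std::size_t dstCapacity,
                                   std::size_t& charsEaten,
                                   Unrepresentable mode) {
    const std::size_t n = std::min(srcCount, dstCapacity);  // 1 char -> 1 byte
    for (std::size_t i = 0; i < n; ++i) {
        const char16_t ch = src[i];
        if (ch <= 0xFF) {
            dst[i] = static_cast<char>(ch);
        } else if (mode == Unrepresentable::Replace) {
            dst[i] = '?';                       // replacement chosen here
        } else {
            throw std::runtime_error("character not representable in iso-8859-1");
        }
    }
    charsEaten = n;
    return n;
}

// Caller loop: passes at most blockSize characters per call and resumes
// wherever the transcoder stopped until all of the source is consumed.
std::string toLatin1(const std::u16string& text, std::size_t blockSize,
                     Unrepresentable mode) {
    assert(blockSize > 0);
    std::string out;
    char buffer[64];
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t count = std::min(blockSize, text.size() - pos);
        std::size_t eaten = 0;
        const std::size_t written = transcodeBlockToLatin1(
            text.data() + pos, count, buffer, sizeof buffer, eaten, mode);
        out.append(buffer, written);
        pos += eaten;
    }
    return out;
}
```

Note how characters (`eaten`) and bytes (`written`) are tracked separately, as step 6 advises; they happen to be equal for iso-8859-1 but differ for most other encodings.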
Why does SAX2XMLReader::setProperty not work?
The function SAX2XMLReader::setProperty(const XMLCh* const name, void*
value) takes a void pointer for the property value. The application must make this void pointer point
to a value of the correct type. See the SAX2 Programming Guide to learn exactly what type of value each
property expects. Passing a void pointer to a value of the wrong type will lead to
unexpected results.
Why does SAX2XMLReader::getProperty not work?
The function void* SAX2XMLReader::getProperty(const XMLCh* const name)
returns a void pointer for the property value. See the SAX2 Programming Guide to learn exactly what type of
object each property returns.
The parser owns the returned pointer. The memory allocated for the returned pointer will be destroyed
when the parser is deleted. To keep the returned information accessible after the parser is deleted,
callers need to copy it and store it somewhere else; otherwise you may get unexpected results. Since the
returned pointer is a generic void pointer, see the SAX2 Programming Guide to learn exactly what type of
value each property returns before copying it.
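The typing and ownership rules can be illustrated with a self-contained model. `ModelReader` below is hypothetical, not SAX2XMLReader: its `getProperty()` returns a `void*` into storage the reader owns, and the caller casts it to the documented type (here a NUL-terminated `char16_t` string standing in for XMLCh*) and copies the value while the reader is still alive.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical reader that owns its property values, mimicking the
// ownership rule described for SAX2XMLReader::getProperty().
class ModelReader {
public:
    void setProperty(const std::string& name, void* value) {
        // The caller promises that value points to a NUL-terminated
        // char16_t string; passing a pointer to any other type is an error.
        values_[name] = static_cast<const char16_t*>(value);
    }
    void* getProperty(const std::string& name) {
        auto it = values_.find(name);
        if (it == values_.end()) return nullptr;
        return const_cast<char16_t*>(it->second.c_str());  // reader-owned
    }

private:
    std::map<std::string, std::u16string> values_;
};

// Cast to the documented type and copy while the reader still exists.
std::u16string copyProperty(ModelReader& reader, const std::string& name) {
    void* raw = reader.getProperty(name);
    return raw ? std::u16string(static_cast<const char16_t*>(raw))
               : std::u16string();
}
```

The copied `std::u16string` remains valid after the reader is destroyed; the raw pointer returned by `getProperty()` does not.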
Why does the parser still try to locate the DTD even when validation is turned off, and how can I ignore an
external DTD reference?
When a DTD is referenced, the parser will try to read it, because DTDs can provide a lot more information
than just validation. A DTD can define entities and notations, external unparsed entities, default attributes,
character entities, etc. So the parser will always try to read it if present, even if validation is turned off.
The only way to ignore the DTD is to install an EntityResolver (see the Redirect sample for an example
of how this is done) and have it substitute an empty ("") source for the DTD file.
if you've fixed a problem or enhanced the code in some way, we really would like to get your changes,
and will take them in any reasonable form.
Generally a diff of the changed files against the current sources from CVS is good, along with some kind
of description of what the change is. (Working with the current sources is important!)
Where can I get predefined character entity definitions?
Download http://www.w3.org/TR/xhtml1/xhtml1.zip. [39]
26
Programming Guide
27
SAX1 Programming Guide
Constructing a parser
In order to use Xerces-C++ to parse XML files, you will need to create an instance of the SAXParser
class. The example below shows the code you need in order to create an instance of SAXParser. The
DocumentHandler and ErrorHandler instances required by the SAX API are provided using the
HandlerBase class supplied with Xerces-C++.
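#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/parsers/SAXParser.hpp>
#include <xercesc/sax/HandlerBase.hpp>
#include <xercesc/sax/SAXParseException.hpp>
using namespace std;
int main (int argc, char* args[]) {
    if (argc < 2) { cout << "Usage: " << args[0] << " xmlFile\n"; return 1; }
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        // XMLString::transcode() returns a new char* that the caller deletes.
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Error during initialization! :\n" << message << "\n";
        delete [] message;
        return 1;
    }
    const char* xmlFile = args[1];
    SAXParser* parser = new SAXParser();
    // HandlerBase supplies default DocumentHandler and ErrorHandler behavior.
    HandlerBase* handler = new HandlerBase();
    parser->setDocumentHandler(handler);
    parser->setErrorHandler(handler);
    int result = 0;
    try {
        parser->parse(xmlFile);
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n" << message << "\n";
        delete [] message;
        result = -1;
    }
    catch (const SAXParseException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n" << message << "\n";
        delete [] message;
        result = -1;
    }
    catch (...) {
        cout << "Unexpected Exception \n";
        result = -1;
    }
    // Delete the Xerces-C++ objects before calling Terminate().
    delete parser;
    delete handler;
    XMLPlatformUtils::Terminate();
    return result;
}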
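MySAXHandler::MySAXHandler()
{
}

void MySAXHandler::startElement(const XMLCh* const name,
                                AttributeList& attributes)
{
    // XMLString::transcode() returns a new char* that the caller must delete.
    char* elementName = XMLString::transcode(name);
    cout << "I saw element: " << elementName << endl;
    delete [] elementName;
}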
The XMLCh and AttributeList types are supplied by Xerces-C++ and are documented in the include
files. Examples of their usage appear in the source code to the sample applications.
28
SAX2 Programming Guide
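#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
using namespace std;
int main (int argc, char* args[]) {
    if (argc < 2) { cout << "Usage: " << args[0] << " xmlFile\n"; return 1; }
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Error during initialization! :\n" << message << "\n";
        delete [] message;
        return 1;
    }
    const char* xmlFile = args[1];
    SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
    // DefaultHandler supplies default ContentHandler and ErrorHandler behavior.
    DefaultHandler* defaultHandler = new DefaultHandler();
    parser->setContentHandler(defaultHandler);
    parser->setErrorHandler(defaultHandler);
    int result = 0;
    try {
        parser->parse(xmlFile);
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n" << message << "\n";
        delete [] message;
        result = -1;
    }
    catch (const SAXParseException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n" << message << "\n";
        delete [] message;
        result = -1;
    }
    catch (...) {
        cout << "Unexpected Exception \n";
        result = -1;
    }
    // Delete the Xerces-C++ objects before calling Terminate().
    delete parser;
    delete defaultHandler;
    XMLPlatformUtils::Terminate();
    return result;
}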
MySAX2Handler::MySAX2Handler()
{
}
The XMLCh and Attributes types are supplied by Xerces-C++ and are documented in the include files.
Examples of their usage appear in the source code to the sample applications.
http://xml.org/sax/features/namespace-prefixes
true: Report the original prefixed names and attributes used for Namespace declarations.
false: Do not report attributes used for Namespace declarations, and optionally do not report original
prefixed names. (default)
http://xml.org/sax/features/validation
true: Report all validation errors. (default)
false: Do not report validation errors.
http://apache.org/xml/features/validation/dynamic
true: The parser will validate the document only if a grammar is specified.
(http://xml.org/sax/features/validation must be true)
false: Validation is determined by the state of the http://xml.org/sax/features/validation feature.
(default)
http://apache.org/xml/features/validation/schema
true: Enable the parser's schema support. (default)
false: Disable the parser's schema support.
http://apache.org/xml/features/validation/schema-full-checking
true: Enable full schema constraint checking, including checking which may be time-consuming or
memory intensive. Currently, particle unique attribution constraint checking and particle derivation
restriction checking are controlled by this option.
false: Disable full schema constraint checking. (default)
http://apache.org/xml/features/validation/reuse-grammar
true: The parser will reuse grammar information from previous parses in subsequent parses.
false: The parser will not reuse any grammar information. (default)
http://apache.org/xml/features/validation/reuse-validator (deprecated)
Please use http://apache.org/xml/features/validation/reuse-grammar instead.
true: The parser will reuse grammar information from previous parses in subsequent parses.
false: The parser will not reuse any grammar information. (default)
http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation
Description: The XML Schema Recommendation explicitly states that the inclusion of
schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does
not mandate that these attributes must be used to locate schemas. This property allows the user to
specify the no-target-namespace XML Schema location externally. If specified, the instance document's
noNamespaceSchemaLocation attribute will effectively be ignored.
Value: The syntax is the same as for the noNamespaceSchemaLocation attribute that may occur in an
instance document, e.g. "file_name.xsd".
Value Type: XMLCh*
29
DOM Programming Guide
// C++
#include <xercesc/dom/DOM.hpp>
// Java
import org.w3c.dom.*;
The header file <xercesc/dom/DOM.hpp> includes all the individual headers for the DOM API classes.
Class Names
The C++ class names are prefixed with "DOM_". The intent is to prevent conflicts between DOM class
names and other names that may already be in use by an application or other libraries that a DOM based
application must link with.
The use of C++ namespaces would also have solved this conflict problem, but for the fact that many
compilers do not yet support them.
DOM_Document myDocument; // C++
DOM_Node aNode;
DOM_Text someText;
If you wish to use the Java class names in C++, then you need to typedef them in C++. This is not
advisable for the general case - conflicts really do occur - but can be very useful when converting a body
of existing Java code to C++.
typedef DOM_Document Document;
typedef DOM_Node Node;
Chapter 29 - DOM Programming Guide Xerces-C++ Documentation
// This is Java
Node aNode;
aNode = someDocument.createElement("ElementName");
Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);
The Java and the C++ are identical on the surface, except for the class names, and this similarity remains
true for most DOM code.
However, Java and C++ handle objects in somewhat different ways, making it important to understand a
little bit of what is going on beneath the surface.
In Java, the variable aNode is an object reference, essentially a pointer. It is initially null, and
references an object only after the assignment statement in the second line of the code.
In C++ the variable aNode is, from the C++ language's perspective, an actual live object. It is
constructed when the first line of the code executes, and DOM_Node::operator=() executes at the second
line. The C++ class DOM_Node is essentially a form of smart pointer; it implements much of the
behavior of a Java object reference variable, and delegates the DOM behaviors to an implementation
class that lives behind the scenes.
Key points to remember when using the C++ DOM classes:
· Create them as local variables, or as member variables of some other class. Never "new" a DOM
object into the heap or make an ordinary C pointer variable to one, as this will greatly confuse the
automatic memory management.
· The "real" DOM objects (nodes, attributes, CDATA sections, and so on) live on the heap and are
created with the create... methods on class DOM_Document. DOM_Node and the other DOM classes
serve as reference variables to the underlying heap objects.
· The visible DOM classes may be freely copied (assigned), passed as parameters to functions, or
returned by value from functions.
· Memory management of the underlying DOM heap objects is automatic, implemented by means of
reference counting. So long as some part of a document can be reached, directly or indirectly, via
reference variables that are still alive in the application program, the corresponding document data
will stay alive in the heap. When all possible paths of access have been closed off (all of the
application's DOM objects have gone out of scope) the heap data itself will be automatically deleted.
· There are restrictions on the ability to subclass the DOM classes.
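The handle-and-body arrangement described in these points can be sketched with `std::shared_ptr`, which also uses reference counting. `NodeImpl` and `NodeHandle` are hypothetical names, and this only illustrates the idea: it counts references per object, whereas the description above is in terms of whether any part of a document is still reachable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical heap object holding the "real" node data.
struct NodeImpl {
    std::string name;
};

// Hypothetical value-type handle in the spirit of DOM_Node: cheap to copy,
// pass and return by value. The heap object stays alive while any handle
// refers to it and is deleted automatically when the last handle goes away.
class NodeHandle {
public:
    NodeHandle() = default;                     // like a null reference
    explicit NodeHandle(std::string name)
        : impl_(std::make_shared<NodeImpl>(NodeImpl{std::move(name)})) {}

    bool isNull() const { return !impl_; }
    const std::string& name() const { return impl_->name; }

    // Identity test: do both handles refer to the same heap object?
    bool operator==(const NodeHandle& other) const { return impl_ == other.impl_; }

    // Lets the sketch observe when the heap object has been freed.
    std::weak_ptr<NodeImpl> watch() const { return impl_; }

private:
    std::shared_ptr<NodeImpl> impl_;
};
```

Copying a `NodeHandle` copies only the pointer; once every handle to an object has gone out of scope, a `weak_ptr` obtained from `watch()` reports the object as expired.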
DOMString
Class DOMString provides the mechanism for passing string data to and from the DOM API. DOMString
is not intended to be a completely general string class, but rather to meet the specific needs of the DOM
API.
The design derives from two primary sources: the DOM's CharacterData interface and the class
java.lang.String.
Main features are:
· It stores Unicode text.
· Automatic memory management, using reference counting.
· DOMStrings are mutable - characters can be inserted, deleted or appended.
When a string is passed into a method of the DOM, when setting the value of a Node, for example, the
string is cloned so that any subsequent alteration or reuse of the string by the application will not alter the
document contents. Similarly, when strings from the document are returned to an application via the
DOM API, the string is cloned so that the document can not be inadvertently altered by subsequent edits
to the string.
Note: The ICU classes are a more general solution to Unicode character handling for
C++ applications. ICU is an Open Source Unicode library, available at the IBM
DeveloperWorks website [11].
Equality Testing
The DOMString equality operators (and all of the rest of the DOM class conventions) are modeled after
the Java equivalents. The equals() method compares the content of the string, while the == operator
checks whether the string reference variables (the application program variables) refer to the same
underlying string in memory. This is also true of DOM_Node, DOM_Element, etc., in that operator ==
tells whether the variables in the application are referring to the same actual node or not. It's all very
Java-like.
· bool operator == () is true if the DOMString variables refer to the same underlying storage.
· bool equals() is true if the strings contain the same characters.
Here is an example of how the equality operators work:
DOMString a = "Hello";
DOMString b = a;
DOMString c = a.clone();
bool r1 = (b == a);      // true: b and a share the same storage
bool r2 = (a == c);      // false: c is a separate copy
bool r3 = a.equals(c);   // true: same characters
b.appendData(" World");  // modifies the storage shared by a and b
bool r4 = (b == a);      // still true, and the string's
                         //    value is "Hello World"
bool r5 = a.equals(c);   // false: a is "Hello World";
                         //    c is still "Hello".
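The semantics in this example (copies share storage, operator == compares identity, equals() compares characters, and a modification through one variable is visible through every variable sharing that storage) can be modeled with a shared pointer. `RefString` is a hypothetical class written for illustration with `std::u16string` storage; it is not how DOMString is implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical model of DOMString-style semantics: copies share storage,
// operator== compares identity, equals() compares characters.
class RefString {
public:
    RefString(const char16_t* text)             // implicit, like DOMString("...")
        : data_(std::make_shared<std::u16string>(text)) {}

    RefString clone() const {                   // new, independent storage
        RefString copy(*this);
        copy.data_ = std::make_shared<std::u16string>(*data_);
        return copy;
    }
    void appendData(const std::u16string& more) { *data_ += more; }

    bool operator==(const RefString& other) const { return data_ == other.data_; }
    bool equals(const RefString& other) const { return *data_ == *other.data_; }
    const std::u16string& value() const { return *data_; }

private:
    std::shared_ptr<std::u16string> data_;
};
```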
Downcasting
Application code sometimes must cast an object reference from DOM_Node to one of the classes
deriving from DOM_Node, DOM_Element, for example. The syntax for doing this in C++ is different
from that in Java.
// This is C++
DOM_Node aNode = someFunctionReturningNode();
DOM_Element el = (DOM_Element &) aNode;
// This is Java
Node aNode = someFunctionReturningNode();
Element el = (Element) aNode;
The C++ cast is not type-safe; the Java cast is checked for compatible types at runtime. If necessary, a
type-check can be made in C++ using the node type information:
// This is C++
DOM_Element el;
if (aNode.getNodeType() == DOM_Node::ELEMENT_NODE)
    el = (DOM_Element &) aNode;
else {
    // aNode does not refer to an element.
    // Do something to recover here.
}
Subclassing
The C++ DOM classes, DOM_Node, DOM_Attr, DOM_Document, etc., are not designed to be
subclassed by an application program.
As an alternative, the DOM_Node class provides a User Data field for use by applications as a hook for
extending nodes by referencing additional data or objects. See the API description for DOM_Node for
details.
30
Experimental IDOM Programming Guide
Experimental
The experimental IDOM API is a new design of the C++ DOM API. Please note that this experimental
IDOM API is only a prototype and is subject to change.
Constructing a parser
In order to use Xerces-C++ to parse XML files using IDOM, you will need to create an instance of the
IDOMParser class. The example below shows the code you need in order to create an instance of the
IDOMParser.
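#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/parsers/IDOMParser.hpp>
#include <xercesc/sax/HandlerBase.hpp>
#include <xercesc/sax/SAXParseException.hpp>
using namespace std;
int main (int argc, char* args[]) {
    if (argc < 2) { cout << "Usage: " << args[0] << " xmlFile\n"; return 1; }
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Error during initialization! :\n" << message << "\n";
        delete [] message;
        return 1;
    }
    const char* xmlFile = args[1];
    IDOMParser* parser = new IDOMParser();
    HandlerBase* errHandler = new HandlerBase();   // default error handling
    parser->setErrorHandler(errHandler);
    int result = 0;
    try {
        parser->parse(xmlFile);
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n" << message << "\n";
        delete [] message;
        result = -1;
    }
    catch (const SAXParseException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n" << message << "\n";
        delete [] message;
        result = -1;
    }
    delete parser;
    delete errHandler;
    XMLPlatformUtils::Terminate();
    return result;
}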
Class Names
The IDOM class names are prefixed with "IDOM_". The intent is to prevent conflicts between IDOM
class names and DOM class names that may already be in use by an application or other libraries that a
DOM based application must link with.
Object Management
Applications use normal C++ pointers to directly access the implementation objects for Nodes in
IDOM C++, while they use object references in DOM C++.
Consider the following code snippets:
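// IDOM C++
IDOM_Node* aNode;
IDOM_Node* docRootNode;
// IDOM methods take XMLCh* strings, so the tag name is transcoded first.
XMLCh* tagName = XMLString::transcode("ElementName");
aNode = someDocument->createElement(tagName);
delete [] tagName;
docRootNode = someDocument->getDocumentElement();
docRootNode->appendChild(aNode);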
// DOM C++
DOM_Node aNode;
DOM_Node docRootNode;
aNode = someDocument.createElement("ElementName");
docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);
Memory Management
The C++ IDOM implementation no longer uses reference counting for automatic memory management.
Instead, it uses an independent storage allocator per document. The storage for a DOM document is
associated with the document node object. The advantage is that allocation requires no
synchronization in most cases (under the same threading model as before: one active thread
per document, but any number of documents processed in parallel on separate threads).
The allocator does not support a delete operation at all. All allocated memory persists for the life of
the document; the larger blocks are then returned to the system without separately deleting each of
the individual nodes and strings within the document.
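The allocation scheme described above can be illustrated with a minimal bump (arena) allocator. This is a sketch under stated assumptions, not the IDOM allocator: the class name `DocumentArena`, the block size, and the alignment policy are all hypothetical. One arena would belong to one document, so the single thread working on that document can allocate from it without locking; allocating just advances an offset in the current block, there is no per-object free, and destroying the arena releases every block at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical per-document arena: no per-object delete, blocks freed together.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096) : blockSize_(blockSize) {}

    // Returns storage for `bytes` bytes, aligned to alignof(std::max_align_t).
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;  // round up to keep alignment
        if (blocks_.empty() || used_ + bytes > currentSize_) {
            // Oversized requests get a dedicated block; others get a standard block.
            currentSize_ = bytes > blockSize_ ? bytes : blockSize_;
            blocks_.emplace_back(new std::max_align_t[(currentSize_ + align - 1) / align]);
            used_ = 0;
        }
        char* base = reinterpret_cast<char*>(blocks_.back().get());
        void* p = base + used_;
        used_ += bytes;
        return p;
    }

    // Copies a C string into the arena; there is deliberately no matching free.
    char* copyString(const char* s) {
        std::size_t n = std::strlen(s) + 1;
        char* dst = static_cast<char*>(allocate(n));
        std::memcpy(dst, s, n);
        return dst;
    }

    std::size_t blockCount() const { return blocks_.size(); }

    // No deallocate(): memory lives until the arena (the "document") is destroyed,
    // at which point the unique_ptrs release every block in one pass.
private:
    std::size_t blockSize_;
    std::size_t currentSize_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<std::max_align_t[]>> blocks_;
};
```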
The C++ DOM and IDOM are similar in the use of factory methods in the document class for all object
creation. They differ in the object deletion mechanism.
In C++ DOM, there is no explicit object deletion. The deallocation of memory is automatically taken care
of by the reference counting.
In C++ IDOM, objects are deleted both implicitly and explicitly.
// now the parser has some fresh memory to work on for the following
// big loop
i = 1000;
while (i > 0) {
    parser->parse(xmlFile);
    IDOM_Document *doc = parser->getDocument();
    i--;
}
delete parser;
IDOM_DOMImplementation::getImplementation()->createDocumentType, then the user also needs
to explicitly delete the DocumentType object to free the allocated memory.
· Special case: If a user creates a DocumentType using the DOM implementation factory
method and clones that node WITHOUT first assigning an owner document to the DocumentType object,
then the cloned node also needs to be explicitly deleted.
Consider the following code snippets:
myDocType = IDOM_DOMImplementation::getImplementation()->createDocumentType(name, 0, 0);
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument(0, name, myDocType);
root = myDocument->getDocumentElement();
aNode = myDocument->createElement(anElementname);
root->appendChild(aNode);
// need to delete both myDocType and myDocument, which are created through DOM Implementation
delete myDocType;
delete myDocument;
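
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument();
myDocType = myDocument->createDocumentType(name);
root = myDocument->createElement(name);
aNode = myDocument->createElement(anElementname);
myDocument->appendChild(myDocType);
myDocument->appendChild(root);
root->appendChild(aNode);
// myDocType was created through myDocument, which owns it,
// so only myDocument (created through DOM Implementation) needs to be deleted
delete myDocument;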

myDocType = IDOM_DOMImplementation::getImplementation()->createDocumentType(name, 0, 0);
myDocType1 = (IDOM_DocumentType*) myDocType->cloneNode(false);
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument(0, name, myDocType);
root = myDocument->getDocumentElement();
aNode = myDocument->createElement(anElementname);
root->appendChild(aNode);
// myDocType did not have an owner yet when myDocType1 was cloned,
// thus we need to explicitly delete myDocType1
delete myDocType1;
delete myDocType;
delete myDocument;

myDocType = IDOM_DOMImplementation::getImplementation()->createDocumentType(name, 0, 0);
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument(0, name, myDocType);
myDocType1 = (IDOM_DocumentType*) myDocType->cloneNode(false);
root = myDocument->getDocumentElement();
aNode = myDocument->createElement(anElementname);
root->appendChild(aNode);
// myDocType already had myDocument as its owner when myDocType1 was cloned,
// thus there is NO need to explicitly delete myDocType1
delete myDocType;
delete myDocument;
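The ownership rules above can be modelled in a few lines of standard C++. The types `Doc` and `DocType` and the `owns` helper are hypothetical stand-ins, not the IDOM classes. Here `ownerDoc` plays the role of the owner document: a node created by the factory is heap-allocated and must always be deleted by the caller, while a clone is allocated from the owner document's storage (and released with it) only if the source already had an owner when it was cloned.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Doc;  // forward declaration

// Hypothetical DocumentType stand-in; ownerDoc stays null until it is given an owner.
struct DocType {
    std::string name;
    Doc* ownerDoc = nullptr;
    DocType* cloneNode() const;
};

// Hypothetical document: nodes allocated from its storage are released with it.
struct Doc {
    std::vector<std::unique_ptr<DocType>> storage;

    DocType* allocateNode() {
        storage.emplace_back(new DocType);
        DocType* node = storage.back().get();
        node->ownerDoc = this;
        return node;
    }
    // True if the node lives in this document's storage (no explicit delete needed).
    bool owns(const DocType* node) const {
        for (const auto& p : storage)
            if (p.get() == node)
                return true;
        return false;
    }
};

// A clone of an owned node comes from the owner's storage (no delete needed);
// a clone of an unowned node comes from the heap (the caller must delete it).
DocType* DocType::cloneNode() const {
    DocType* copy = ownerDoc ? ownerDoc->allocateNode() : new DocType;
    copy->name = name;
    return copy;
}
```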
//C++ IDOM
const XMLCh* nodeValue = aNode->getNodeValue();
//C++ DOM
DOMString nodeValue = aNode.getNodeValue();
31. Migration
either change the include search path in the Makefile to "-I <installroot>/include/xercesc",
or
change the relevant #include directives in the source and header files as shown above.
Migration Archive
For migration information to Xerces-C++ 1.6.0 or earlier, please refer to Migration Archive.
32. Migration Archive
General Improvements
The new version is improved in many ways. Some general improvements are: significantly better
conformance to the XML spec, cleaner internal architecture, many bug fixes, and higher speed.
Compliance
Except for a couple of very obscure cases (mostly related to 'standalone' mode), this version should be
quite compliant with XML 1.0 [2]. It also tracks the latest changes to the DOM, SAX and Namespace
specifications. We have more than a thousand tests, some collected from various public sources and some
IBM-generated, which are used for regression testing. The C++ parser now passes all but a handful
of them.
Bug Fixes
This version has many bug fixes since the last release. Some of these were reported by users and some were
brought up by the conformance testing.
Speed
Much work was done to speed up this version. Some of the new features, such as the experimental IDOM,
ate up some of these gains, but overall the new version is significantly faster than previous
versions, even while doing more.
DTDValidator
DTDValidator was designed to scan, validate and store the DTD in Xerces-C++ 1.4.0 or earlier. In
Xerces-C++ 1.5.2, this process is broken down into three components:
· new class DTDScanner - to scan the DTD
· new class DTDGrammar - to store the DTD Grammar
· DTDValidator - to validate the DTD only
Experimental IDOM
The experimental IDOM API is a new design of the C++ DOM API. If you would like to migrate from
DOM to the experimental IDOM, please refer to the IDOM Programming Guide. Please note that this
experimental IDOM API is only a prototype and is subject to change.
· Compliance
· Bug Fixes
· Speed
· Summary of changes required to migrate from XML4C 2.x to Xerces-C++ 1.4.0
· The Samples
· Parser Classes
· DOM Level 2 support
· Progressive Parsing
· Namespace support
· Moved Classes to src/framework
· Loadable Message Text
· Pluggable Validators
· Pluggable Transcoders
· Util directory Reorganization
· util - The platform independent utility stuff
General Improvements
The new version is improved in many ways. Some general improvements are: significantly better
conformance to the XML spec, cleaner internal architecture, many bug fixes, and higher speed.
Compliance
Except for a couple of very obscure cases (mostly related to 'standalone' mode), this version should be
quite compliant. We have more than a thousand tests, some collected from various public sources and
some IBM-generated, which are used for regression testing. The C++ parser now passes all but a
handful of them.
Bug Fixes
This version has many bug fixes relative to XML4C version 2.x. Some of these were reported by
users and some were brought up by the conformance testing.
Speed
Much work was done to speed up this version. Some of the new features, such as namespaces and
conformance checks, ate up some of these gains, but overall the new version is significantly
faster than previous versions, even while doing more.
· The following methods now have a different set of parameters because the underlying base class
methods have changed in the 3.x release. These methods belong to one of the XMLDocumentHandler,
XMLErrorReporter or DocTypeHandler interfaces.
· [Non]Validating[DOM/SAX]Parser::docComment
· [Non]Validating[DOM/SAX]Parser::doctypePI
· [Non]ValidatingSAXParser::elementDecl
· [Non]ValidatingSAXParser::endAttList
· [Non]ValidatingSAXParser::entityDecl
· [Non]ValidatingSAXParser::notationDecl
· [Non]ValidatingSAXParser::startAttList
· [Non]ValidatingSAXParser::TextDecl
· [Non]ValidatingSAXParser::docComment
· [Non]ValidatingSAXParser::docPI
· [Non]Validating[DOM/SAX]Parser::endElement
· [Non]Validating[DOM/SAX]Parser::startElement
· [Non]Validating[DOM/SAX]Parser::XMLDecl
· [Non]Validating[DOM/SAX]Parser::error
· The following methods/data members changed visibility from protected in 2.3.x to private
(with public setters and getters, as appropriate).
· [Non]ValidatingDOMParser::fDocument
· [Non]ValidatingDOMParser::fCurrentParent
· [Non]ValidatingDOMParser::fCurrentNode
· [Non]ValidatingDOMParser::fNodeStack
· The following files have moved, possibly requiring changes in the #include statements.
· MemBufInputSource.hpp
· StdInInputSource.hpp
· URLInputSource.hpp
· All the DTD validator code was moved from internal to separate validators/DTD directory.
· The error code definitions which were earlier in internal/ErrorCodes.hpp are now split up
into the following files:
· framework/XMLErrorCodes.hpp - Core XML errors
· framework/XMLValidityCodes.hpp - DTD validity errors
· util/XMLExceptMsgs.hpp - C++ specific exception codes.
The Samples
The sample programs no longer use any of the unsupported util/xxx classes. Those classes only existed to allow us
to write portable samples. But since the wide character APIs are now supported on a lot of
platforms, we decided to write the samples in terms of these APIs. If your
system does not support them, you will not be able to build and run the samples. On some
platforms, these APIs might be optional packages or require runtime updates.
More samples have been added as well. These highlight some of the new functionality introduced in the
new code base, and the existing samples have been cleaned up.
The new samples are:
1. PParse - Demonstrates 'progressive parse' (see below)
2. StdInParse - Demonstrates use of the standard in input source
3. EnumVal - Shows how to enumerate the markup decls in a DTD Validator
Parser Classes
In the XML4C 2.x code base, there were the following parser classes (in the src/parsers/ source
directory): NonValidatingSAXParser, ValidatingSAXParser, NonValidatingDOMParser,
ValidatingDOMParser. The non-validating ones were the base classes and the validating ones just
derived from them and turned on the validation. This was deemed a little bit overblown, considering the
tiny amount of code required to turn on validation and the fact that it makes people use a pointer to the
parser in most cases (if they needed to support either validating or non-validating versions.)
The new code base just has SAXParser and DOMParser classes. These are capable of handling both
validating and non-validating modes, according to the state of a flag that you can set on them. For
instance, here is a code snippet that shows this in action.
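void ParseThis(const XMLCh* const fileToParse,
               const bool validate)
{
    //
    // Create a SAXParser. It can now just be
    // created by value on the stack if we want
    // to parse something within this scope.
    //
    SAXParser myParser;

    // Tell it whether to validate or not
    myParser.setDoValidation(validate);

    // Parse; a real program would also catch XMLException here
    myParser.parse(fileToParse);
}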
We feel that this is a simpler architecture, and that it makes things easier for you. In the above example,
for instance, the parser is cleaned up automatically when the function exits, since you no longer have to
allocate it on the heap.
Progressive Parsing
The new parser classes support, in addition to the parse() method, two new parsing methods, parseFirst()
and parseNext(). These are designed to support 'progressive parsing', so that you don't have to depend
upon throwing an exception to terminate the parsing operation. Calling parseFirst() causes the DTD
(or in the future, Schema) to be parsed (both internal and external subsets) along with any pre-content, i.e.
everything up to but not including the root element. Each subsequent call to parseNext() parses one more
piece of markup and passes it from the core scanning code to the parser (and hence either on
to you if using SAX, or into the DOM tree if using DOM). You can quit the parse at any time by simply not
calling parseNext() again and breaking out of the loop. When you call parseNext() and the end of the
root element is the next piece of markup, the parser continues on to the end of the file and returns false
to let you know that the parse is done. So a typical progressive parse loop looks like this:
// Create a progressive scan token
XMLPScanToken token;
if (!parser.parseFirst(xmlFile, token))
{
    cerr << "parseFirst() failed" << endl;
    return 1;
}

//
// We started ok, so let's call parseNext()
// until we find what we want or hit the end.
//
bool gotMore = true;
while (gotMore && !handler.getDone())
    gotMore = parser.parseNext(token);
In this case, our event handler object (named 'handler', surprisingly enough) is watching for some
criteria and returns a status from its getDone() method. Since the handler sees the SAX events coming
out of the SAXParser, it can tell when it finds what it wants. So we loop until we get no more data or our
handler indicates that it saw what it wanted to see.
When doing non-progressive parses, the parser can easily know when the parse is complete and ensure
that any used resources are cleaned up. Even in the case of a fatal parsing error, it can clean up all
per-parse resources. However, when progressive parsing is done, the client code doing the parse loop
might choose to stop the parse before the end of the primary file is reached. In such cases, the parser does
not know that the parse has ended, so its resources are not reclaimed until the parser is destroyed or
another parse is started.
This might not seem like such a bad thing; however, in this case, the files and sockets which were opened
in order to parse the referenced XML entities remain open. This could cause serious problems.
Therefore, you should destroy the parser instance in such cases, or restart another parse immediately. In a
future release, a reset method will be provided to do this more cleanly.
Also note that you must create a scan token and pass it back in on each call. This ensures that things don't
get done out of sequence. When you call parseFirst() or parse(), any previous scan tokens are invalidated
and cause an error if used again. This prevents incorrect mixed use of the two different parsing
schemes or incorrect calls to parseNext().
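The calling protocol described in this section can be mimicked with a toy scanner. This is a hypothetical sketch, not the Xerces-C++ scanner: `ProgressiveScanner` and `ScanToken` are illustrative names, the input is assumed to be pre-split into pieces of markup, and a generation counter is just one simple way to make tokens from an earlier parse fail when reused. It makes no claim about how XMLPScanToken is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical scan token: remembers which parse it was issued for.
struct ScanToken {
    unsigned long generation = 0;
};

// Hypothetical progressive scanner over input already split into
// "pieces of markup". It models the calling protocol only, not XML scanning.
class ProgressiveScanner {
public:
    // Starts a new parse: the prolog is consumed here, and every token issued
    // for an earlier parse becomes stale. Returns false if there is no root element.
    bool parseFirst(const std::vector<std::string>& prolog,
                    const std::vector<std::string>& body, ScanToken& token) {
        ++generation_;
        token.generation = generation_;
        prologCount_ = prolog.size();
        body_ = body;
        pos_ = 0;
        delivered_.clear();
        return !body_.empty();
    }

    // Delivers one more piece of markup. Returns false once the piece just
    // delivered was the last one (the end of the root element) or nothing is left.
    bool parseNext(ScanToken& token) {
        if (token.generation != generation_)
            throw std::logic_error("scan token from a previous parse");
        if (pos_ >= body_.size())
            return false;
        delivered_.push_back(body_[pos_++]);
        return pos_ < body_.size();
    }

    std::size_t prologCount() const { return prologCount_; }
    const std::vector<std::string>& delivered() const { return delivered_; }

private:
    std::vector<std::string> body_;
    std::vector<std::string> delivered_;
    std::size_t pos_ = 0;
    std::size_t prologCount_ = 0;
    unsigned long generation_ = 0;
};
```

A generation counter was chosen because starting a new parse invalidates all outstanding tokens in one step, without the scanner having to track every token it handed out.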
Namespace support
The C++ parser now supports namespaces. With current XML interfaces (SAX/DOM) this doesn't mean
very much because these APIs are incapable of passing on the namespace information. However, if you
are using our internal APIs to write your own parsers, you can make use of this new information. Since
the internal event APIs must be able to now support both namespace and non-namespace information,
they have more parameters. These allow namespace information to be passed along.
Most of the samples now have a new command line parameter to turn on namespace support. You turn on
namespaces like this:
SAXParser myParser;
// Turn on namespace processing
myParser.setDoNamespaces(true);
Pluggable Validators
In a preliminary move to support Schemas, and to make them first-class citizens just like DTDs, the
system has been reworked internally to make validators completely pluggable. So now the DTD validator
code is under the src/validators/DTD/ directory, with a future Schema validator probably going under
src/validators/ as well. The core scanner architecture now works completely in terms of the
framework/XMLValidator abstract interface and knows almost nothing about DTDs or Schemas. For
now, if you don't pass in a validator to the parsers, they will just create a DTDValidator. This means that,
theoretically, you could write your own validator. But we would not encourage this for a while, until the
semantics of the XMLValidator interface are completely worked out and proven to handle DTD and
Schema cleanly.
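The pluggable arrangement can be sketched with an abstract interface and a fallback default. The names `Validator`, `DefaultDtdValidator`, `AllowListValidator`, and `Scanner` are hypothetical and do not reproduce the framework/XMLValidator interface; the point is only that the scanner calls through the interface and creates a DTD-style validator itself when none is supplied.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract validator: the scanner depends only on this interface.
class Validator {
public:
    virtual ~Validator() = default;
    virtual std::string kind() const = 0;
    // Returns true if an element with this name is acceptable.
    virtual bool checkElement(const std::string& name) const = 0;
};

// Hypothetical stand-in for the default DTD validator.
class DefaultDtdValidator : public Validator {
public:
    std::string kind() const override { return "DTD"; }
    bool checkElement(const std::string& name) const override { return !name.empty(); }
};

// Hypothetical user-supplied validator, e.g. for some future grammar type.
class AllowListValidator : public Validator {
public:
    explicit AllowListValidator(std::string allowed) : allowed_(std::move(allowed)) {}
    std::string kind() const override { return "custom"; }
    bool checkElement(const std::string& name) const override { return name == allowed_; }
private:
    std::string allowed_;
};

// Hypothetical scanner: takes ownership of a validator, or creates the
// default DTD validator when none is passed in.
class Scanner {
public:
    explicit Scanner(std::unique_ptr<Validator> validator = nullptr)
        : validator_(std::move(validator)) {
        if (!validator_)
            validator_.reset(new DefaultDtdValidator);
    }
    bool scanElement(const std::string& name) const { return validator_->checkElement(name); }
    const Validator& validator() const { return *validator_; }
private:
    std::unique_ptr<Validator> validator_;
};
```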
Pluggable Transcoders
Another abstract framework, added in the src/util/ directory, supports pluggable transcoding services.
The XMLTransService class is an abstract API that can be derived from to support any desired
transcoding service. XMLTranscoder is the abstract API for a particular instance of a transcoder for a
particular encoding. The platform driver file decides what specific type of transcoder to use, which allows
each platform to use its native transcoding services, or the ICU service if desired.
Implementations are provided for Win32 native services, ICU services, and the iconv services available
on many Unix platforms. The Win32 version only provides native code page services, so it can only
handle XML documents in the intrinsic encodings ASCII, UTF-8, UTF-16 (Big/Small Endian), UCS4
(Big/Small Endian), EBCDIC code pages IBM037 and IBM1140 encodings, ISO-8859-1 (aka Latin1)
and Windows-1252. The ICU version provides all of the encodings that ICU supports. The iconv version
will support the encodings supported by the local system. You can use transcoders we provide or create
your own if you feel ours are insufficient in some way, or if your platform requires an implementation
that we do not provide.
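The split between a transcoding service and per-encoding transcoders can be sketched as two small abstract classes. Everything here is hypothetical (`TransService`, `Transcoder`, `makeTranscoder`, and `transcodeFrom` are illustrative names with simplified signatures, not the XMLTransService and XMLTranscoder APIs), and only two trivial encodings are implemented: ISO-8859-1, where each byte maps to the code point of the same value, and US-ASCII.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical per-encoding transcoder (not the real XMLTranscoder signature).
class Transcoder {
public:
    virtual ~Transcoder() = default;
    // Converts bytes in this transcoder's encoding to UTF-16 code units.
    virtual std::u16string transcodeFrom(const std::string& bytes) const = 0;
};

// ISO-8859-1: every byte value equals its Unicode code point.
class Latin1Transcoder : public Transcoder {
public:
    std::u16string transcodeFrom(const std::string& bytes) const override {
        std::u16string out;
        out.reserve(bytes.size());
        for (unsigned char c : bytes)
            out.push_back(static_cast<char16_t>(c));
        return out;
    }
};

// US-ASCII: bytes above 0x7F are invalid; this sketch maps them to U+FFFD.
class AsciiTranscoder : public Transcoder {
public:
    std::u16string transcodeFrom(const std::string& bytes) const override {
        std::u16string out;
        out.reserve(bytes.size());
        for (unsigned char c : bytes)
            out.push_back(c < 0x80 ? static_cast<char16_t>(c) : u'\uFFFD');
        return out;
    }
};

// Hypothetical transcoding service: hands out a transcoder for an encoding name.
class TransService {
public:
    virtual ~TransService() = default;
    virtual std::unique_ptr<Transcoder> makeTranscoder(const std::string& encoding) const {
        // Exact-match names keep the sketch short; XML encoding names are case-insensitive.
        if (encoding == "ISO-8859-1")
            return std::unique_ptr<Transcoder>(new Latin1Transcoder);
        if (encoding == "US-ASCII")
            return std::unique_ptr<Transcoder>(new AsciiTranscoder);
        return nullptr;  // encoding not supported by this service
    }
};
```

A platform-specific service would override `makeTranscoder` to return transcoders backed by native or ICU conversion routines, while callers keep working only with the `Transcoder` interface.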
33. Releases
2002-01-02 Tinny Ng Schema Fix: should not store a temp value as the key in
the element pool and the attribute pool.
2001-12-22 Jason Stewart [Bug 4953] Propagate existing CFLAGS and CXXFLAGS.
2001-12-21 Jason Stewart [Bug 5514] XMLEnumerator needs virtual destructor.
2001-12-21 Tinny Ng [Bug 2680] Remove '-instances=static' from the compile
step.
2001-12-21 Tinny Ng [Bug 1833] LexicalHandler::startDTD not called correctly.
2001-12-21 Frank Balluffi [Bug 5466] Memory Leak: ElementImpl.cpp's
ElementImpl::ElementImpl copy constructor does not
cleanup attributes before assignment.
2001-12-21 Frank Balluffi [Bug 5464] Memory Leak: DocumentImpl::importNode
does not delete old attribute if its reference count equals
zero.
2001-12-21 Tinny Ng Schema fix: leading whitespace should be preserved for
CData type.
2001-12-14 Khaled Noaman Add surrogate support to comments and processing
instructions.
2001-12-14 Tinny Ng Performance: Do not transcode twice in DOMString
constructor.
2001-12-14 Tinny Ng update BUILDINSTRUCTIONS.TXT to be in sync with
build instruction in build*.xml.
2001-12-13 PeiYong Zhang Fix: Invalid Argument to FreeLibrary (Hint: 0x0000000).
2001-12-13 Linda Swan iSeries (AS/400) documentation update and other iSeries
related fixes.
2001-12-13 Khaled Noaman [Bug 5410] non-schema <attribute> attributes cause error.
2001-12-12 Tinny Ng Fix typos in messages.
2001-12-12 PeiYong Zhang Memory leak: fRedefineList.
2001-12-12 Tinny Ng [Bug 5367] Progressive parse does not throw error when
file is empty.
2001-12-12 Tinny Ng Performance: Remove obsolete code in ElemStack.
2001-12-11 Max Gotlib More changes to IconvFBSDTransService. Allow using
"old" TransService implementation (via '-t native' option to
runConfigure) or to employ libiconv (it is a part of FreeBSD
ports-collection) services.
2001-12-11 Christopher Just [Bug 5320] 1.5.2 Build fails on IRIX. The variable
"atomicOpsMutex" has been defined twice.
2001-12-10 PeiYong Zhang Swap checking to avoid "dangling pointer" reported by
BoundsChecker.
2001-12-10 PeiYong Zhang Memory Leak: fLeafNameTypeVector.
Releases Archive
For release information about Xerces-C++ 1.5.2 or earlier, please refer to Releases Archive.
Releases Plan
For the future release plan of Xerces-C++, please refer to Releases Plan.
34. Releases Archive
2001-06-04 PeiYong Zhang The start tag "<?xml" could be followed by (#x20 | #x9
| #xD | #xA)+.
2001-06-04 James Berry Add support for tracking error count during parse;
enables simple parse without requiring error handler.
2001-06-01 Tinny Ng /scripts/packageSources.pl
Keep the BCB4 project files in the source package.
2001-05-22 James Berry Check for existence of MacOS Unicode Converter
routines prior to instantiating our transcoder object;
Xerces will thus panic, rather than crash, if they don't
exist. Add support to check for existence of MacOS
Unicode Converter to avoid calling through NULL
pointer.
2001-05-16 Henry Zongaro IDOM: Add DeepNodeList support.
2001-05-16 Henry Zongaro IDOM: Add namespace support.
2001-05-10 Christian Schuhegger [Bug 1158] built-in buffer limit could be smaller than
system limit, use PATH_MAX instead.
2001-05-10 Arnaud LeHors [Bug 1605] AttrNSImpl.cpp: fixed typo in constructor.
2001-05-09 Curt Arnold [Bug 1500] The public id was set twice and the system
id was not set on Notations.
2001-05-04 Tinny Ng DOMPrint: Check error before continuing.
2001-05-03 Tinny Ng ICU 1.8 update.
2001-05-03 Khaled Noaman Added new option to the parsers so that the NEL
(0x85) char can be treated as a newline character.
2001-04-23 Erik Rydgren DTDScanner: Reuse grammar should allow users to
use any stored element decl as root.
2001-04-19 William L Hopper Win32PlatformUtils: InterlockedCompareExchange on
different Windows.
2001-04-19 William L Hopper BCB project changes.
2001-04-16 James Berry MacOSUnicodeConverter: Fix include path, Updates
to reflect changes for Mac OS X final and Update
MacOS projects for Mac OS X final ProjectBuilder.
2001-04-11 Arnaud LeHors [Bug 1303] AttrImpl: allow value to be set to null.
2001-04-11 Tinny Ng DOMParser: Attribute default values not printed in
document type internal subset interface.
2001-04-10 Tinny Ng createdocs.bat: fix PDF generation.
2001-04-04 Alberto Massari DTDElementDecl: Error checking for null content
spec.
2001-04-02 Andy Heninger IDOM: imported.
2001-04-02 Andy Heninger IThreadTest: imported.
2001-03-30 Tinny Ng [Bug 1150] Problems with Namespaces and validating
parsing.
2001-03-27 Roman Sulzhyk [Bug 1069] Explicit Makefile dependency for 'lib' build.
2001-03-26 PeiYong Zhang When Standalone="yes", it is NOT supposed to
accept element which is defined in external DTD with
#FIXED attribute.
2001-03-26 Andy Heninger Update packageBinaries.pl for ICU 1.8. ICU debug .lib
file names and locations changed.
2001-03-23 Jeff Harrell [Bug 1018] AutoSense looks for "IRIX" when it should
look for "sgi" or "__sgi".
2001-03-22 Roman Sulzhyk [Bug 1069] The Makefiles fail to locate .cpp -> .o
dependency and rebuild .o all the time.
2001-03-22 John Rope [Bug 1021] Accessing an XML file using the file
"protocol" and a UNC path fails to open the file.
2001-03-09 Tinny Ng [Bug 733] Seg fault when trying to parse empty
filename.
2001-03-06 Tinny Ng [Bug 677] Infinite loop caused by malformed XML.
Happens when namespaces are enabled.
2001-03-02 Martin Kalen Enabling libWWW NetAccessor support under UNIX.
Tested with latest tarball of libWWW
(w3c-libwww-5.3.2) under RedHat Linux 6.1.
2001-02-27 Tinny Ng [Bug 676] Linux for S/390 build requires -fPIC.
2001-02-22 Tinny Ng [Bug 678] StdInParse doesn't output filename or
duration.
2001-02-21 Matt Lovett ICUTranscoder::transcodeFrom() expects ICU
function ucnv_toUnicode to return an extra element in
fSrcOffsets to allow us to figure out the last char size,
which in fact it is not. The fix is to compute the last
char size ourselves using the total bytes used.
2001-02-16 Andy Heninger Change limit test to reduce spurious pointer
assignment warnings from BoundsChecker.
2001-02-14 Bob Kline Better FAQ for the checksum error.
2001-02-14 Mark Everline Core dump when UTF-16 encoding contradicts actual
encoding.
2001-02-13 Hiram Clawson Update samples/tests files for UnixWare 7.1.1 with
gcc 2.95. Add UNIXWARE platform defines to
Makefile.incl, add recognition of sysv5uw7 to
configure.in, and add unixware as recognized platform
to runConfigure.
2001-02-09 Martin Kalen Update support for SCO UnixWare 7 (gcc). Tested
under UnixWare 7.1.1 with gcc version 2.95.2
19991024 (release) with gmake 3.79.1.
2001-02-08 Martin Kalen Enable COMPAQ Tru64 UNIX machines to build
xerces-c with gcc (tested using COMPAQ gcc
version 2.95.2 19991024 (release) and Tru64 V5.0
1094).
2001-02-07 Bill Schindler Rearranged statements in Initialize() so that
platformInit() is called before an XMLMutex is created.
2001-02-07 Richard Ko Storage overlay in ucnv_setFromUCallBack.
2001-02-05 Tinny Ng [Bug 766] /src/util/Compilers/CSetDefs.hpp: define
NO_NATIVE_BOOL macro only if not
pre-defined/reserved.
2001-02-05 Jordan Naftolin Add createPDF.jar and apachPDFStyle.xsl to convert
documentation xml files to pdf format.
2001-01-25 Arnaud LeHors Added a flag to turn off error checking in the DOM, this
is primarily used while building the DOM from the parser
to get better performance.
2001-01-25 Khaled Noaman Let users add their encoding to the intrinsic mapping
table.
2001-01-25 Khaled Noaman const should be used instead of static const. And other
clean up bug fixes.
2001-01-24 Arnaud LeHors Fixed replaceChild to handle the case where a node is
replaced by itself. Cleaned up insertBefore.
2001-01-24 Tinny Ng Guard the use of '-ptr${OUTDIR}' in
EnumVal/Makefile.in
2001-01-22 Curt Arnold Loads winsock dynamically.
2001-01-19 Curt Arnold COM various updates: updated the GUIDs so both can
coexist, better error reporting and fixed a few minor
bugs.
2001-01-18 Bill Schindler FAQ spell check, fix typos, fix grammar, readability
editing, clean up formatting, re-organize so related
topics appear together.
2001-01-18 Bill Schindler Project file updated due to removal of
ChildAndParentNode.cpp.
2001-01-17 Arnaud LeHors DOM Implementation Optimization.
2001-01-17 Volker Krause ElementImpl::getAttributeNS should check null pointer.
2001-01-17 Arnaud LeHors Have a single counter global to the document. Removed
node basis change counter.
2001-01-17 Arnaud LeHors Removed unused field in NodeImpl that was left over.
2001-01-17 Tinny Ng Access violations and stack overflows in insertBefore.
2001-01-15 David Bertoni Performance Patches.
2001-01-12 Tinny Ng Fix style-ibm.zip for documentation generation.
2001-01-12 Tinny Ng Remove the two obsolete files: stylesheets\Copy of
book2project.xsl and stylesheets\Copy of
document2html.xsl in style-apachexml.jar
2001-01-12 Tinny Ng Documentation Enhancement: explain values of
Val_Scheme.
2001-01-12 Tinny Ng Documentation Enhancement: Add list of SAX2 feature
strings that are supported.
2001-01-04 Khaled Noaman Assertion `size > 0' failure when cloning a node if the
last attribute has been removed.
2000-12-28 James Berry Omit include carbon.h in favor of specific include files.
2000-12-28 James Berry Add or modify cvs header in various files.
2000-12-28 James Berry Eliminate compiler warning in RangeImpl.cpp.
2000-12-28 James Berry Replace include of Carbon.h with specific include files.
2000-12-28 James Berry Move away from include of Carbon.h; include only
needed files instead. Fix bug in parsing of upwardly
relative paths under classic (thanks to Lawrence You).
2000-12-22 Tinny Ng XMLUni::fgEmptyString which is defined as "EMPTY" is
incorrectly used as an empty string; in fact
XMLUni::fgZeroLenString should be used instead.
2000-12-22 Tinny Ng Add the new header LexicalHandler.hpp to Makefile.in.
2000-12-22 Murray Cumming removes '-instances=static' from the Linux link sections.
2000-10-19 Andy Heninger SAXCount sample, allow multiple files on command line.
DOMCount sample, rename error handler class to say
that it is an error handler.
2000-10-18 James Berry MacOS project file updates. Small code optimization.
Add comments to clarify and to reflect new fixed XMLCh
size.
2000-10-17 Andy Heninger Bug Fix - problems with multi-byte characters on input
buffer boundaries.
2000-10-17 Andy Heninger DOMPrintFormatTarget, bad override of writeChars
fixed (missing const). XMLFormatTarget, removed
version of writeChars with no length. Can not be safely
used, and obscured other errors.
2000-10-16 Andy Heninger Change XMLCh back to unsigned short on all platforms
2000-10-13 Devin Barnhart COM: interpret BSTR as UTF-16 in documents
2000-10-13 Edward Bortner Solaris: change detection for native support for type bool
to defined(_BOOL).
2000-10-13 Nadav Aharoni XMLString::trim() bug fix: failure to null terminate result.
2000-10-10 Bill Schindler XMLFormatter: Fix problems with output to multi-byte
encodings.
2000-10-10 Andy Heninger From Janitor, remove the addition that is having compile
problems in MSVC.
2000-10-10 James Berry Fix a bug in returned length of transcoded string. Add a
few comments.
2000-10-09 James Berry ProjectBuilder project to build Xerces.
2000-10-09 James Berry Numerous Changes: - Increase environmental
sensitivity with hope of supporting pre OS 9 OS
versions. - Enhanced path creation/interpretation to
support proper unix style paths under Mac OS X instead
of the volume rooted paths we previously used. Paths
under Classic remain the same. - Better timer
resolution. - Detect functionality via unresolved symbols
rather than Gestalt where possible. - Softly back away
from URLAccess...if it's not installed, we just don't
support a net accessor. - Additional support for
XMLCh/UniChar size differences under GCC on Mac OS
X. - Fix Mac OS X support. GCC in this environment
sets wchar_t to a 32 bit value which requires an
additional transcoding stage (bleh...) - Improve
sensitivity to environment in order to support a broader
range of system versions. - Fix a few compiler
sensitivities. - Carbon.h header support
2000-10-09 James Berry Add some auto_ptr functionality to allow modification of
monitored pointer value. This eases use of Janitor in
some situations.
2000-10-09 James Berry Autosense.hpp: modify sensing of Mac OS X.
2000-09-28 Andy Heninger DOM_Document::putIdentifier() removed. There never
was an implementation for this function.
2000-09-28 Curt Arnold COM wrappers updated.
2000-09-28 Linda Swan AS400 related changes.
2000-07-20 Andy Heninger Improved net access (parse of URLs). Still weak,
though.
2000-07-20 Erik Schroeder XMLScanner.cpp bugfix: call startDocument() at
beginning of scan.
2000-07-20 Arundhati Bhowmick DOMCount exception handling cleaned up.
2000-07-19 Todd Collins runConfigure: modified to take "configureoptions"
2000-07-19 <check> Add 'make install' target to
src/util/Platforms/Makefile.in
2000-07-19 <check> DOM: BugFix: DocumentType nodes can not
have children.
2000-07-19 <check> DOM: Bug in NodeIDMap constructor.
2000-07-18 Anupam Bagchi Documentation generation tools updated.
2000-07-17 James Berry Mac OS port brought up to date (was very old)
2000-07-17 Andy Heninger Change windows project to link with ws2_32.lib
instead of winsock32.lib
2000-07-17 Grace Yan, Joe Kesselman DOM NodeIterator: bug fix for SHOW_ELEMENT
flag incorrectly being retrieved.
2000-07-17 Joe Polastre switched scanMisc() with endDoc() in scanNext.
Pointed out by Dean Roddey.
2000-07-17 Jim Reitz fix for uninitialized variable gotData bug in
XMLScanner.cpp.
2000-07-12 Arundhati Bhowmick DOM: fix bug in setting previous sibling pointer
during insertNode
2000-07-07 Joe Polastre Update to use of hashtables.
2000-07-07 Joe Polastre DOM userdata: several bug fixes.
2000-07-06 Andy Heninger Speedups in XMLScanner, XMLReader
2000-07-07 <check> bug fixes in IXMLDOM*
2000-07-06 Joe Polastre Performance tweaks, added more inlines.
2000-07-05 Anupam Bagchi Documentation updates.
2000-07-05 Joe Polastre DOM: Attribute node default value handling
implemented.
2000-07-05 Joe Polastre DOM Attr nodes - fixed setting of specified when
cloning. (change may be in error)
2000-07-04 Dean Roddey Fixed a memory leak when namespaces are
enabled.
2000-06-28 Curt Arnold COM object usage documentation update.
2000-06-28 Joe Polastre DOM Userdata - put pointers in a hash table
rather than having one pre-allocated per node.
Memory footprint reduction.
2000-06-27 Joe Polastre extended the (implementation) hash table
classes.
2000-06-26 John [email protected] Bug fix: check if initialized in Terminate() to stop
access violations.
2000-06-26 <check> Solaris build - template directory related
changes.
2000-06-22 Joe Polastre DOM Attr nodes, specified flag not set correctly by
parser. Fixed.
2000-06-20 Rahul, Joe, Arundhati Many doc updates in preparation for release of
version 1.2
2000-06-19 Rahul Jain Update Package Binaries script to build Xerces with
ICU.
2000-06-19 Joe Polastre Added help messages to PParse and StdInParse
samples.
2000-06-19 Joe Polastre Changed "XML4C" to "Xerces-C" in DOMPrint.
(Missed in earlier mass name change.)
2000-06-19 Arundhati Bhowmick Moved version.incl up one directory level.
2000-06-19 Curt Arnold Improved Windows project file.
2000-06-16 John Smirl Bug Fix: Document Handler was not called for PIs
occurring before the document element. Bug
identified by John Smirl and Rich Taylor
2000-06-16 Rahul Jain DOMPrint, SAXPrint: remove extra space in printing
PIs.
2000-06-16 Rahul Jain Windows Debug Build: add 'D' suffix to DLL name in
VCPPDefs.hpp
2000-06-16 Rahul Jain Samples: added -v option (validate always). Needed
for testing scripts.
2000-06-14 Joe Polastre Fixed null ptr failures in DOM NamedNodeMap
2000-06-12 Andy Heninger Fixed bug in XMLString::trim(), reported by Michele
Laghi
2000-06-07 Joe Polastre DOM: reduced memory usage for elements with no
attributes.
2000-06-01 Andy Heninger DOMString - add const to return type of const
XMLCh *DOMString::rawBuffer()
2000-06-01 Arundhati Bhowmick Fix crash with Solaris optimized build. Modified
XMLURL.cpp to dodge compiler code generation
error.
2000-06-01 Joe Polastre Bug fix: DOM Attr Specified flag was incorrectly set
when cloning or importing attributes.
2000-05-31 Andy Heninger MSVC projects modified to produce separate debug
and release versions of Xerces lib and dll.
2000-05-31 Rahul Jain Bug fix: DOMPrint, SAXPrint produced garbage
output on Solaris. Solaris library problem.
2000-05-31 Joe Polastre Fixed incorrect error check for end of file in Win32
platform utils.
2000-05-31 Rahul Jain DOMPrint enhancements. Add options for specifying
character encoding of the output, better control over
escaping of characters, better handling of CDATA
sections. Default validation is now "auto"
2000-05-22 Dean Roddey XMLFormatter now escapes characters, as required,
when they occur midway through strings. Reported by
Hugo Duncan.
2000-05-22 Andy Heninger Bug fix in implementation of
DOM_Document::getElementById()
2000-05-18 Anupam Bagchi Documentation: DTD for source XML files moved into
xerces-c project, sbk: prefixes removed, XML can
now be validated locally.
2000-05-15 Dean Fixed 'fatal error' when 'reusing the validator' problem
reported
by Rocky Raccoon ([email protected]). Fix
submitted by
Dean Roddey ([email protected]).
2000-05-15 James Berry Changed #include <memory.h> to <string.h>
everywhere. <[email protected]>
2000-05-15 Andy H. DOMTest: removed incorrectly failing entity tests
2000-05-12 Andy H. Revised implementation of
DOM_Document::getElementById(), removed
memory leaks, new test program for it.
2000-05-12 Dean Bug fix - A PE ref appearing at the start of a skipped
conditional section
was incorrectly being processed rather than ignored.
Fix from Dean Roddey.
2000-05-11 Rahul Jain Start using the socket based netaccessor by default
on most Unix platforms.
2000-05-11 Rahul Jain Update ICUTransService to work with latest revision
of ICU which provides a hard linked data DLL. i.e.
icudata.dll will be loaded when xerces-c is loaded.
2000-05-05 Dean Problem with progressive parsing: parseNext() would
throw an exception when the document contained
entities, either internal or external.
2000-05-11 Sean MacRoibeaird Add missing validity checks for stand-alone
documents, character range
and well-formed parsed entities.
2000-05-10 Radovan Chytracek Fix compilation problems on MSVC 5.
[email protected]>
2000-05-10 Dean Fix XMLReader defect reported by SHOGO SAWAKI
2000-05-09 Andy H Fix problem with Windows filenames containing '\' in
Japanese and Korean encodings.
2000-05-08 Andy H Memory Cleanup. XMLPlatformUtils::Terminate()
deletes all lazily allocated memory
2000-05-05 Dean Fixed defect in progressive parsing 'parseNext()'
reported by Tim Johnston
2000-05-03 Tom Jordahl Fixed Solaris build problems with static character
constants. Tom Jordahl <[email protected]>
2000-04-28 Arnaud LeHors Reduced memory usage for DOM Attributes.
2000-04-28 [email protected] New runConfigure options -P and -C
2000-04-27 Andy H Memory leaks in TransService. Joseph Chen
[email protected]>
2000-04-27 Arnaud LeHors DOM - storage requirements for nodes substantially
reduced.
2000-04-27 Arundhati Added DOM XMLDecl node type; provides access to
XML declaration.
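The Mac OS X entry above notes that GCC in that environment defines wchar_t as a 32-bit type. As a result, wide-character strings need an extra transcoding stage before they can be used as 16-bit XMLCh data. The sketch below shows what such a stage involves. It is not the Xerces-C++ implementation. It rests on these assumptions: the 32-bit wchar_t data is UTF-32, std::u32string stands in for that data, std::u16string (char16_t code units) stands in for an XMLCh string, and the function name utf32ToUtf16 is hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Sketch of a UTF-32 -> UTF-16 transcoding stage. std::u32string stands in
// for 32-bit wchar_t data (as with GCC on Mac OS X); std::u16string, whose
// char16_t elements are 16-bit code units, stands in for an XMLCh string.
// The function name is hypothetical and not part of the Xerces-C++ API.
std::u16string utf32ToUtf16(const std::u32string& in)
{
    std::u16string out;
    out.reserve(in.size());  // at least one code unit per code point

    for (char32_t cp : in) {
        // Reject values that are not Unicode scalar values: anything past
        // U+10FFFF, and the surrogate range reserved for UTF-16 itself.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit with the same value.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split the remaining 20 bits into a
            // high surrogate (top 10 bits) and a low surrogate (bottom 10).
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A plain element-by-element cast is not enough for this stage. Each code point above U+FFFF expands into a surrogate pair, so the 16-bit output can be longer than the 32-bit input.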
35
Future Releases Plan
Current Status
Xerces-C++ 1.7.0 - released on March 8, 2002.
36
Bug Reporting
Search first
Check the Bugzilla [20] database before submitting your bug report to avoid creating a duplicate report.
Even if the bug has already been reported, you may add a comment to the existing report, since your
contribution may lead to quicker identification and resolution of the bug.
are all necessary to allow developers to reproduce, identify, evaluate and, eventually, fix the
bug, which is the very purpose of reporting it.
37
Feedback Procedures
Questions or Comments
Please browse through this bundled documentation completely. Most of the common questions have been
answered in the FAQs. Specifically, do read the answer to "Is there any kind of support available for
Xerces-C++?". Browsing this documentation may be the quickest way to get an answer. Of course, if all
else fails, as mentioned in that answer, you can post a question to the Xerces-C++ mailing list [15].
See Bug Reporting if you would like to report a defect (greatly appreciated!).
Acknowledgements
Ever since this code base was created, many people have helped port it to
different platforms and have provided patches for both new features and bug fixes.
Listed below are some names (in alphabetical order) of people to whom we would like to give special
thanks.
· Nadav Aharoni
· Curt Arnold
· Edward Avis
· Anupam Bagchi
· Torbjörn Bäckström
· Matthew Baker
· Frank Balluffi
· Devin Barnhart
· John Bellardo
· James Berry
· David Bertoni
· Arundhati Bhowmick
· Edward Bortner
· Sean Bright
· Phil Brown
· Sumit Chawla
· Nick Chiang
· Chih Hsiang Chou
· Radovan Chytracek
· Hiram Clawson
· John Clayton
· Todd Collins
· Michael Crawford
· Murray Cumming
· Helmut Eiken
· Mark Everline
· Simon Fell
· Paul Ferguson
· Pierpaolo Fumagalli
· Gary Gale
· Max Gotlib
· Petr Gotthard
· Susan Hardenbrook
· Jeff Harrell
· Andy Heninger
· William L. Hopper
· Michael Huedepohl
· Rahul Jain
· Tom Jordahl
· Christopher Just
· Martin Kalen
· Joe Kesselman
· Artur Klauser
· Bob Kline
· Richard Ko
· Paul Kramer
· Volker Krause
· Arnaud LeHors
· Andy Levine
· Jeff Lewis
· Matt Lovett
· Sean MacRoibeaird
· Alberto Massari
· Don Mastrovito
· David McCreedy
· Jordan Naftolin
· Tinny Ng
· David Nickerson
· Khaled Noaman
· Michael Ottati
· Kevin Philips
· Mike Pogue
· Joe Polastre
· John Ponzo
· Shengkai Qu
· Gareth Reakes
· Jim Reitz
· Dean Roddey
· John Roper
· Steven Rosenthal
· Erik Rydgren
· Bill Schindler
· Erik Schroeder
· Christian Schuhegger
· John Smirl
· Andrei Smirnov
· Gereon Steffens
· Rick J. Stevens
· Jason Stewart
· Roman Sulzhyk
· Linda M. Swan
· Pieter Van-Dyck
· Peter A. Volchek
· Curtis Walker
· John Warrier
· Tom Watson
· Mark Weaver
· Roger Webster
· Robert Weir
· Carolyn Weiss
· Kari Whitcomb
· Dietrich Wolf
· Kirk Wylie
· Grace Yan
· PeiYong Zhang
· Henry Zongaro
38
Y2K Compliance
39
PDF Documentation
PDF Documentation
You can get the entire Xerces-C++ documentation in PDF format, xerces-c.pdf [40] (or in compressed format,
xerces-c.pdf.tar.gz [41]), for printing and offline reference.
Note: A word of caution! The tools used to create the PDF documentation are still
experimental, so the resulting PDF document is not perfect. We would be glad to
receive your comments on the Xerces-C++ mailing list [15].
Appendix A
Links Reference
[1] http://www.w3.org/XML/
[2] http://www.w3.org/TR/REC-xml
[3] http://www.w3.org/TR/REC-DOM-Level-1/
[4] http://www.w3.org/TR/DOM-Level-2-Core/
[5] http://sax.sourceforge.net/?selected=sax1
[6] http://sax.sourceforge.net/
[7] http://www.w3.org/TR/REC-xml-names/
[8] http://www.w3.org/XML/Schema.html
[9] http://www-4.ibm.com/software/ad/vacpp/
[10] http://www-4.ibm.com/software/ad/vacpp/service/csd.html
[11] http://oss.software.ibm.com/icu/
[12] http://www.gnu.org
[13] http://www.gnu.org/software/autoconf/autoconf.html
[14] http://www.gnu.org/software/make/make.html
[15] mailto:[email protected]
[16] http://www.gnu.org/software/gcc/gcc.html
[17] http://oss.software.ibm.com/developerworks/opensource/
[18] http://oss.software.ibm.com/icu/download/index.html
[19] http://marc.theaimsgroup.com/?l=xerces-c-dev
[20] http://nagoya.apache.org/bugzilla/
[21] http://www.stack.nl/~dimitri/doxygen/
[22] http://www.research.att.com/sw/tools/graphviz/
[23] http://www.w3.org/TR/xmlschema-0/
[24] http://www.w3.org/TR/xmlschema-1/
[25] http://www.w3.org/TR/xmlschema-2/
[26] http://www.x.org/terms.htm
[27] http://www.alphaworks.ibm.com/tech/xml4c
[28] http://xml.apache.org/xerces-c/index.html
[29] http://xml.apache.org/dist/xerces-c/
[30] http://xml.apache.org/
[31] http://xml.apache.org/cocoon/index.html
[32] http://www.gnu.org/software/tar/tar.html
[33] http://www.gnu.org/manual/tar/html_node/tar_117.html#SEC112
[34] http://sunsolve.sun.com
[35] http://www.w3.org/TR/REC-xml#charsets
[36] http://www.iana.org/assignments/character-sets
[37] http://xml.apache.org/xerces-j/index.html
[38] http://www.oasis-open.org/cover/xml.html
[39] http://www.w3.org/TR/xhtml1/xhtml1.zip
[40] http://xml.apache.org/xerces-c/pdf/xerces-c.pdf
[41] http://xml.apache.org/xerces-c/pdf/xerces-c.pdf.tar.gz