Advances in Applied Artificial Intelligence 1st Edition by John Fulcher 9781591408291 1591408296 PDF Download
Advances in Applied
Artificial Intelligence
Copyright © 2006 by Idea Group Inc. All rights reserved. No part of this book may be
reproduced, stored or distributed in any form or by any means, electronic or mechanical,
including photocopying, without written permission from the publisher.
Product or company names used in this book are for identification purposes only.
Inclusion of the names of the products or companies does not indicate a claim of
ownership by IGI of the trademark or registered trademark.
All work contributed to this book is new, previously-unpublished material. The views
expressed in this book are those of the authors, but not necessarily of the publisher.
IGP Forthcoming Titles in the
Computational Intelligence and
Its Applications Series
Biometric Image Discrimination Technologies
(February 2006 release)
David Zhang, Xiaoyuan Jing and Jian Yang
ISBN: 1-59140-830-X
Paperback ISBN: 1-59140-831-8
eISBN: 1-59140-832-6
Preface
Chapter I
Soft Computing Paradigms and Regression Trees in Decision Support Systems
Cong Tran, University of South Australia, Australia
Ajith Abraham, Chung-Ang University, Korea
Lakhmi Jain, University of South Australia, Australia
Chapter II
Application of Text Mining Methodologies to Health Insurance Schedules
Ah Chung Tsoi, Monash University, Australia
Phuong Kim To, Tedis P/L, Australia
Markus Hagenbuchner, University of Wollongong, Australia
Chapter III
Coordinating Agent Interactions Under Open Environments
Quan Bai, University of Wollongong, Australia
Minjie Zhang, University of Wollongong, Australia
Chapter IV
Literacy by Way of Automatic Speech Recognition
Russell Gluck, University of Wollongong, Australia
John Fulcher, University of Wollongong, Australia
Chapter V
Smart Cars: The Next Frontier
Lars Petersson, National ICT Australia, Australia
Luke Fletcher, Australian National University, Australia
Nick Barnes, National ICT Australia, Australia
Alexander Zelinsky, CSIRO ICT Centre, Australia
Chapter VI
The Application of Swarm Intelligence to Collective Robots
Amanda J. C. Sharkey, University of Sheffield, UK
Noel Sharkey, University of Sheffield, UK
Chapter VII
Self-Organising Impact Sensing Networks in Robust Aerospace Vehicles
Mikhail Prokopenko, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Geoff Poulton, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Don Price, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Peter Wang, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Philip Valencia, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Nigel Hoschke, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Tony Farmer, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Mark Hedley, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Chris Lewis, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Andrew Scott, CSIRO Information and Communication
Technology Centre and CSIRO Industrial Physics, Australia
Chapter VIII
Knowledge Through Evolution
Russell Beale, University of Birmingham, UK
Andy Pryke, University of Birmingham, UK
Chapter IX
Neural Networks for the Classification of Benign and Malignant Patterns in
Digital Mammograms
Brijesh Verma, Central Queensland University, Australia
Rinku Panchal, Central Queensland University, Australia
Chapter X
Swarm Intelligence and the Taguchi Method for Identification of Fuzzy Models
Arun Khosla, National Institute of Technology, Jalandhar, India
Shakti Kumar, Haryana Engineering College, Jalandhar, India
K. K. Aggarwal, GGS Indraprastha University, Delhi, India
Preface
Similarly, the rationale behind agent programs is based on a belief that we become
intelligent within our social groups. A single human raised in isolation will never be as
intelligent as one who comes into daily contact with others throughout his or her
developing life. Note that for this to be true, it is also required that the agent be able to
learn in some way to modulate its actions and responses to those of the group. Therefore,
a pre-programmed agent will not be as strong as an agent which is given the ability
to dynamically change its behaviour over time. The evolutionary approach too shares
this view in that the final population is not a pre-programmed solution to a problem, but
rather emerges through the processes of survival-of-the-fittest and their reproduction
with inaccuracies.
Whether any one technology will prove to be the central one in creating artificial
intelligence or whether a combination of technologies will be necessary to create an
artificial intelligence is still an open question, so many scientists are experimenting
with mixtures of such techniques.
In this volume, we see such questions implicitly addressed by scientists tackling
specific problems which require intelligence with both individual and combinations of
specific artificial intelligence techniques.
authors give an excellent review of the main techniques currently used, including hidden
Markov models, linear predictive coding, dynamic time warping, and artificial neural
networks. The authors’ familiarity with the nuts-and-bolts of the techniques
is evident in the detail with which they discuss each technique. For example, the
artificial neural network section discusses not only the standard back propagation
algorithm and self-organizing maps, but also recurrent neural networks and the related
time-delay neural networks. However, the main topic of the chapter is the review of the
draw-talk-write approach to literacy, which has been the subject of ongoing research for
almost a decade. The most recent work has seen this technique automated using several of the
techniques discussed above. The result is a socially-useful method which is still in
development but shows a great deal of potential.
Petersson, Fletcher, Barnes, and Zelinsky turn our attention to their Smart Cars
project in Chapter V. This deals with the intricacies of Driver Assistance Systems,
enhancing the driver’s ability to drive rather than replacing the driver. Much of their
work is with monitoring systems, but they also have strong reasoning systems which,
since the work involves keeping the driver in the loop, must be intuitive and explanatory.
The system involves a number of different technologies for different parts of the
system. Naturally, since this is a real-world application, much of the data acquired is
noisy, so statistical methods and probabilistic modelling play a big part in their system,
while support vector machines are used for object classification.
Amanda and Noel Sharkey take a more technique-driven approach in Chapter VI
when they investigate the application of swarm techniques to collective robotics. Many
of the issues such as communication which arise in swarm intelligence mirror those of
multi-agent systems, but one of the defining attributes of swarms is that the individual
components should be extremely simple, a constraint which does not appear in multi-
agent systems. The Sharkeys enumerate the main components of such a system as
being composed of a group of simple agents which are autonomous, can communicate
only locally, and are biologically inspired. Each of these properties is discussed in
some detail in Chapter VI. Sometimes these techniques are combined with artificial
neural networks, to control the individual agents, or with genetic algorithms, for example,
for developing control systems. The application to robotics gives a fascinating case-study.
In Chapter VII, the topic of structural health management (SHM) is introduced.
This “is a new approach to monitoring and maintaining the integrity and performance
of structures as they age and/or sustain damage”, and Prokopenko and his co-authors
are particularly interested in applying this to aerospace systems in which there are
inherent difficulties, in that they are operating under extreme conditions. A multi-agent
system is created to handle the various sub-tasks necessary in such a system, which is
created using an interaction between top-down dissection of the tasks to be done with
a bottom-up set of solutions for specific tasks. Interestingly, they consider that most of
the bottom-up development should be based on self-organising principles, which means
that the top-down dissection has to be very precise. Since they have a multi-agent
system, communication between the agents is a priority: they create a system whereby
only neighbours can communicate with one another, believing that this gives robustness
to the whole system in that there are then multiple channels of communication.
Their discussion of chaotic regimes and self-repair systems provides a fascinating
insight into the type of system which NASA is currently investigating. This chapter
places self-referentiability as a central factor in evolving multi-agent systems.
In Chapter VIII, Beale and Pryke make an elegant case for using computer algorithms
for the tasks for which they are best suited, while retaining human input into any
investigation for the tasks for which the human is best suited. In an exploratory data
investigation, for example, it may one day be interesting to identify clusters in a data
set, another day it may be more interesting to identify outliers, while a third day may see
the item of interest shift to the manifold in which the data lies. These aspects are
specific to an individual’s interests and will change in time; therefore, they develop a
mechanism by which the human user can determine the criterion of interest for a specific
data set so that the algorithm can optimise the view of the data given to the human,
taking into account this criterion. They discuss trading accuracy for understanding in
that, if presenting 80% of a solution makes it more accessible to human understanding
than a possible 100% solution, it may be preferable to take the 80% solution. A combination
of evolutionary algorithms and a type of spring model is used to generate
interesting views.
Chapter IX sees an investigation by Verma and Panchal into the use of neural
networks for digital mammography. The whole process is discussed here, from collection
of data, early detection of suspicious areas, area extraction, feature extraction and
selection, to the final classification of patterns into ‘benign’ or ‘malignant’. An
extensive review of the literature is given, followed by a case study on some benchmark
data sets. Finally, the authors make a plea for more use of standard data sets, something
that will meet with heartfelt agreement from other researchers who have tried to compare
the different methods found in the literature.
In Chapter X, Khosla, Kumar, and Aggarwal report on the application of particle
swarm optimisation and the Taguchi method to the derivation of optimal fuzzy models
from the available data. The authors emphasize the importance of selecting appropriate
PSO strategies and parameters for such tasks, as these impact significantly on performance.
Their approach is validated by way of data from a rapid Ni-Cd battery charger.
As we see, the chapters in this volume represent a wide spectrum of work, and
each is self-contained. Therefore, the reader can dip into this book in any order he/she
wishes. There are also extensive references within each chapter which an interested
reader may wish to pursue, so this book can be used as a central resource from which
major avenues of research may be approached.
Chapter I
Soft Computing
Paradigms and
Regression Trees in
Decision Support Systems
Cong Tran, University of South Australia, Australia
ABSTRACT
Decision-making is a process of choosing among alternative courses of action for
solving complicated problems where multi-criteria objectives are involved. The past
few years have witnessed a growing recognition of soft computing (SC) (Zadeh, 1998)
technologies that underlie the conception, design, and utilization of intelligent
systems. In this chapter, we present different SC paradigms involving an artificial
neural network (Zurada, 1992) trained by using the scaled conjugate gradient
algorithm (Moller, 1993), two different fuzzy inference methods (Abraham, 2001)
optimised by using neural network learning/evolutionary algorithms (Fogel, 1999),
and regression trees (Breiman, Friedman, Olshen, & Stone, 1984) for developing
intelligent decision support systems (Tran, Abraham, & Jain, 2004). We demonstrate
the efficiency of the different algorithms by developing a decision support system for
a tactical air combat environment (TACE) (Tran & Zahid, 2000). Some empirical
comparisons between the different algorithms are also provided.
Copyright © 2006, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
INTRODUCTION
Several decision support systems have been developed in various fields including
medical diagnosis (Adibi, Ghoreishi, Fahimi, & Maleki, 1993), business management,
control systems (Takagi & Sugeno, 1983), command and control of defence and air traffic
control (Chappel, 1992), and so on. Previous experience or expert knowledge is
often used to design decision support systems. The task becomes interesting when no
prior knowledge is available. The need for an intelligent mechanism for decision support
comes from the well-known limits of human knowledge processing. It has been noticed
that the need for support for human decision-makers is due to four kinds of limits:
cognitive, economic, time, and competitive demands (Holsapple & Whinston, 1996).
Several artificial intelligence techniques have been explored to construct adaptive
decision support systems. A framework that could capture imprecision and uncertainty,
learn from the data/information, and continuously optimise the solution by providing
interpretable decision rules, would be the ideal technique. Several adaptive learning
frameworks for constructing intelligent decision support systems have been proposed
(Cattral, Oppacher, & Deogo, 1999; Hung, 1993; Jagielska, 1998; Tran, Jain, & Abraham,
2002b). Figure 1 summarizes the basic functional aspects of a decision support system.
A database is created from the available data and human knowledge. The learning
process then builds up the decision rules. The developed rules are further fine-tuned,
depending upon the quality of the solution, using a supervised learning process.
To develop an intelligent decision support system, we need a holistic view on the
various tasks to be carried out including data management and knowledge management
(reasoning techniques). The focus of this chapter is knowledge management (Tran &
Zahid, 2000), which consists of facts and inference rules used for reasoning (Abraham,
2000).
Fuzzy logic (Zadeh, 1973), when applied to decision support systems, provides
formal methodology to capture valid patterns of reasoning about uncertainty. Artificial
neural networks (ANNs) are popularly known as black-box function approximators.
Recent research work shows that the capability of rule extraction from a trained network
positions neuro-computing as a good decision support tool (Setiono, 2000; Setiono,
Leow, & Zurada, 2002). Evolutionary computation (EC) (Fogel, 1999) has recently proved
to be a powerful global optimisation tool in several problem
domains (Abraham, 2002; Cortes, Larrañeta, Onieva, García, & Caraballo, 2001;
Ponnuswamy, Amin, Jha, & Castañon, 1997; Tan & Li, 2001; Tan, Yu, Heng, & Lee, 2003).
EC works by simulating evolution on a computer by iterative generation and alteration
processes, operating on a set of candidate solutions that form a population. Due to the
complementarity of neural networks, fuzzy inference systems, and evolutionary computation,
the recent trend is to fuse various systems to form a more powerful integrated
system, to overcome their individual weaknesses.
Decision trees (Breiman et al., 1984) have emerged as a powerful machine-learning
technique due to a simple, apparent, and fast reasoning process. Decision trees can be
related to artificial neural networks by mapping them into a class of ANNs or entropy nets
with far fewer connections.
[Figure 1: flowchart of a decision support system; recoverable labels: Master data set, Learning process, Acceptable, End]
In the next section, we present the complexity of the tactical air combat decision
support system (TACDSS) (Tran, Abraham, & Jain, 2002c), followed by some theoretical
foundation on neural networks, fuzzy inference systems, and decision trees in the
following section. We then present different adaptation procedures for optimising fuzzy
inference systems. Takagi-Sugeno (Takagi & Sugeno, 1983; Sugeno, 1985) and
Mamdani-Assilian (Mamdani & Assilian, 1975) fuzzy inference systems learned by using
neural network learning techniques and evolutionary computation are discussed. Experimental
results using the different connectionist paradigms follow. Detailed discussions
of these results are presented in the last section, and conclusions are drawn.
where i is called the set index, the symbol “|” is read as “such that”, and R^n is the set of
n-tuples of real numbers. A sub-set “A” of X, denoted A ⊆ X, is a set of elements that is contained
within the universal set X. For optimal decision-making, the system should be able to
adaptively process the information provided by words or any natural language description
of the problem environment.
To illustrate the proposed approach, we consider a case study based on a tactical
environment problem. We aim to develop an environment decision support system for
a pilot or mission commander in tactical air combat. We will attempt to present the
complexity of the problem with some typical scenarios. In Figure 2, the Airborne Early
Warning and Control (AEW&C) is performing surveillance in a particular area of
operation. It has two Hornets (F/A-18s) under its control at the ground base, shown as
“+” in the left corner of Figure 2. An air-to-air fuel tanker (KB707), shown as “o”, is on station,
and its location and status are known to the AEW&C. One of the Hornets is on patrol
in the area of Combat Air Patrol (CAP). Sometime later, the AEW&C on-board sensors
detect hostile aircraft, shown as “O”. When the hostile aircraft enter the surveillance
region (shown as a dashed circle), the mission system software is able to identify the
enemy aircraft and estimate their distance from the Hornets in the ground base or in the
CAP.
The mission operator has a few options when deciding on the allocation of
Hornets to intercept the enemy aircraft:
• Send the Hornet directly to the spotted area and intercept,
• Call the Hornet in the area back to ground base or send another Hornet from the
ground base,
• Call the Hornet in the area for refuel before intercepting the enemy aircraft.
The mission operator will base his/her decisions on a number of factors, such as:
• Fuel reserve and weapon status of the Hornet in the area,
• Interrupt time of Hornets in the ground base or at the CAP to stop the hostile,
• The speed of the enemy fighter aircraft and the type of weapons it possesses.
[Figure 2: tactical air combat scenario; recoverable labels: Surveillance boundary, Hostiles, Fighter on CAP, Tanker aircraft, Fighters at ground base]
From the above scenario, it is evident that there are important decision factors of
the tactical environment that might directly affect the air combat decision. For demonstrating
our proposed approach, we will simplify the problem by handling only a few
important decision factors such as “fuel status”, “interrupt time” (Hornets in the ground
base and in the area of CAP), “weapon possession status”, and “situation awareness”
(Table 1). The developed tactical air combat decision rules (Abraham & Jain, 2002c)
should be able to incorporate all the above-mentioned decision factors.
• The decision selection will have a high value if the fuel reserve is full, the interrupt
time is fast enough, the Hornet has high weapon status, and the FOE danger is low.
In TACE, decision-making is normally based on all states of all the decision factors.
However, sometimes a mission operator/commander can make a decision based on a single
important factor, such as the fuel reserve of the Hornet being too low (due to high fuel use),
the enemy having more powerful weapons, or the quality and quantity of enemy aircraft.
Table 2 shows the decision score at each stage of the TACE.
of our own prejudices about an application of interest. The training patterns can be
thought of as a set of ordered pairs {(x_1, y_1), (x_2, y_2), ..., (x_p, y_p)}, where x_i represents an input
pattern and y_i represents the output pattern vector associated with the input vector x_i.
A valuable property of neural networks is that of generalisation, whereby a trained
neural network is able to provide a correct matching in the form of output data for a set
of previously-unseen input data. Learning typically occurs through training, where the
training algorithm iteratively adjusts the connection weights (synapses). In the conjugate
gradient algorithm (CGA), a search is performed along conjugate directions, which
generally produces faster convergence than a search along steepest descent directions. A search is
made along the conjugate gradient direction to determine the step size, which will
minimise the performance function along that line. A line search is performed to determine
the optimal distance to move along the current search direction. Then the next search
direction is determined so that it is conjugate to the previous search direction. The
general procedure for determining the new search direction is to combine the new
steepest descent direction with the previous search direction. An important feature of
CGA is that the minimization performed in one step is not partially undone by the next,
as is the case with gradient descent methods. An important drawback of CGA is the
requirement of a line search, which is computationally expensive. The scaled conjugate
gradient algorithm (SCGA) (Moller, 1993) was designed to avoid the time-consuming line
search at each iteration. It combines the model-trust region approach used in the
Levenberg-Marquardt algorithm with the conjugate gradient approach (Abraham, 2002).
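The conjugate-direction search described above can be sketched on a small quadratic problem. This is only an illustration and not the SCGA of Moller (1993): it assumes a quadratic performance function f(x) = 0.5 x'Ax - b'x, for which the line search has a closed form, and it uses the Fletcher-Reeves formula (an assumption; the chapter does not name a formula) to combine the new steepest descent direction with the previous search direction. The function name and the example matrix are hypothetical.

```python
def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=50):
    """Minimise f(x) = 0.5*x'Ax - b'x for a symmetric positive-definite A.

    Returns (x, iterations). A is a list of rows; b and x0 are lists.
    """
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    # The residual b - Ax is the negative gradient, i.e. the steepest descent direction.
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    d = list(r)  # the first search direction is plain steepest descent
    for k in range(max_iter):
        rr = dot(r, r)
        if rr < tol:
            return x, k
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)  # exact line search along d (closed form for a quadratic)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr  # Fletcher-Reeves coefficient (assumed choice)
        # New direction: new steepest descent direction plus a share of the previous one.
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x, max_iter


x, iters = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

For a general (non-quadratic) network error function the step size has no closed form, which is where the expensive line search mentioned above comes in.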
We now introduce two different fuzzy inference systems that have been widely
employed in various applications. These fuzzy systems feature different consequents in
their rules, and thus their aggregation and defuzzification procedures differ accordingly.
Most fuzzy systems employ the inference method proposed by Mamdani and Assilian,
in which the rule consequent is defined by fuzzy sets and has the following structure
(Mamdani & Assilian, 1975):
Takagi and Sugeno (1983) proposed an inference scheme in which the conclusion
of a fuzzy rule is constituted by a weighted linear combination of the crisp inputs rather
than a fuzzy set, and which has the following structure:
A Takagi-Sugeno FIS usually needs a smaller number of rules, because its output
is already a linear function of the inputs rather than a constant fuzzy set (Abraham, 2001).
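To make the Takagi-Sugeno scheme concrete, the following minimal sketch evaluates a two-rule, two-input system whose consequents are linear functions of the crisp inputs. Several choices here are assumptions and do not come from the chapter: Gaussian membership functions, the product operator for combining antecedents, weighted-average defuzzification, and all rule parameters and names.

```python
import math


def gauss(x, centre, sigma):
    """Gaussian membership degree of x (assumed membership function shape)."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


# Each hypothetical rule: ((centre, sigma) for x, (centre, sigma) for y, (p, q, r)),
# read as "if x is A and y is B then z = p*x + q*y + r".
RULES = [
    ((0.0, 0.5), (0.0, 0.5), (1.0, 1.0, 0.0)),  # x LOW and y LOW   -> z = x + y
    ((1.0, 0.5), (1.0, 0.5), (2.0, 0.0, 1.0)),  # x HIGH and y HIGH -> z = 2x + 1
]


def ts_infer(x, y, rules=RULES):
    """Weighted average of the rule consequents, weighted by firing strength."""
    numerator = 0.0
    denominator = 0.0
    for (cx, sx), (cy, sy), (p, q, r) in rules:
        strength = gauss(x, cx, sx) * gauss(y, cy, sy)  # product of antecedent degrees
        numerator += strength * (p * x + q * y + r)     # crisp linear consequent
        denominator += strength
    return numerator / denominator


z = ts_infer(0.5, 0.5)
```

Because each consequent is already crisp, no separate defuzzification of an output fuzzy set is needed, which is the practical difference from the Mamdani-Assilian scheme.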
Evolutionary Algorithms
Evolutionary algorithms (EAs) are population-based adaptive methods, which may
be used to solve optimisation problems, based on the genetic processes of biological
organisms (Fogel, 1999; Tan et al., 2003). Over many generations, natural populations
evolve according to the principles of natural selection and “survival-of-the-fittest”, first
clearly stated by Charles Darwin in “On the Origin of Species”. By mimicking this process,
EAs are able to “evolve” solutions to real-world problems, if they have been suitably
encoded. The procedure may be written as the difference equation (Fogel, 1999):
x(t + 1) = s(v(x(t)))
where x(t) is the population at time t, v is a random operator, and s is the selection
operator. The algorithm is illustrated in Figure 4.
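The loop of random variation followed by selection can be sketched in a few lines. This is a hedged illustration only: the Gaussian mutation used for v, the truncation selection over parents plus offspring used for s, the one-dimensional real-valued encoding, and every name and parameter value below are assumptions, not the chapter's implementation.

```python
import random


def vary(population, rng, sigma=0.3):
    """v: random variation. Keeps the parents and adds one mutated child per parent."""
    return population + [x + rng.gauss(0.0, sigma) for x in population]


def select(candidates, fitness, size):
    """s: selection. Keeps the `size` candidates with the lowest fitness value."""
    return sorted(candidates, key=fitness)[:size]


def evolve(fitness, pop_size=20, generations=100, seed=0):
    """Apply variation then selection repeatedly and return the best individual."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    x = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        x = select(vary(x, rng), fitness, pop_size)
    return min(x, key=fitness)


best = evolve(lambda x: (x - 3.0) ** 2)  # minimise a simple quadratic
```

Keeping the parents in the candidate pool makes the best fitness non-increasing from one generation to the next, which is one common design choice among several.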
A conventional fuzzy controller makes use of a model of the expert who is in a
position to specify the most important properties of the process. Expert knowledge is
often the main source for designing fuzzy inference systems. The membership functions
and rule bases are then adapted according to the performance measure of the problem
environment. Adaptation of fuzzy inference systems using evolutionary computation
techniques has been widely explored (Abraham & Nath, 2000a, 2000b). In the
following section, we will discuss how fuzzy inference systems could be adapted using
neural network learning techniques.
Neuro-Fuzzy Computing
Neuro-fuzzy (NF) (Abraham, 2001) computing is a popular framework for solving
complex problems. If we have knowledge expressed in linguistic rules, we can build a FIS;
if we have data, or can learn from a simulation (training), we can use ANNs. For building
a FIS, we have to specify the fuzzy sets, fuzzy operators, and the knowledge base.
Similarly, for constructing an ANN for an application, the user needs to specify the
architecture and learning algorithm. An analysis reveals that the drawbacks pertaining
to these approaches seem complementary and, therefore, it is natural to consider building
an integrated system combining the concepts. While the learning capability is an
advantage from the viewpoint of FIS, the formation of a linguistic rule base will be
advantageous from the viewpoint of ANN (Abraham, 2001).
In a fused NF architecture, ANN learning algorithms are used to determine the
parameters of the FIS. Fused NF systems share data structures and knowledge representations.
A common way to apply a learning algorithm to a fuzzy system is to represent
it in a special ANN-like architecture. However, the conventional ANN learning algorithm
(gradient descent) cannot be applied directly to such a system as the functions used in
the inference process are usually non-differentiable. This problem can be tackled by
using differentiable functions in the inference system or by not using the standard neural
learning algorithm. Two neuro-fuzzy learning paradigms are presented later in this
chapter.
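One of the two remedies mentioned above, using differentiable functions in the inference system, can be illustrated with a toy example. The sketch below is not one of the neuro-fuzzy paradigms presented later in the chapter. It assumes a single-input system with two rules, fixed Gaussian membership functions, and constant rule consequents, and it tunes only the consequents by gradient descent on the squared error. All names and parameter values are hypothetical.

```python
import math


def gauss(x, centre, sigma=0.4):
    """Differentiable (Gaussian) membership function; the shape is an assumption."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def firing(x, centres=(0.0, 1.0)):
    """Normalised firing strengths of the two rules for input x."""
    w = [gauss(x, c) for c in centres]
    total = sum(w)
    return [wi / total for wi in w]


def fis_output(x, consequents):
    """Output = firing-strength-weighted sum of the constant rule consequents."""
    return sum(n * c for n, c in zip(firing(x), consequents))


def train(samples, lr=0.5, epochs=300):
    """Tune the consequents by per-sample gradient descent on 0.5 * error**2."""
    consequents = [0.0, 0.0]
    for _ in range(epochs):
        for x, target in samples:
            n = firing(x)
            err = target - sum(ni * ci for ni, ci in zip(n, consequents))
            # d(output)/d(c_i) equals n_i, so the descent step is lr * err * n_i.
            consequents = [ci + lr * err * ni for ci, ni in zip(consequents, n)]
    return consequents


true_consequents = [0.2, 0.9]  # hypothetical target system used to generate data
samples = [(i / 10.0, fis_output(i / 10.0, true_consequents)) for i in range(11)]
learned = train(samples)
```

The same chain-rule reasoning extends to the membership function centres and widths, since the Gaussian is differentiable in both.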
CART (classification and regression trees) is an advanced decision tree technology for
data analysis, pre-processing, and predictive modelling. CART is a robust data-analysis
tool that automatically searches for important patterns and relationships and quickly uncovers hidden
structure even in highly complex data. CART's binary decision trees are more sparing with
data and detect more structure before further splitting is impossible or stopped. Splitting
is impossible if only one case remains in a particular node, or if all the cases in that node
are exact copies of each other (on predictor variables). CART also allows splitting to be
stopped for several other reasons, including that a node has too few cases (Steinberg
& Colla, 1995).
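The stopping conditions above can be sketched for a single numeric predictor. This is a simplified illustration, not the CART software: it assumes a regression setting in which split quality is measured by the sum of squared errors of the two child nodes (the chapter does not state the criterion), and the function names and the `min_cases` parameter are hypothetical.

```python
def sse(values):
    """Sum of squared errors of `values` around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys, min_cases=2):
    """Return (threshold, cost) for the best binary split, or None if the node is terminal.

    The node stays terminal when it has too few cases (which includes the
    single-case node) or when all cases share the same predictor value.
    """
    if len(xs) < min_cases or len(set(xs)) == 1:
        return None
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (threshold, cost)
    return best


split = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

A full tree builder would apply `best_split` recursively to each child and over every predictor, keeping the predictor with the lowest cost.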
Once a terminal node is found, we must decide how to classify all cases falling within
it. One simple criterion is the plurality rule: The group with the greatest representation
determines the class assignment. CART goes a step further: Because each node has the
potential for being a terminal node, a class assignment is made for every node whether
it is terminal or not. The rules of class assignment can be modified from simple plurality
to account for the costs of making a mistake in classification and to adjust for over- or
under-sampling from certain classes.
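The plurality rule and its cost-sensitive variant can be sketched as follows. The function name, the cost-dictionary layout, and the class labels are hypothetical, and the adjustment for over- or under-sampling mentioned above is not covered.

```python
from collections import Counter


def assign_class(labels, cost=None):
    """Class for a node: plurality by default, minimum total cost if `cost` is given.

    cost[(true, predicted)] is the penalty for predicting `predicted` when a
    case really belongs to `true`; missing entries count as zero.
    """
    counts = Counter(labels)
    if cost is None:
        return counts.most_common(1)[0][0]  # plurality rule
    classes = sorted(counts)

    def total_cost(predicted):
        # Cost of labelling every case in the node as `predicted`.
        return sum(counts[true] * cost.get((true, predicted), 0.0) for true in classes)

    return min(classes, key=total_cost)


node = ["a"] * 7 + ["b"] * 3
plain = assign_class(node)
weighted = assign_class(node, cost={("b", "a"): 5.0, ("a", "b"): 1.0})
```

With these illustrative costs, missing a "b" case is five times as expensive as a false "b", so the minority class wins the node even though "a" has the plurality.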
A common technique among the first generation of tree classifiers was to continue
splitting nodes (growing the tree) until some goodness-of-split criterion failed to be met.
When the quality of a particular split fell below a certain threshold, the tree was not grown
further along that branch. When all branches from the root reached terminal nodes, the
tree was considered complete. Once a maximal tree is generated, CART examines smaller trees
obtained by pruning away branches of the maximal tree. Once the maximal tree is grown
and a set of sub-trees is derived from it, CART determines the best tree by testing for error
rates or costs. With sufficient data, the simplest method is to divide the sample into
learning and test sub-samples. The learning sample is used to grow an overly large tree.
The test sample is then used to estimate the rate at which cases are misclassified (possibly
adjusted by misclassification costs). The misclassification error rate is calculated for the
largest tree and also for every sub-tree.
The best sub-tree is the one with the lowest or near-lowest cost, which may be a
relatively small tree. Cross validation is used if data are insufficient for a separate test
sample. In the search for patterns in databases, it is essential to avoid the trap of
over-fitting, that is, finding patterns that apply only to the training data. CART's embedded
test disciplines help ensure that the patterns found will hold up when applied to new data. Further,
the testing and selection of the optimal tree are an integral part of the CART algorithm.
CART handles missing values in the database by substituting surrogate splitters, which
are back-up rules that closely mimic the action of primary splitting rules. The surrogate
splitter contains information that is typically similar to what would be found in the primary
splitter (Steinberg & Colla, 1995).
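The selection of the best sub-tree from test-sample costs can be sketched as below. The function name `select_subtree`, the `tolerance` parameter used to express "near-lowest", and the cost figures are hypothetical and are not taken from the experiments in this chapter.

```python
def select_subtree(candidates, tolerance=0.0):
    """Pick a pruned sub-tree using test-sample costs.

    candidates: list of (number_of_terminal_nodes, test_cost) pairs, one per
                sub-tree obtained by pruning the maximal tree.
    tolerance:  how far above the minimum cost a tree may be and still count
                as "near-lowest" (an assumption of this sketch).
    Returns the smallest tree whose cost is within `tolerance` of the minimum.
    """
    lowest_cost = min(cost for _, cost in candidates)
    eligible = [c for c in candidates if c[1] <= lowest_cost + tolerance]
    return min(eligible, key=lambda c: c[0])


# Hypothetical (terminal nodes, test misclassification cost) pairs.
subtrees = [(1, 0.40), (3, 0.22), (7, 0.15), (14, 0.14), (30, 0.17)]
lowest = select_subtree(subtrees)
near_lowest = select_subtree(subtrees, tolerance=0.02)
```

Allowing a small tolerance trades a slightly higher estimated cost for a considerably smaller tree, which is the sense in which the best sub-tree "may be a relatively small tree".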
Soft Computing Paradigms and Regression Trees 11
Suppose there are two input linguistic variables (ILV) X and Y, and each ILV has three
membership functions (MF), A1, A2, and A3 for X and B1, B2, and B3 for Y. A
Takagi-Sugeno-type fuzzy if-then rule could then be set up as:
O1,x = x
O1,y = y (6)
For TACDSS the four inputs are “fuel status”, “weapons inventory levels”, “time
intercept”, and the “danger situation”.
• Layer 2
The output of nodes in this layer is presented as $O_{l,ip,m}$, where ip is the ILV and m
is the degree of membership function of a particular MF.
With three MFs for each input variable, “fuel status” has three membership
functions: full, half, and low, “time intercept” has fast, normal, and slow, “weapon
status” has sufficient, enough, and insufficient, and the “danger situation” has
very dangerous, dangerous, and endangered.
• Layer 3
The output of nodes in this layer is the product of all the incoming signals, denoted
by:
where i = 1, 2, and 3, and n is the number of the fuzzy rule. In general, any T-norm
operator will perform the fuzzy ‘AND’ operation in this layer. With four ILV and
three MFs for each input variable, the TACDSS will have 81 (3^4 = 81) fuzzy if-then
rules.
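The rule count and the product operation of this layer can be sketched as follows. The linguistic labels are those named in the text; the membership degrees, the function name `firing_strengths`, and the dictionary layout are hypothetical values chosen only for illustration.

```python
from itertools import product
from math import prod

# Hypothetical membership degrees for one input sample (layer-2 outputs).
memberships = {
    "fuel status": {"full": 0.2, "half": 0.5, "low": 0.3},
    "weapon status": {"sufficient": 0.6, "enough": 0.3, "insufficient": 0.1},
    "time intercept": {"fast": 0.1, "normal": 0.7, "slow": 0.2},
    "danger situation": {"very dangerous": 0.25, "dangerous": 0.5, "endangered": 0.25},
}


def firing_strengths(memberships):
    """Return {rule antecedent labels: firing strength} using the product T-norm."""
    variables = list(memberships)
    strengths = {}
    # One rule per combination of one MF from each input variable.
    for combo in product(*(memberships[v].items() for v in variables)):
        labels = tuple(label for label, _ in combo)
        strengths[labels] = prod(degree for _, degree in combo)
    return strengths


rules = firing_strengths(memberships)
number_of_rules = len(rules)  # 3 ** 4 combinations
example = rules[("half", "sufficient", "normal", "dangerous")]
```

Any other T-norm (for example, the minimum) could replace `prod` without changing the enumeration of rules.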
• Layer 4
The nodes in this layer calculate the ratio of the nth fuzzy rule firing strength (RFS)
to the sum of all RFS:
$$O_{4,n} = \bar{w}_n = \frac{w_n}{\sum_{n=1}^{81} w_n}, \quad n = 1, 2, \ldots, 81 \qquad (9)$$
The number of nodes in this layer is the same as the number of nodes in layer-3.
The outputs of this layer are also called normalized firing strengths.
• Layer 5
The nodes in this layer are adaptive, defined as:
where pn, qn, rn are the rule consequent parameters. This layer also has the same
number of nodes as layer-4 (81 numbers).
• Layer 6
The single node in this layer is responsible for the defuzzification process, using
the centre-of-gravity technique to compute the overall output as the summation of
all the incoming signals:
$$O_{6,1} = \sum_{n=1}^{81} \bar{w}_n f_n = \frac{\sum_{n=1}^{81} w_n f_n}{\sum_{n=1}^{81} w_n} \qquad (11)$$
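The normalisation of layer 4 and the weighted sum of layer 6 can be illustrated with a short sketch. It uses three rules instead of 81 to keep the numbers readable; the function name `anfis_output` and all numeric values are hypothetical, and the rule outputs f_n are simply given rather than computed from consequent parameters.

```python
def anfis_output(firing, rule_outputs):
    """Combine rule outputs using normalised firing strengths.

    firing:       list of rule firing strengths w_n (layer 3).
    rule_outputs: list of rule consequent values f_n (layer 5, before weighting).
    Returns (overall output, list of normalised firing strengths).
    """
    total = sum(firing)
    normalised = [w / total for w in firing]  # layer 4
    output = sum(wb * f for wb, f in zip(normalised, rule_outputs))  # layer 6
    return output, normalised


# Hypothetical firing strengths and rule outputs for three rules.
w = [0.1, 0.3, 0.4]
f = [1.0, 2.0, 4.0]
overall, w_bar = anfis_output(w, f)
```

The result is the same whether the weighted sum is computed from normalised strengths or as the ratio of the two sums, which is the equality stated in the layer-6 equation.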
ANFIS output:
$$f = O_{6,1} = \frac{w_1}{\sum_n w_n} f_1 + \frac{w_2}{\sum_n w_n} f_2 + \cdots + \frac{w_n}{\sum_n w_n} f_n \qquad (12)$$
where n is the number of nodes in layer 5. From this, the output can be rewritten as
f = F(i,S) (13)
where F is a function, i is the vector of input variables, and S is the set of total
parameters. If there exists a function H such that the composite function H ∘ F is linear
in some elements of S, then these elements can be identified by the least squares
method. If the parameter set is divided into two sets S1 and S2, defined as:
S = S1 ⊕ S2 (14)
where ⊕ represents direct sum and ∘ denotes function composition, such that H ∘ F is linear
in the elements of S2, the function f can be represented as:
Given values of S1, the P training data pairs can be substituted into equation 15. H(f) can
be written as the matrix equation AX = Y, where X is an unknown vector whose elements
are parameters in S2.
If |S2| = M (M being the number of linear parameters), then the dimensions of
matrices A, X, and Y are P × M, M × 1, and P × 1, respectively. This is a standard linear
least-squares problem, and the best solution of X, which minimizes ||AX − Y||^2, is the
least squares estimate (LSE) X*:
$$X^* = (A^T A)^{-1} A^T Y \qquad (16)$$
$$S_{i+1} = S_i - \frac{S_i a_{i+1} a_{i+1}^T S_i}{1 + a_{i+1}^T S_i a_{i+1}}, \quad i = 0, 1, \ldots, P-1 \qquad (18)$$
The LSE X* is equal to X_P. The initial conditions of X_{i+1} and S_{i+1} are X_0 = 0 and
S_0 = γI, where γ is a large positive number and I is the identity matrix of dimension M × M.
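The sequential estimate can be sketched in a few lines of plain Python. The update of S follows equation (18); the companion update for X used here, X_{i+1} = X_i + S_{i+1} a_{i+1} (y_{i+1} − a_{i+1}^T X_i), is the standard recursive least-squares form and is an assumption of this sketch, since that equation is not shown in this text. The function name `recursive_lse`, the value of the large initial constant, and the toy data are also hypothetical.

```python
def recursive_lse(rows, targets, gamma=1e6):
    """Sequential least-squares estimate of X for A X = Y (single output).

    rows:    list of row vectors a_i (each of length M), i.e. the rows of A.
    targets: list of scalars y_i, i.e. the entries of Y.
    gamma:   large positive constant for the initial S_0 = gamma * I.
    """
    m = len(rows[0])
    x = [0.0] * m                                   # X_0 = 0
    s = [[gamma if r == c else 0.0 for c in range(m)] for r in range(m)]
    for a, y in zip(rows, targets):
        s_a = [sum(s[r][c] * a[c] for c in range(m)) for r in range(m)]
        denom = 1.0 + sum(a[r] * s_a[r] for r in range(m))
        # Equation (18); S stays symmetric, so a^T S equals (S a)^T.
        s = [[s[r][c] - s_a[r] * s_a[c] / denom for c in range(m)]
             for r in range(m)]
        # Assumed companion update for X (standard recursive least squares).
        error = y - sum(a[r] * x[r] for r in range(m))
        gain = [sum(s[r][c] * a[c] for c in range(m)) for r in range(m)]
        x = [x[r] + gain[r] * error for r in range(m)]
    return x


# Toy data generated from y = 2 * u + 3, with rows of the form [u, 1].
rows = [[float(u), 1.0] for u in range(5)]
targets = [2.0 * u + 3.0 for u in range(5)]
estimate = recursive_lse(rows, targets)
```

Processing one training pair at a time avoids forming and inverting A^T A explicitly, at the cost of a small bias that vanishes as the initial constant grows.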
When hybrid learning is applied in batch mode, each epoch is composed of a forward
pass and a backward pass. In the forward pass, the node outputs of each layer are
calculated until the corresponding matrices A and Y are obtained. The parameters of S2
are identified by the pseudo-inverse equation mentioned above. After the parameters
of S2 are obtained, the process computes the error measure for each training data pair.
In the backward pass, the error signals (the derivatives of the error measure with respect
to each node output) propagate from the output end to the input end. At the end of the
backward pass, the parameters in S1 are updated by the steepest descent method as follows:
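$$\Delta\alpha = -\eta \frac{\partial E}{\partial \alpha} \qquad (19)$$
where α is a generic parameter, η is the learning rate, and E is an error measure.
$$\eta = \frac{k}{\sqrt{\sum_{\alpha} \left( \frac{\partial E}{\partial \alpha} \right)^2}} \qquad (20)$$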
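A single steepest-descent update can be sketched as follows, assuming that the normalisation in equation (20) is the Euclidean norm of the gradient (the square root of the summed squared derivatives) and that k acts as a step size. The function name `descent_step` and the numbers are hypothetical.

```python
from math import sqrt


def descent_step(params, grads, k):
    """Apply one steepest-descent update to a list of parameters.

    params: current parameter values (the alphas).
    grads:  derivatives of the error measure E with respect to each parameter.
    k:      step size; the learning rate is k divided by the gradient norm.
    """
    norm = sqrt(sum(g * g for g in grads))
    if norm == 0.0:
        return list(params)  # already at a stationary point
    eta = k / norm
    return [p - eta * g for p, g in zip(params, grads)]


# Hypothetical parameters and gradient.
old = [1.0, 2.0]
new = descent_step(old, grads=[3.0, 4.0], k=0.5)
step_length = sqrt(sum((n - o) ** 2 for n, o in zip(new, old)))
```

Under this assumption every update moves the parameter vector by exactly k in parameter space, whatever the magnitude of the gradient.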
(x11, x21, y1) => [x11 (0.8 in half), x21 (0.2 in fast), y1 (0.6 in acceptable)],
Assign a degree to each rule. To resolve a possible conflict problem, that is, rules
having the same antecedent but a different consequent, and to reduce the number of
rules, we assign a degree to each rule generated from data pairs and accept only the rule
from a conflict group that has a maximum degree. In other words, this step is performed
to delete redundant rules, and therefore obtain a concise fuzzy rule base. The following
product strategy is used to assign a degree to each rule. The degree of the rule is denoted
by:
Note that if two or more generated fuzzy rules have the same preconditions and
consequents, then the rule that has maximum degree is used. In this way, assigning the
degree to each rule, the fuzzy rule base can be adapted or updated by the relative
weighting strategy: The more task-related the rule becomes, the more weight degree the
rule gains. As a result, not only is the conflict problem resolved, but also the number of
rules is reduced significantly. After the structure-learning phase (if-then rules), the
whole network structure is established, and the network enters the second learning phase
to optimally adjust the parameters of the membership functions using a gradient descent
learning algorithm to minimise the error function:
$$E = \frac{1}{2} \sum_{x} \sum_{l=1}^{q} (d_l - y_l)^2 \qquad (27)$$
where d and y are the target and actual outputs for an input x. This approach is very similar
to the MF parameter tuning in ANFIS.
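The conflict-resolution step of the structure-learning phase can be sketched as follows. The sketch assumes that the "product strategy" means multiplying the membership degrees of the antecedents and the consequent of a candidate rule; the function name `build_rule_base`, the tuple layout, and the labels and degrees are hypothetical.

```python
from math import prod


def build_rule_base(candidate_rules):
    """Keep, for each antecedent, only the candidate rule of maximum degree.

    candidate_rules: iterable of (antecedent_labels, consequent_label, degrees),
                     where `degrees` holds the membership degrees of the data
                     pair in the chosen antecedent and consequent MFs.
    Returns {antecedent_labels: (consequent_label, rule_degree)}.
    """
    rule_base = {}
    for antecedent, consequent, degrees in candidate_rules:
        degree = prod(degrees)  # assumed product strategy
        kept = rule_base.get(antecedent)
        if kept is None or degree > kept[1]:
            rule_base[antecedent] = (consequent, degree)
    return rule_base


# Hypothetical candidate rules generated from three data pairs.
candidates = [
    (("half", "fast"), "acceptable", (0.8, 0.2, 0.6)),
    (("half", "fast"), "poor", (0.6, 0.7, 0.4)),      # conflicts with the first
    (("low", "slow"), "poor", (0.9, 0.5, 0.7)),
]
rule_base = build_rule_base(candidates)
```

Because the dictionary is keyed by the antecedent, conflicting rules and exact duplicates collapse into a single entry, which is how the rule count shrinks.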
to be more effective than manual alteration. A similar approach has been taken to optimise
membership function parameters. A simple way is to represent only the parameter
showing the centre of MFs to speed up the adaptation process and to reduce spurious
local minima over the centre and width.
The EA module for adapting FuNN is designed as a stand-alone system for
optimising the MFs if the rules are already available. Both antecedent and consequent
MFs are optimised. Chromosomes are represented as strings of floating-point numbers
rather than strings of bits. In addition, mutation of a gene is implemented as a
re-initialisation, rather than an alteration of the existing allele. Figure 7 shows the
chromosome structure, including the input and output MF parameters. One point
crossover is used for the chromosome reproduction.
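The two genetic operators described above can be sketched on floating-point chromosomes as follows. The function names, the gene range, and the example chromosomes are hypothetical; the actual chromosome layout (input and output MF parameters) is the one shown in Figure 7 and is not reproduced here.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(chromosome, rate, low, high, rng):
    """Re-initialise each gene with probability `rate` to a fresh value in [low, high]."""
    return [rng.uniform(low, high) if rng.random() < rate else gene
            for gene in chromosome]


rng = random.Random(0)
parent_a = [0.1, 0.2, 0.3, 0.4]   # hypothetical MF parameters
parent_b = [0.9, 0.8, 0.7, 0.6]
child_1, child_2 = one_point_crossover(parent_a, parent_b, rng)
unchanged = mutate(parent_a, rate=0.0, low=0.0, high=1.0, rng=rng)
reinitialised = mutate(parent_a, rate=1.0, low=0.0, high=1.0, rng=rng)
```

Mutation here draws a completely new value for the gene instead of perturbing the current one, which matches the re-initialisation described in the text.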
Figure 8. Membership function of the “fuel reserve” ILV (a) before and (b) after
learning
Test RMSE    1.44    1.22    1.78    1.36    2.661    2.910    1.8583    1.8584
Population size = 50
Number of generations = 100
Mutation rate = 0.01
We used the tournament selection strategy, and Figure 10 illustrates the learning
convergence during the 100 generations for Datasets A and B. Fifty-four fuzzy if-then
rules were extracted after the learning process. Table 4 summarizes the training and test
performance.
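Tournament selection can be sketched as follows. The population size matches the setting listed above; the tournament size, the function name `tournament_select`, and the use of RMSE as a fitness value to be minimised are assumptions of this sketch.

```python
import random

POPULATION_SIZE = 50  # as listed in the settings above


def tournament_select(population, errors, rng, size=2):
    """Return the individual with the lowest error among `size` random contenders."""
    contenders = rng.sample(range(len(population)), size)
    winner = min(contenders, key=lambda index: errors[index])
    return population[winner]


rng = random.Random(1)
# Hypothetical population: each individual is a short list of MF parameters.
population = [[rng.random() for _ in range(4)] for _ in range(POPULATION_SIZE)]
# Hypothetical error values standing in for each individual's RMSE.
errors = [rng.random() for _ in range(POPULATION_SIZE)]

selected = tournament_select(population, errors, rng)
# A tournament over the whole population always returns the best individual.
champion = tournament_select(population, errors, rng, size=POPULATION_SIZE)
```

Small tournaments keep selection pressure low and preserve diversity, while larger tournaments favour the current best individuals more strongly.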
Table 5. Training and test performance of neural networks versus decision trees
                                     RMSE
                     Data A                  Data B
                     Training    Testing     Training    Testing
CART                 0.00239     0.00319     0.00227     0.00314
Neural Network       0.00105     0.00095     0.00041     0.00062
Figure 12. Dataset A - Variation of relative error versus the number of terminal nodes
Figure 13. Dataset B - Variation of relative error versus the number of terminal nodes
Figure 16. Test results illustrating the efficiency of the different intelligent paradigms
used in developing the TACDSS
nodes as shown in Figure 14, while for Data B, the rest of the tree had 128 terminal nodes
as depicted in Figure 15. Training and test performance are summarized in Table 5.
Figure 16 compares the performance of the different intelligent paradigms used in
developing the TACDSS (for clarity, we have chosen only 20% of the test results for
Dataset B).
DISCUSSION
The focus of this research is to create accurate and highly interpretable (using rules
or tree structures) decision support systems for a tactical air combat environment
problem.
Experimental results using two different datasets revealed the importance of fuzzy
inference engines to construct accurate decision support systems. As expected, by
providing more training data (90% of the randomly-chosen master data set), the models
were able to learn and generalise more accurately. The Takagi-Sugeno fuzzy inference
system has the lowest RMSE on both test datasets. Since learning involves a complicated
procedure, the training process of the Takagi-Sugeno fuzzy inference system took longer
compared to the Mamdani-Assilian fuzzy inference method; hence, there is a compromise
between performance and computational complexity (training time). Our experiments
using different membership function shapes also reveal that the Gaussian membership
function is the “optimum” shape for constructing accurate decision support systems.
Neural networks can no longer be considered as ‘black boxes’. Recent research
(Setiono, 2000; Setiono, Leow, & Zurada, 2002) has revealed that it is possible to extract
rules from trained neural networks. In our experiments, we used a neural network trained
using the scaled conjugate gradient algorithm. Results depicted in Figure 5 also reveal
that the trained neural network could not learn and generalise as accurately as
the Takagi-Sugeno fuzzy inference system. The proposed neural network outperformed
both the Mamdani-Assilian fuzzy inference system and CART.
Two important features of the developed classification and regression tree are its
easy interpretability and low complexity. Due to its one-pass training approach, the
CART algorithm also has the lowest computational load. For Dataset A, the best results
were achieved using 122 terminal nodes (relative error = 0.00014). As shown in Figure 12,
when the number of terminal nodes was reduced to 14, the relative error increased to 0.016.
For Dataset B, the best results could be achieved using 128 terminal nodes (relative error
= 0.00010). As shown in Figure 13, when the terminal nodes were reduced to 14, the relative
error increased to 0.011.
CONCLUSION
In this chapter, we have presented different soft computing and machine learning
paradigms for developing a tactical air combat decision support system. The techniques
explored were a Takagi-Sugeno fuzzy inference system trained by using neural network
learning techniques, a Mamdani-Assilian fuzzy inference system trained by using
evolutionary algorithms and neural network learning, a feed-forward neural network
trained by using the scaled conjugate gradient algorithm, and classification and adaptive
regression trees.
The empirical results clearly demonstrate that all these techniques are reliable and
could be used for constructing more complicated decision support systems. Experiments
on the two independent data sets also reveal that the techniques are not biased on the
data itself. Compared to neural networks and regression trees, the Takagi-Sugeno fuzzy
inference system has the lowest RMSE, and the Mamdani-Assilian fuzzy inference
system has the highest RMSE. In terms of computational complexity, perhaps regression
trees are best since they use a one-pass learning approach when compared to the many
learning iterations required by all other considered techniques. Important advantages
of the considered models are fast learning, easy interpretability (if-then rules for fuzzy
inference systems, m-of-n rules from a trained neural network (Setiono, 2000), and
decision trees), and efficient storage and retrieval capacities. It may also be
concluded that fusing different intelligent systems, knowing their strengths and
weaknesses, could help to mitigate the limitations and take advantage of the opportunities
to produce more efficient decision support systems than those built with stand-alone
systems.
Our future work will be directed towards optimisation of the different intelligent
paradigms (Abraham, 2002), which we have already used, and also to develop new
adaptive reinforcement learning systems that can update the knowledge from data,
especially when no expert knowledge is available.
ACKNOWLEDGMENTS
The authors would like to thank Professor John Fulcher for the editorial comments
which helped to improve the clarity of this chapter.
REFERENCES
Abraham, A. (2001). Neuro-fuzzy systems: State-of-the-art modeling techniques. In J.
Mira & A. Prieto (Eds.), Connectionist models of neurons, learning processes, and
artificial intelligence (pp. 269-276). Berlin, Germany: Springer-Verlag.
Abraham, A. (2002). Optimization of evolutionary neural networks using hybrid learning
algorithms. Proceedings of the IEEE International Joint Conference on Neural
Networks (IJCNN’02): Vol. 3, Honolulu, Hawaii (pp. 2797-2802). Piscataway, NJ:
IEEE Press.
Abraham, A., & Nath, B. (2000a). Evolutionary design of neuro-fuzzy systems: A generic
framework. In A. Namatame, et al. (Eds.), Proceedings of the 4th Japan-Australia
Joint Workshop on Intelligent and Evolutionary Systems (JA2000 - Japan) (pp.
106-113). National Defence Academy (Japan)/University of New South Wales
(Australia).
Abraham, A., & Nath, B. (2000b, December). Evolutionary design of fuzzy control
systems: A hybrid approach. In J. L. Wang (Ed.), Proceedings of the 6th Interna-
tional Conference on Control, Automation, Robotics, and Vision, (ICARCV
2000), Singapore.
Abraham, A., & Nath, B. (2001). A neuro-fuzzy approach for modelling electricity demand
in Victoria. Applied Soft Computing, 1(2), 127-138.
Adibi, J., Ghoreishi, A., Fahimi, M., & Maleki, Z. (1993, April). Fuzzy logic information
theory hybrid model for medical diagnostic expert system. Proceedings of the 12th
Southern Biomedical Engineering Conference, Tulane University, New Orleans,
LA (pp. 211-213).
Breiman, L., Friedman, J., Olshen, R., & Stone, C. J. (1984). Classification and regression
trees. New York: Chapman and Hall.
Cattral, R., Oppacher, F., & Deugo, D. (1999, July 6-9). Rule acquisition with a genetic
algorithm. Proceedings of the Congress on Evolutionary Computation: Vol. 1,
Washington, DC (pp. 125-129). Piscataway, NJ: IEEE Press.
Chappel, A. R. (1992, October 5-8). Knowledge-based reasoning in the Paladin tactical
decision generation system. Proceedings of the 11th AIAA Digital Avionics
Systems Conference, Seattle, WA (pp. 155-160).
Cortés, P., Larrañeta, J., Onieva, L., García, J. M., & Caraballo, M. S. (2001). Genetic
algorithm for planning cable telecommunication networks. Applied Soft Comput-
ing, 1(1), 21-33.
Militello, L. G., & Hutton, R. J. B. (1998). Applied cognitive task analysis (ACTA): A
practitioner’s toolkit for understanding cognitive task demands. Ergonomics, 41(11), 1618-1642.
Møller, M. F. (1993). A scaled conjugate gradient algorithm for fast supervised learning.
Neural Networks, 6, 525-533.
Perneel, C., & Acheroy, M. (1994, December 12-13). Fuzzy reasoning and genetic
algorithm for decision making problems in uncertain environment. Proceedings of
the Industrial Fuzzy Control and Intelligent Systems Conference/NASA joint
Technology Workshop on Neural Networks and Fuzzy Logic — NAFIPS/IFIS/
NASA 94, San Antonio, TX (pp. 115-120).
Ponnuswamy, S., Amin, M. B., Jha, R., & Castañon, D. A. (1997). A C3I parallel benchmark
based on genetic algorithms implementation and performance analysis. Journal of
Parallel and Distributed Computing, 47(1), 23-38.
Sanderson, P. M. (1998, November 29-December 4). Cognitive work analysis and the
analysis, design, and evaluation of human computer interactive systems. Proceed-
ings of the Annual Conference of the Computer-Human Interaction Special
Interest Group (CHISIG) of the Ergonomics Society of Australia (OzCHI98),
Adelaide, South Australia (pp. 40-45).
Setiono, R. (2000). Extracting M-of-N rules from trained neural networks. IEEE Transac-
tions on Neural Networks, 11(2), 512-519.
Setiono, R., Leow, W. K., & Zurada, J. M. (2002). Extraction of rules from artificial neural
networks for nonlinear regression. IEEE Transactions on Neural Networks, 13(3),
564-577.
Steinberg, D., & Colla, P. L. (1995). CART: Tree-structured non-parametric data analy-
sis. San Diego, CA: Salford Systems.
Sugeno, M. (1985). Industrial applications of fuzzy control. Amsterdam: Elsevier
Science Publishing Company.
Takagi, T., & Sugeno, M. (1983, December 15-18). Derivation of fuzzy control rules from
human operator’s control actions. Proceedings of the IFAC Symposium on Fuzzy
Information, Knowledge Representation and Decision Analysis, Marseilles, France
(pp. 55-60).
Tan, K. C., & Li, Y. (2001). Performance-based control system design automation via
evolutionary computing. Engineering Applications of Artificial Intelligence,
14(4), 473-486.
Tan, K. C., Yu, Q., Heng, C. M., & Lee, T. H. (2003). Evolutionary computing for knowledge
discovery in medical diagnosis. Artificial Intelligence in Medicine, 27(2), 129-154.
Tran, C., Abraham, A., & Jain, L. (2004). Modeling decision support systems using hybrid
neurocomputing. Neurocomputing, 61C, 85-97.
Tran, C., Jain, L., & Abraham, A. (2002a, December 2-6). Adaptation of Mamdani fuzzy
inference system using neuro — genetic approach for tactical air combat decision
support system. Proceedings of the 15th Australian Joint Conference on Artificial
Intelligence (AI’02), Canberra, Australia (pp. 402-410). Berlin: Springer Verlag.
Tran, C., Jain, L., & Abraham, A. (2002b). Adaptive database learning in decision support
system using evolutionary fuzzy systems: A generic framework, hybrid information
systems. In A. Abraham & M. Köppen (Eds.), Advances in soft computing (pp.
237-252). Berlin: Physica Verlag.
Tran, C., Jain, L., & Abraham, A. (2002c). TACDSS: Adaptation of a Takagi-Sugeno
hybrid neuro-fuzzy system. Proceedings of the 7th Online World Conference on
Application of Text Mining Methodologies to Health Insurance Schedules 29
Chapter II
ABSTRACT
This chapter describes the application of a number of text mining techniques to
discover patterns in the health insurance schedule with an aim to uncover any
inconsistency or ambiguity in the schedule. In particular, we will apply first a simple
“bag of words” technique to study the text data, and to evaluate the hypothesis: Is there
any inconsistency in the text description of the medical procedures used? It is found
that the hypothesis is not valid, and hence the investigation is continued on how best
to cluster the text. This work would have significance to health insurers to assist them
to differentiate descriptions of the medical procedures. Secondly, it would also assist
the health insurer to describe medical procedures in an unambiguous manner.
30 Tsoi, To & Hagenbuchner
[Figure 1. Structure of the MBS (1999): seven categories, each divided into groups.
Category 1, Professional Attendance (Groups A1, A2, ..., A15); Category 2, Diagnostic
Procedures (Groups D1, D2); Category 3, Therapeutic Procedures (Groups T1, T2, ..., T9);
Category 4, Oral Services (Groups O1, O2, ..., O9); Category 5, Diagnostic Imaging
(Groups I1, I2, ..., I5); Category 6, Pathology Services (Groups P1, P2, ..., P11);
Category 7, Cleft Lip and Cleft Palate Services (Groups C1, C2, C3).]
The MBS is divided into seven categories, each of which describes a collection of
treatments related to a particular type, such as diagnostic treatments, therapeutic
treatments, oral treatments, and so on. Each category is further divided into groups. For
example, in category 1, there are 15 groups, A1, A2, …, A15. Within each group, there are
a number of medical procedures which are denoted by unique item numbers. In other
words, the MBS is arranged in a hierarchical tree manner, designed so that it is easy for
medical service providers to find appropriate items which represent the medical proce-
dures provided to the patient.2 This underlying MBS structure is outlined in Figure 1.
This chapter evaluates the following:
• Hypothesis — Given the arrangement of the items in the way they are organised in
the MBS (Figure 1), are there any ambiguities within this classification? Here,
ambiguity is measured in terms of a confusion table comparing the classification
given by the application of text mining techniques and the classification given in
the MBS. Ideally, if the items are arranged without any ambiguities at all (as
measured by text mining techniques), the confusion table should be diagonal with
zero off diagonal terms.
• Optimal grouping — Assuming that the classification given in MBS is ambiguous
(as revealed in our subsequent investigation of the hypothesis), what is the
“optimal” arrangement of the item descriptions using text mining techniques (here
“optimal” is measured with respect to text mining techniques)? In other words, we
wish to find an “optimal” grouping of the item descriptions together such that there
will be a minimum of misclassifications.
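The confusion table used to measure ambiguity can be sketched as follows. The function names and the category labels in the example are hypothetical; in the chapter the predicted labels would come from a text mining classifier applied to the item descriptions.

```python
from collections import Counter


def confusion_table(mbs_categories, predicted_categories):
    """Count items by (category given in the MBS, category assigned by text mining)."""
    return Counter(zip(mbs_categories, predicted_categories))


def off_diagonal_total(table):
    """Number of items whose predicted category differs from the MBS category."""
    return sum(count for (given, predicted), count in table.items()
               if given != predicted)


# Hypothetical labels for six items: one Category-4 item is predicted as Category 3.
mbs = [1, 1, 3, 3, 4, 4]
predicted = [1, 1, 3, 3, 3, 4]
table = confusion_table(mbs, predicted)
ambiguous_items = off_diagonal_total(table)
```

A classification without ambiguity, as measured by the chosen technique, would give an off-diagonal total of zero.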
Obviously, the validity of the described method lies in the validity of text mining
techniques in unambiguously classifying a set of documents. Unfortunately, this may
not be the case, as new text mining techniques are constantly being developed.
However, the value of the work presented in this paper lies in the ability to use
existing text mining techniques and to discover, as far as possible, any ambiguities within
the MBS. This is bound to be a conservative measure, as we can only discover
ambiguities as far as possible given the existing tools. There will be other ambiguities
which remain uncovered by current text mining techniques. But at least, using our
approach will clear up some of the existing ambiguities. In other words, the text mining
techniques do not claim to be exhaustive. Instead, they will indicate ambiguities as far
as possible, given their limitations.
The structure of this chapter is as follows: In the next section, we describe what text
mining is, and how our proposed techniques fall into the general fabric of text mining
research. In the following section, we will describe the “bag of words” approach to text
mining. This is the simplest method in that it does not take any cognizance of semantics
among the words; each word is treated in isolation. In addition, this will give an answer
to the hypothesis as stated above. If ambiguities are discovered by using such a simple
text mining technique, then there must exist ambiguities in the set of documents
describing the medical procedures. This will give us a repository of results to compare
with those when we use other text mining techniques. In the next section, we describe
briefly the latent semantic kernel (LSK) technique to pre-process the feature vectors
representing the text. In this technique, the intention is that it is possible to manipulate
the original feature vectors representing the documents and to shorten them so that they
can better represent the “hidden” message in the documents. We show results which do
not assume the categories as given in the MBS.
TEXT MINING
In text mining, there are two main issues: retrieval and classification (Berry, 2004).
• Retrieval techniques: used to retrieve a particular document.
    ◦ Keyword-based search: this is the simplest method in that it retrieves
      a document or documents which match a particular set of key words
      provided by the user. Such sets of key words are often called “queries”.
    ◦ Vector space-based retrieval method: this is often called a “bag of words”
      approach. It represents the documents in terms of a set of feature vectors. Then,
      the vectors can be manipulated so as to show patterns, for example, by
      grouping similar vectors into clusters (Nigam, McCallum, Thrun, & Mitchell,
      2000; Salton, 1983).
    ◦ Latent semantic analysis: this studies the latent or hidden structure of
      the set of documents with respect to “semantics”. Here “semantics” is taken
      to mean “correlation” within the set of documents; it does not mean that the
      technique will discover the “semantic” relationships between words in the
      sense of linguistics (Salton, 1983).
    ◦ Probabilistic latent semantic analysis: this considers the correlation
      within the set of documents within a probabilistic setting (Hofmann, 1999a).
• Classification techniques: used to assign data to classes.
    ◦ Manual classification: a set of documents is classified manually into a set
      of classes or sub-classes.
    ◦ Rule-based classification: a set of rules as determined by experts is used
      to classify a set of documents.
    ◦ Naïve Bayes classification: this uses Bayes’ theorem to classify a set of
      documents, with some additional assumptions (Duda, 2001).
This chapter explores the “bag of words” technique to classify the set of documents
into clusters and compare them with those given in the MBS. The chapter also employs
the latent semantic kernel technique, a technique from kernel machine methods (based
on support vector machine techniques) to manipulate the features of the set of docu-
ments before subjecting them to clustering techniques.
BAG OF WORDS
Given a set of m documents D = [d1, d2,..., dm], it is natural to represent them in
a vector space. From this set of documents it is simple to extract the vocabulary
used. For the vocabulary to be meaningful, we apply stemming, which treats words
with the same stem as one word. For example, the words “representation” and “represent”
are counted as one word rather than two, as they share the same stem. Secondly, for
the vocabulary to be useful in distinguishing documents, we eliminate common words
such as “the”, “a”, and “is”. After these two steps we obtain a vocabulary
w1, w2,..., wn which represents the words used in the set of documents D. Each
document di can then be represented as an n-vector vi whose j-th element is the
frequency of occurrence of the word wj in di, and 0 if wj does not occur in di. Thus,
from a representation point of view, the set of documents D can be equivalently
represented by a set of vectors V = [v1, v2,..., vm], where vi is an n-vector. Note
that the vectors in V may be sparse, as not every word in the vocabulary occurs in
every document (Nigam et al., 2000). The set of vectors V can then be grouped into
clusters using standard techniques (Duda, 2001).
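The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the stop-word list, the crude suffix-stripping `stem` function, and the three sample documents are all assumptions made for the example, and a real system would use a proper stemmer.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list (not taken from the chapter).
STOP_WORDS = {"the", "a", "is", "of", "on", "to", "and", "in", "are"}
# Assumed suffixes for a crude stemmer; a real stemmer is far more careful.
SUFFIXES = ("ation", "ing", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(document):
    """Lower-case, tokenise, drop common words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(documents):
    """Return the sorted vocabulary w1..wn over all documents."""
    return sorted({t for d in documents for t in terms(d)})


def vectorize(document, vocabulary):
    """Return the n-vector of term frequencies (0 where a word is absent)."""
    counts = Counter(terms(document))
    return [counts[w] for w in vocabulary]


# Hypothetical documents, invented for this example.
docs = [
    "The representation of a document is a vector",
    "Vectors represent documents",
    "The repair of a wound",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]

for d, v in zip(docs, vectors):
    print(v, "<-", d)
```

Because “representation” and “represent” reduce to the same stem, the first two sample documents map to the same vector, while the third shares no vocabulary with them and is zero in every position they occupy.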
34 Tsoi, To & Hagenbuchner
Table 3. Category-4 Items 52000, 52003, 52006, and 52009 misclassified by the naïve
Bayes method as Category-3 items
Table 4. Some items in Category 3 which are similar to items 52000, 52003, 52006, and
52009
It is observed that the descriptions of items 5200X are very similar to those
of items 300YY. For example, item 52000 describes a medical procedure to
repair small superficial cuts on the face or neck. On the other hand, item 30026 describes
the same medical procedure except that it indicates that the wounds are not on the face
or neck, with the distinguishing feature that this is not a service to which another item
in Group T4 applies. Note that the description of item 30026 uses the word “not”
to distinguish it from that of item 52000, and appends the extra phrase “not being
a service to which another item in Group T4 applies”. From a vector space point of view,
the vector representing item 52000 is very close³ to that of item 30026, closer than to
those of other items in category-4, because only a few extra words distinguish the two.
Hence, item 52000 is classified as an item in category-3 instead of an item in
category-4. Similar observations can be made for the other items shown in Table 3 when
compared with those shown in Table 4.
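The effect can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: cosine similarity is used as the closeness measure (the chapter's own measure is given in its note), and the three texts are invented paraphrases, not the actual MBS item descriptions.

```python
import math
import re
from collections import Counter


def bag(text):
    """Term-frequency bag of words; no stemming or stop-word removal here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical descriptions, loosely modelled on the discussion above.
face_wound = bag("repair of small superficial wound on face or neck")
other_wound = bag(
    "repair of small superficial wound not on face or neck "
    "not being a service to which another item applies"
)
# Hypothetical unrelated item standing in for the item's true category.
unrelated = bag("removal of tooth fragment from jaw")

sim_negated = cosine(face_wound, other_wound)
sim_unrelated = cosine(face_wound, unrelated)
print("similarity to negated description:", round(sim_negated, 3))
print("similarity to unrelated description:", round(sim_unrelated, 3))
```

The word “not” reverses the meaning of the description but adds only one dimension to the vector, so the two wound descriptions remain far closer to each other than either is to the unrelated text. This is the weakness of a pure bag-of-words representation that the misclassification exposes.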
20 Professional attendance (not being a service to which any other item applies) at
a nursing home including aged persons' accommodation attached to a nursing
home or aged persons' accommodation situated within a complex that includes
a nursing home (other than a professional attendance at a self contained unit) or
professional attendance at consulting rooms situated within such a complex
where the patient is accommodated in a nursing home or aged persons'
accommodation (not being accommodation in a self contained unit) by a
general practitioner for an obvious problem characterised by the
straightforward nature of the task that requires a short patient history and, if
required, limited examination and management -- an attendance on 1 or more
patients at 1 nursing home on 1 occasion -- each patient
On the other hand, Tables 5 and 6 show items which are correctly classified in
category-1 and category-5 respectively. The items shown in Table 5 are distinct
from those shown in Table 6 in their descriptions. A careful examination of the
correctly-classified category-1 items, together with a comparison of their descriptions
with those of the correctly-classified category-5 items, confirms this observation.
In other words, the vectors representing correctly-classified category-1 items are
closer to other vectors in the same category than to vectors representing other
categories.
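One way to check such a statement numerically is to compare each item's average similarity to items in its own category with its average similarity to items in other categories. The sketch below assumes cosine similarity as the closeness measure and uses invented toy descriptions under the hypothetical labels `group_a` and `group_b`; it does not reproduce the chapter's data or results.

```python
import math
import re
from collections import Counter


def bag(text):
    """Term-frequency bag of words for one description."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def mean_similarity(vec, others):
    """Average cosine similarity of vec to a non-empty list of vectors."""
    return sum(cosine(vec, o) for o in others) / len(others)


# Hypothetical categories and descriptions, invented for illustration.
categories = {
    "group_a": [
        "professional attendance at nursing home by general practitioner",
        "professional attendance at consulting rooms by general practitioner",
        "attendance at hospital by general practitioner",
    ],
    "group_b": [
        "ultrasound scan of abdomen",
        "ultrasound scan of pelvis",
        "x ray of chest",
    ],
}

results = []
for label, texts in categories.items():
    # All descriptions that belong to any other category.
    outside = [bag(t) for l, ts in categories.items() if l != label for t in ts]
    for i, text in enumerate(texts):
        # Descriptions in the same category, excluding the item itself.
        inside = [bag(t) for j, t in enumerate(texts) if j != i]
        vec = bag(text)
        results.append(
            (label, i, mean_similarity(vec, inside), mean_similarity(vec, outside))
        )

for label, i, within, between in results:
    print(label, i, "within:", round(within, 3), "between:", round(between, 3))
```

When the within-category average exceeds the between-category average for every item, as it does for this toy data, a distance-based classifier has a clear basis for assigning each item to its own category.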