International Journal of Digital Content Technology and its Applications (JDCTA), vol. 7, no. 9, May 2013. DOI: 10.4156/jdcta.vol7.issue9.43
A Study on Classification Learning Algorithms to Predict Crime Status
Somayeh Shojaee, Aida Mustapha, Fatimah Sidi, Marzanah A. Jabar
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia
{somayeh_shojaee, aida, fatimahcd, marzanah}@[Link]
Abstract
In the recent past, there has been a large increase in the crime rate; hence, the tasks of predicting, preventing, and solving crimes have become significant. In this paper, we conducted an experiment to identify better supervised classification learning algorithms for predicting crime status, using two different feature selection methods tested on a real dataset. Comparisons in terms of Area Under the Curve (AUC) show that Naïve Bayesian (0.898), k-Nearest Neighbor (k-NN) (0.895), and Neural Networks (MultilayerPerceptron) (0.892) are better classifiers than Decision Tree (J48) (0.727) and Support Vector Machine (SVM) (0.678). Furthermore, the performance of the mining results is improved by using the Chi-square feature selection technique.
Keywords: Classification, AUC, Naïve Bayesian, Neural Networks, k-Nearest Neighbor, Decision Tree, Support Vector Machine, Chi-square
1. Introduction
Due to the increasing amount of data, a need has emerged to develop technologies for analyzing data in different fields, such as business, medicine, and education [1]. Data mining methods have therefore become the main tools for analyzing data and discovering knowledge from them [2]. Here, data mining refers to an integration of multiple methods such as classification, clustering, evaluation, and data visualization [3].
Crime data is one type of data that requires data mining techniques to discover and predict underlying patterns [4]. The high number of crimes in different countries has forced governments to use modern technologies and methods to control and prevent crime. Data mining techniques are able to identify patterns rapidly for detecting future criminal actions [5]. This matters because manual interpretation of crime data is limited by the size of the data as well as the complexity of relationships among different crime attributes. Data mining methods accelerate crime analytics, provide better analysis, and produce real-time solutions that save considerable resources and time [6].
Today, the high number of crimes is causing many problems in different countries. Scientists spend time studying crime and criminal behavior in order to understand the characteristics of crime and to discover crime patterns. It is known that criminals follow repetitive behavior patterns, so analyzing their behavior can help capture relations among events from past crimes [7]. In this research, crime studies are integrated with data mining techniques to identify patterns and achieve more accurate results.
To analyze crime, several characteristics are relevant, such as the different races in a society, income groups, age groups, family structure (single, divorced, married), level of education, the locality where people live, the number of police officers allocated to a locality, and the numbers of employed and unemployed people, among others [8].
Dealing with crime data is very challenging: the size of crime data grows very fast, which causes storage and analysis problems. In particular, the inconsistency and inadequacy of such data raise the issue of how to choose accurate techniques for analyzing them. These issues motivate scientists to conduct research to enhance crime data analysis. The objective of this evaluation is twofold. First, it determines whether feature selection is useful for achieving better classification accuracy and performance. Second, it compares different classifiers in terms of AUC in order to choose more accurate algorithms for classifying crime status in the United States of America and to obtain deeper insight into crime.
In this study, a real crime dataset from the UCI Machine Learning Repository is used for data mining. Five different classification algorithms are used to classify the dataset based on a binominal class, the crime status. The examined classifiers are Naïve Bayesian, Decision Tree (J48), Support Vector Machine (SVM), Neural Networks (MultilayerPerceptron), and k-Nearest Neighbor. By experiment, the results of the
five algorithms on two feature sets are studied and compared, and the more efficient algorithms for predicting the goal class (crime status) are identified. Many tools are available for data mining; for this study, RapidMiner is chosen as it is freely available from the website [Link].
This paper is organized as follows. Section 2 presents related work, Section 3 discusses the dataset and pre-processing, Section 4 describes the selected classifiers, Section 5 presents the results and discussion, and Section 6 concludes.
2. Related Work
Based on existing research, data mining techniques have been identified as aiding the process of crime detection; classification and machine learning algorithms are examples of data mining techniques used to analyze crime data. Yu et al. [9] employ an ensemble of data mining classification techniques for crime forecasting. The classification methods included in their study are One Nearest Neighbor (1NN), Decision Tree (J48), Support Vector Machine (SVM), a 2-layer Neural Network, and Naïve Bayesian (Bayes).
Clustering has also been used for pattern recognition and prediction, for example to detect patterns of serial criminal behavior and geographic crime activity [6]. Nath [10], in his study, applies clustering with a geographical approach that shows regional crimes on a map and clusters crimes according to their types, using a combination of a K-means clustering algorithm and a weighting algorithm.
Clustering and graph representations are also used to find similar crimes and group classes of criminals, as well as to visualize the results. Clustering features such as shape, size, and distribution help reveal more details about relevant crimes [11, 12], including in a clustering analysis of a US state database [13]. Association mining is one of the accepted methods for discovering novel underlying patterns in large volumes of crime data [6]. Other techniques, such as semantic analysis and text mining, are used to perform entity extraction from FBI bulletins [14, 15, 16].
In [6], a fuzzy association rule mining application for community crime pattern discovery is proposed. The application produced interesting and meaningful rules at regional and national levels, and a relative support metric is defined to extract novel rules. In 2007, Ng et al. [17] proposed Incremental Temporal Association Rules (ITAR), an incremental mining algorithm to discover crime patterns. ITAR avoids rescanning the database, which is the main bottleneck in Apriori-based association rule mining, and employs temporal association rule mining when the amount of data is growing.
Bruin et al. [14] present a new distance measure to evaluate individual criminals, using profiles to cluster them and enable recognition of classes of criminals. Their research also presents a particular distance measure that combines profile differences with crime frequency and change of criminal behavior over time. Brown [18] created the Regional Crime Analysis Program (ReCAP), which provides crime analysts with both data fusion and data mining to aid Virginia law enforcement in capturing professional criminals in their region. Coplink is one of several research systems that have helped enhance criminal intelligence analysis by using a co-occurrence concept space [16].
Redmond and Baveja [19] propose a Crime Similarity System (CSS) to assist police departments in developing a strategic decision-making viewpoint. The system builds a list of communities using the cities' law enforcement, crime, and socioeconomic profiles to obtain knowledge from past experience. The same author in [20] conducted a study of several algorithms for numeric prediction using case-based reasoning (CBR). Their emphasis is on case quality: an attempt to filter out cases that may be noisy or idiosyncratic because they are not good for future prediction. The results showed a significant increase in the percentage of correct predictions at the expense of an increased risk of poor predictions in less common cases.
Chen et al. [21] developed a crime data mining framework based on the Coplink project experience. Abraham and de Vel [5] propose a method to understand criminals' behavior by using computer log files, seeking relationships among the data and producing profiles that are used to understand the behavior.
3. Dataset Description
The "Communities and Crime Unnormalized" dataset, available from the UCI Machine Learning Repository, was employed for this study. The dataset focuses on American communities and combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 Law Enforcement Management and Administrative Statistics survey, and crime data from the 1995 US FBI Uniform Crime Report [22]. The dataset was collected by Buczak [6].
The dataset consists of 2215 instances and 147 attributes for communities: 125 predictive, 4 non-predictive, and 18 potential goal attributes. The data in each instance belong to different states in the USA; each state is represented by a number. The attributes cover a variety of crime-related facets, ranging from the percentage of officers assigned to drug units to population density, percent considered urban, and median household income. Also included are measures of crimes considered violent: murder, rape, robbery, and assault. The complete details of all 147 attributes can be obtained from the UCI Machine Learning Repository website.
3.1 Pre-processing
A few techniques are employed in practice for data preprocessing: data cleaning, discretization and data transformation, and feature selection. Preprocessing is intended to reduce noise and to handle incomplete and inconsistent data; its result is then passed to the data mining algorithm. In this study, pre-processing is carried out in two steps, data cleaning and data transformation, based on Buczak's work [6].
For the first step, the goal of data cleaning is to decrease noise and handle missing values. There are a number of methods for treating records that contain missing values, such as omitting the incorrect field(s), omitting the entire record that contains the incorrect field(s), automatically entering or correcting the data with default values, deriving a model to enter or correct the data, replacing all missing values with a global constant, and using imputation to predict missing values.
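Two of the strategies listed above, dropping heavily-missing attributes and dropping records with a missing goal value, can be sketched with pandas. The column names echo the UCI attributes, but the data values are made up for illustration.

```python
# Sketch of two missing-value strategies from the list above, using
# pandas on a tiny hypothetical crime table (values are synthetic).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pctPolicWhite": [np.nan] * 5,                   # entirely missing
    "population":    [1000, 2000, 1500, 3000, 2500],
    "violentPerPop": [np.nan, 120.0, 80.0, 300.0, 50.0],
})

# Drop attributes where more than 80% of the values are missing.
df = df.loc[:, df.isna().mean() <= 0.8]

# Drop instances that are missing the goal attribute violentPerPop.
df = df.dropna(subset=["violentPerPop"])
print(list(df.columns), len(df))
```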
In this study, some communities were removed based on occurrences of significant missing or known incorrect crime statistics. Certain attributes contain a significant number of missing values (more than 80%) because the data was unavailable or not recorded for particular communities. Attributes with such a high proportion of missing values, such as pctPolicWhite and pctPolicBlack, were removed. All crime attributes (potential goal attributes) with missing values were removed, because only the total number of violent crimes per 100K population (violentPerPop) is considered as the class. Since violentPerPop was chosen as the goal attribute, all 221 instances with missing violentPerPop values were removed, leaving 1994 instances.
For the second step, we performed data normalization, discretization, and data type transformation. For all attributes except state, min-max normalization to [0, 1] is used to avoid the large-value issue; it has the advantage of preserving all relationships in the data exactly and preventing any bias injection [7]. Next, we discretized the selected class, the total number of violent crimes per 100K population (violentPerPop), into a binomial class, crime status (CrimeStatus), because the goal class must be nominal in order to perform prediction. The resulting class has two values, "critical" and "non-critical": if the value of violentPerPop is less than forty percent, CrimeStatus is set to "non-critical", otherwise to "critical". Finally, as a data type transformation, the states were converted from nominal to numeric, whereby every number represents the respective American state.
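The normalization and discretization steps above can be sketched as follows. The data values are synthetic, and the 40th-percentile cut-off on violentPerPop is an assumption standing in for the "forty percent" threshold described in the text.

```python
# Sketch of min-max normalization to [0, 1] and discretization of
# violentPerPop into a binary CrimeStatus class (synthetic values;
# the 40th-percentile threshold is an assumed reading of the cut-off).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "state": [1, 2, 3, 4],
    "medIncome": [20000.0, 35000.0, 50000.0, 80000.0],
    "violentPerPop": [50.0, 400.0, 900.0, 1500.0],
})

# Min-max normalization for all predictive attributes except state.
for col in ["medIncome"]:
    lo, hi = df[col].min(), df[col].max()
    df[col] = (df[col] - lo) / (hi - lo)

# Discretize the goal attribute into a binomial class.
threshold = df["violentPerPop"].quantile(0.40)
df["CrimeStatus"] = np.where(df["violentPerPop"] < threshold,
                             "non-critical", "critical")
print(df[["medIncome", "CrimeStatus"]])
```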
3.2 Feature Selection
Relevance analysis, or feature selection, is used to remove irrelevant or redundant attributes. Feature selection has several objectives, such as enhancing model performance by avoiding overfitting in the case of supervised classification. In this study, as mentioned above, crime status (CrimeStatus) was chosen among the eighteen potential goal attributes as the dependent variable. For attribute selection, two mechanisms were used to select the final sets of attributes:
1. The golden standard, or manual selection of attributes, based on human understanding and intellect. This selection follows the previous study [6]; their methodology yielded 44 attributes, the majority of which are percentages or values computed per 100K population. Similar attributes were removed; for example, of the number of people under the poverty level (NumUnderPov) and the percentage of people under the poverty level (PctPopUnderPov), only the percentage was kept. Attributes with a large number of missing values were also removed.
2. The Chi-square test, used to detect correlated attributes, since high dependency among attributes causes redundancy and subsequently inaccurate results that degrade the classification. Chi-square is one of the most effective feature selection methods for classification. With Chi-square feature selection, 94 attributes were selected.
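As a sketch of the Chi-square mechanism, scikit-learn's chi2 scorer can rank attributes against the class; SelectKBest here stands in for RapidMiner's operator, and the data is synthetic.

```python
# Sketch of Chi-square feature selection with scikit-learn (not the
# paper's RapidMiner workflow); attributes and class are synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.random((200, 6))                    # six normalized [0, 1] attributes
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # class depends on the first two

# Keep the k attributes with the highest Chi-square scores.
selector = SelectKBest(chi2, k=2).fit(X, y)
print(selector.get_support())               # boolean mask over the attributes
```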
4. Selected Classifiers
After the preprocessing and feature selection phases, the number of attributes was meaningfully reduced, and the attribute sets are now more suitable for building the data mining models. Many data mining methods can be used to predict crime status quantitatively; in this study, a classification task is applied for prediction. Classification, a well-known supervised data mining technique, is used to extract meaningful information from large datasets and can be used effectively to predict unknown classes [23]. There are various classification algorithms, such as Support Vector Machines (SVM), k-Nearest Neighbor (k-NN), Decision Tree, Weighted Voting, and Artificial Neural Networks. All these techniques can be applied to a dataset to discover sets of models for forecasting unknown class labels [2].
The data classification process in this study has two steps: a training step to build a model, and the use of that model for classification. The accuracy of the classifier on a given test set is the percentage of test set tuples that are classified correctly. If the accuracy is acceptable, the classifier can be used for future data tuples for which the class label is unknown [3].
Based on the classification algorithms used in [9], five different classification algorithms, namely Naïve Bayesian (Bayes), Decision Tree (J48), Support Vector Machine (SVM), Neural Network (MultilayerPerceptron), and k-Nearest Neighbor (k-NN), are chosen to perform classification on the dataset. In the following section, the results of the algorithms are compared, and the more efficient algorithms for crime status prediction are determined through AUC comparison.
Naïve Bayesian classifiers, following a supervised learning approach, predict the probability that a given tuple belongs to a specific class. This classifier is very simple to construct and can easily be applied to huge datasets [3].
The Decision Tree, another supervised learning approach, builds a tree in which each internal node represents a test on an attribute value, the leaves represent classes or class distributions, and the branches represent the combinations of features that lead to those classes. Decision tree algorithms treat the whole dataset as a single large set and then recursively split it: the tree is constructed top-down until a stopping criterion is met, and measures such as gain in entropy are used to choose the split at each node [24].
Support Vector Machines (SVM) are a group of supervised learning methods that can be employed for classification or regression [25, 26, 27]. In a two-class learning task, the goal of the SVM is to discover the best classification function to differentiate between members of the two classes in the training data. For that purpose, the SVM constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space to separate the dataset, finding the best function by maximizing the margin between the two classes.
Neural Networks, another classification method, are nonlinear models able to capture complex real-world relationships. Neural networks can estimate posterior probabilities, which provide the basis for setting up classification rules and conducting statistical analysis [28].
k-Nearest Neighbor (k-NN) classifiers learn by comparing a given test tuple with the training tuples. For an unknown tuple, a k-NN classifier finds the k objects in the training set that are closest to it and labels the tuple according to the predominant class in that neighborhood [3]. One parameter of this classifier is the value of k, which is set to 10 in this study because it gave the highest accuracy.
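The five examined classifiers can be instantiated with scikit-learn equivalents as a minimal sketch; the paper used RapidMiner, so except for k=10, all hyperparameters here are library defaults, an assumption.

```python
# Sketch: scikit-learn analogues of the five classifiers studied in
# the paper (RapidMiner was the actual tool; defaults are assumed).
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "Naive Bayesian": GaussianNB(),
    "Decision Tree (J48)": DecisionTreeClassifier(),
    "SVM": SVC(probability=True),          # probabilities needed for ROC/AUC
    "Neural Network (MLP)": MLPClassifier(max_iter=500),
    "k-NN (k=10)": KNeighborsClassifier(n_neighbors=10),
}
for name, clf in classifiers.items():
    print(name, type(clf).__name__)
```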
5. Results and Discussions
In this experiment, 10-fold cross-validation was used. The dataset is randomly divided into 10 separate blocks of objects; the data mining algorithm is trained on 9 blocks while the remaining block is used to test its performance, the process is repeated 10 times, and finally the average of the results is calculated [2].
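The cross-validation protocol above can be sketched with scikit-learn's cross_val_score; the data here is a synthetic binary problem, not the UCI dataset.

```python
# Sketch of 10-fold cross-validation on a synthetic binary dataset,
# mirroring the protocol described above (real data is the UCI set).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)

# cv=10 trains on 9 folds and tests on the held-out fold, 10 times.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=10), X, y, cv=10)
print(len(scores), round(scores.mean(), 3))
```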
The five selected classification algorithms were evaluated on the two different sets of features by comparing precision, recall, accuracy, and AUC. Precision is the proportion of instances assigned to a class that actually belong to it. Recall is the percentage of instances of a class that are correctly classified. Accuracy is the percentage of all instances that are classified correctly. Table 1 shows the precision and recall for both feature sets.
Table 1. Precision and Recall

Method                                   Precision (%)          Recall (%)
                                         Set 1      Set 2       Set 1      Set 2
                                         (44 att.)  (94 att.)   (44 att.)  (94 att.)
Naïve Bayesian                           86.7       87.5        84.6       84.4
Decision Tree (J48)                      84.6       85.7        85.0       86.3
Support Vector Machine (SVM)             85.0       86.1        85.6       86.5
Neural Network (MultilayerPerceptron)    85.1       86.8        85.3       87.1
k-Nearest Neighbor (k=10)                86.9       87.3        87.5       88.0
As depicted in Table 1, precision and recall were enhanced after using the Chi-square feature selection technique. Although the difference is not large, it indicates that feature selection helps achieve better classification results. Beyond the importance of feature selection, the next experiment determines which classifiers perform better; to this end, the five selected classifiers were tested. As shown in the table, the precision of Naïve Bayesian (87.5%) and k-NN (87.3%) is better than that of the others, and the recall of k-NN (88.0%) is the best among the classifiers.
Receiver Operating Characteristic (ROC) curves, graphical plots of classifier performance, are commonly used to present results for binary decision problems in data mining. An ROC curve shows how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples [29]. If the ROC curve of one classifier always lies above the ROC curve of a second classifier, we can conclude that the first classifier is better than the second.
Based on Figure 1 and Figure 2, this scenario does not occur in this study. For instance, the SVM ROC curve lies above the Naïve Bayesian curve in one part, whereas the Naïve Bayesian curve lies above the SVM curve in another part. This implies that the two classifiers are preferred under different loss conditions.
Figure 1. ROC comparison for set 1 (with 44 attributes)
Figure 2. ROC comparison for set 2 (with 94 attributes)
However, the precision and recall values of the five classifiers do not differ significantly from each other, and no ROC curve dominates over the entire range. In this situation, AUC provides a good summary for comparing the classifiers. Ling et al. [30] also compared accuracy and the Area Under the Curve (AUC) across different classifiers on various datasets. They
conclude that the best tool for classifier comparison is AUC, which helps users better understand the performance of the classifiers.
Table 2. Accuracy and AUC

Method                                   Accuracy (%)            AUC
                                         Set 1      Set 2        Set 1      Set 2
                                         (44 att.)  (94 att.)    (44 att.)  (94 att.)
Naïve Bayesian                           84.646     84.395       0.894      0.898
Decision Tree (J48)                      84.997     86.251       0.731      0.727
Support Vector Machine (SVM)             85.649     86.452       0.660      0.678
Neural Network (MultilayerPerceptron)    85.298     87.054       0.882      0.892
k-Nearest Neighbor (k=10)                87.506     88.008       0.897      0.895
Table 2 shows that the k-NN algorithm slightly outperformed the other four algorithms, with the highest accuracy (88.008%), particularly when combined with the Chi-square attribute selection procedure. It is followed by the Neural Network, while the Decision Tree and Support Vector Machine are virtually equivalent in terms of accuracy. In this experiment, the increased accuracy values confirm the impact of Chi-square-based feature selection.
Based on the AUC results, Naïve Bayesian (0.898), k-NN (0.895), and Neural Network (0.892) are clearly separated from Decision Tree (0.727) and SVM (0.678). A classifier with a greater AUC is said to be better than one with a smaller AUC, so AUC tells us that Naïve Bayesian, Neural Network, and k-NN are indeed better than Decision Tree and SVM on this dataset. The better performance of Naïve Bayesian, k-NN, and Neural Network may be attributed to the nature of the dataset: independence between attributes increases the power of Naïve Bayesian, Neural Networks are well suited to numeric inputs, and k-NN works well when the attributes are not noisy or imbalanced. In addition, it is possible that some datasets are so easy to learn that classification without any feature selection already attains nearly the maximal possible AUC, in which case improving AUC through feature selection is difficult.
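An AUC comparison of the kind reported in Table 2 can be sketched with scikit-learn's roc_auc_score on a held-out split; the data and resulting scores here are synthetic, not the paper's results.

```python
# Sketch of an AUC comparison between two of the classifiers on a
# synthetic held-out split (values do not reproduce the paper's table).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.random((300, 4))
y = (X[:, 0] + 0.1 * rng.standard_normal(300) > 0.5).astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

aucs = {}
for clf in (GaussianNB(), KNeighborsClassifier(n_neighbors=10)):
    # AUC is computed from the predicted probability of the positive class.
    proba = clf.fit(Xtr, ytr).predict_proba(Xte)[:, 1]
    aucs[type(clf).__name__] = roc_auc_score(yte, proba)
print({k: round(v, 3) for k, v in aucs.items()})
```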
6. Conclusion
The aim of this study was to classify the given experimental dataset into two categories, critical and non-critical. To this end, we used five classification algorithms combined with two different feature selection techniques, manual selection and Chi-square, to determine the more accurate classifiers. From the experimental results, the k-Nearest Neighbor algorithm achieves the best accuracy, specifically when using the Chi-square feature selection technique. We have shown via exploratory comparisons in terms of AUC that Naïve Bayesian, Neural Networks, and k-Nearest Neighbor predict better than Support Vector Machine and Decision Tree on this dataset. Through the implementation of Chi-square feature selection in RapidMiner, it is demonstrated that feature selection is an important phase for enhancing mining quality.
References
[1] Usama M. Fayyad, S. George Djorgovski, Nicholas Weir, “Automating the Analysis and
Cataloging of Sky Surveys”, Advances in Knowledge Discovery and Data Mining, Cambridge,
MA: MIT Press, pp. 471–494, 1996.
[2] Abdullah H. Wahbeh, Qasem A. Al-Radaideh, Mohammed N. Al-Kabi, Emad M. Al-Shawakfa,
“A Comparison Study between Data Mining Tools over some Classification Methods”,
International Journal of Advanced Computer Science and Applications, The SAI Organization,
Special Issue on Artificial Intelligence. pp. 18-26, 2011.
[3] Jiawei Han, Micheline Kamber, Jilan Pei, “Data Mining: Concepts and Techniques”, vol. 5.
Morgan Kaufmann Publishers, USA, 2012.
[4] Richard Wortley, Lorraine Mazerolle, “Environmental Criminology and Crime Analysis”, Willan
Publishing, UK, 2008.
[5] Tamas Abraham, Olivier de Vel, “Investigative Profiling with Computer Forensic Log Data and
Association Rules”, In Proceedings of the IEEE International Conference on Data Mining
(ICDM'02), pp. 11 – 18, 2002.
[6] Anna L. Buczak, Christopher M. Gifford, “Fuzzy Association Rule Mining for Community Crime
Pattern Discovery”, In ACM SIGKDD Workshop on Intelligence and Security Informatics
(ISIKDD '10), 2012.
[7] Kevin L. Priddy, Paul E. Keller, Artificial Neural Networks: An Introduction, SPIE Press, USA,
2005.
[8] Malathi. A, S. Santhosh Baboo, “Enhanced Algorithms to Identify Change in Crime Patterns”
International Journal of Combinatorial Optimization Problems and Informatics, Aztec Dragon
Academic Publishing, vol. 2, no.3, pp. 32-38, 2011.
[9] Chung-Hsien Yu, Max W. Ward, Melissa Morabito, Wei Ding, “Crime Forecasting Using Data
Mining Techniques”, In Proceedings of the 2011 IEEE 11th International Conference on Data
Mining Workshops (ICDMW '11), pp. 779-786, 2011.
[10] Shyam Varan Nath, “Crime Pattern Detection Using Data Mining”, In Proceedings of the
International Conference on Web Intelligence and Intelligent Agent Technology, pp. 41-44, 2006.
[11] Peter Phillips, Ickjai Lee, “Mining Top-k and Bottom-k Correlative Crime Patterns through Graph
Representations”, In Proceedings of the IEEE International Conference on Intelligence and
Security Informatics, pp. 25-30, 2009.
[12] Peter Phillips, Ickjai Lee, “Crime Analysis through Spatial Areal Aggregated Density Patterns”,
GeoInformatica, Springer, vol. 15, no. 1, pp. 49-74, 2011.
[13] Sikha Bagui, “An Approach to Mining Crime Patterns”, International Journal of Data
Warehousing and Mining, IGI Global, vol. 2, no. 1, pp. 50-80, 2006.
[14] Jeroen S. de Bruin, Tim K. Cocx, Walter A. Kosters, Jeroen F. J. Laros, Joost N. Kok, “Data
Mining Approaches to Criminal Career Analysis”, In Proceedings of the International Conference
on Data Mining, pp. 171-177, 2006.
[15] Michael Chau, Jennifer J. Xu, Hsinchun Chen, “Extracting Meaningful Entities from Police
Narrative Reports”, In Proceedings of the 2002 Annual National Conference on Digital
Government Research, pp. 1-5, 2002.
[16] Roslin V. Hauck, Homa Atabakhsh, Pichai Ongvasith, Harsh Gupta, Hsinchun Chen, “Using
COPLINK to Analyze Criminal-Justice Data”, Computer, IEEE Computer Society Press, vol. 35,
No. 3, pp. 30-37, 2002.
[17] Vincent Ng, Stephen Chan, Derek Lau, Cheung Man Ying, “Incremental Mining for Temporal
Association Rules for Crime Pattern Discoveries”, In Proceedings of the Australasian Database
Conference, pp. 123-132, 2007.
[18] Donald E. Brown, “The Regional Crime Analysis Program (RECAP): A Framework for Mining
Data to Catch Criminals”, In Proceedings of the International Conference on Systems, Man, and
Cybernetics, pp. 2848-2853, 1998.
[19] Michael Redmond, Alok Baveja, “A Data-driven Software Tool for Enabling Cooperative
Information Sharing Among Police Departments”, European Journal of Operational Research,
Science Direct, vol. 141, no. 3, pp. 660–678, 2002.
[20] Michael A. Redmond, Timothy Highley, “Empirical Analysis of Case-Editing Approaches for
Numeric Prediction”, Innovations in Computing Sciences and Software Engineering, Springer, pp.
79-84, 2010.
[21] Hsinchun Chen, Wingyan Chung, Jennifer Jie Xu, Gang Wang, Yi Qin, Michael Chau, “Crime
Data Mining: A General Framework and Some Examples”, Computer, IEEE Computer Society
Press, vol. 37, no. 4, pp. 50- 56, 2004.
[22] UCI Machine Learning Repository, Available: [Link] [Accessed: 2011-03-02].
[23] E.W.T. Ngai, Li Xiu, D.C.K. Chau, “Application of Data Mining Techniques in Customer
Relationship Management: A Literature Review and Classification”, Expert Systems with
Applications, Elsevier, vol. 36, no. 2, pp. 2592-2602, 2009.
[24] Ali Hamou, Andrew Simmons, Michael Bauer, Benoit Lewden, Yi Zhang, Lars-Olof Wahlund,
Eric Westman, Megan Pritchard, Iwona Kloszewska, Patrizia Mecozzi, Hilkka Soininen, Magda
Tsolaki, Bruno Vellas, Sebastian Muehlboeck, Alan Evans, Per Julin, Niclas Sjögren, Christian
Spenger, Simon Lovestone, Femida Gwadry-Sridhar, “Cluster Analysis of MR Imaging in
Alzheimer’s Disease using Decision Tree Refinement”, International Journal of Artificial
Intelligence, vol. 6, no. S11, pp. 90-99, 2011.
[25] Ovidiu Ivanciuc, “Applications of Support Vector Machines in Chemistry”, Reviews in
Computational Chemistry, vol. 23, pp. 291-400, 2007.
[26] Yingxu Wang, Fugui Chen, Xilong Qu, “Research and Application of Large-Scale Data Set
Processing Based on SVM”, Journal of Convergence Information Technology(JCIT), AICIT, vol.
7, no. 16, pp. 195-200, 2012.
[27] Chen Zhenzhou, “Local Support Vector Machines with Clustering for Multimodal Data”,
Advances in information Sciences and Service Sciences(AISS), AICIT, vol. 4, no. 17, pp. 266-
275, 2012.
[28] Guoqiang Peter Zhang, “Neural Networks for Classification: A Survey”, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451-462, 2000.
[29] Jesse Davis, Mark Goadrich, “The Relationship between Precision-Recall and ROC Curves”,
In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 233-
240, 2006.
[30] Charles X. Ling, Jin Huang, Harry Zhang, “AUC: A Better Measure than Accuracy in Comparing
Learning Algorithms”. In Proceedings of the 16th Canadian Society for Computational Studies of
Intelligence Conference on Advances in Artificial Intelligence (AI'03), pp. 329-341, 2003.