Crime Prediction and Analysis
5 authors, including:
Lokesh Chouhan
National Forensic Sciences University Gandhinagar
All content following this page was uploaded by Akanksha Gahalot on 03 November 2020.
Abstract—Crime is one of the most dominant and alarming aspects of our society. A huge number of crimes are committed every day, and these frequent crimes have made the lives of common citizens restless. Preventing crime from occurring is therefore a vital task. In recent times, artificial intelligence has shown its importance in almost every field, and crime prediction is one of them. However, a proper database of the crimes that have occurred must be maintained, as this information can be used for future reference. The ability to predict crimes that may occur in the future can help law enforcement agencies prevent them before they occur. The capability to predict a crime on the basis of time, location and so on can provide useful information to law enforcement from a strategic perspective. However, predicting crime accurately is a challenging task because crimes are increasing at an alarming rate. Thus, crime prediction and analysis methods are very important for detecting future crimes and reducing them. Recently, many researchers have conducted experiments to predict crimes using various machine learning methods and particular inputs. For crime prediction, KNN, decision trees and some other algorithms are used. The main purpose is to highlight the worth and effectiveness of machine learning in predicting violent crimes occurring in a particular region, in such a way that it can be used by police to reduce crime rates in society.
Index Terms—Machine Learning, Crime Prediction, K-Nearest Neighbor, Decision Trees.
I. INTRODUCTION
Crime is increasing considerably day by day. It is among the main issues that are growing continuously in intensity and complexity [1]. Crime patterns are changing constantly, which makes it difficult to explain behaviours in crime patterns [2]. Crime is classified into various types such as kidnapping, theft, murder and rape. Law enforcement agencies collect crime data with the help of information technologies (IT). The occurrence of any crime is naturally unpredictable, and previous research has found that various factors like poverty and employment affect the crime rate [3]. Crime is neither uniform nor random [4]. With the rapid increase in the number of crimes, analysis of crime is also required.
Crime analysis basically consists of procedures and methods that aim at reducing crime risk. It is a practical approach to identifying and analysing crime patterns. The major challenge for law enforcement agencies is to analyse the escalating volume of crime data efficiently and accurately, which is difficult for crime analysts without computational support. A powerful system for predicting crimes is required in place of traditional crime analysis, because traditional methods cannot be applied when crime data is high dimensional and complex queries are to be processed. A crime prediction and analysis tool is therefore needed for identifying crime patterns effectively. This paper introduces some methodologies that help predict which type of crime has a higher probability of occurring at a given place and time. Classification helps in extracting features and predicting future trends in crime data based on similarities. The methodologies used in this study are Extra Tree Classifier, K-Neighbour Classifier, Support Vector Machine (SVM), Decision Tree Classifier and Artificial Neural Network (ANN). The paper is organised as follows. The introduction of the study is given in Section I. Section II consists of the related works. Section III discusses
Authorized licensed use limited to: Carleton University. Downloaded on September 21,2020 at 03:45:09 UTC from IEEE Xplore. Restrictions apply.
the methodology for crime prediction methods. Section IV discusses its implementation. Section V consists of discussion. Section VI consists of the conclusion. Section VII discusses the future scope.
II. RELATED WORK
Much research has addressed the problem of reducing crime, and many crime-prediction algorithms have been proposed. The prediction accuracy depends on the type of data used and the type of attributes selected for prediction. In [5], mobile network activity was used to obtain human behavioural data, which was used to predict crime hotspots in London with an accuracy of about 70% when predicting whether a specific area of the city will be a crime hotspot or not. In [6], data collected from various websites and newsletters was used for prediction and classification of crime using the Naive Bayes algorithm and decision trees, and it was found that the former performed better. In [7], a thorough study of various crime prediction methods such as Support Vector Machine (SVM) and Artificial Neural Networks (ANN) was carried out, and it concluded that no single method can solve the problems of different crime datasets. In [8], various supervised learning techniques and unsupervised learning techniques [9] were applied to crime records to address the connections between crime and crime patterns for the purpose of knowledge discovery, which helps in increasing the predictive accuracy of crime. In [10], different approaches to prediction, such as data mining, deep learning, crime cast and sentiment analysis techniques, were discussed, and it was found that every method has pros and cons; each method gives better results for a particular instance. In [11], clustering approaches were used for detection of crime and classification methods were used for prediction of crime. K-Means clustering was implemented and its performance was evaluated on the basis of accuracy. On comparing the performance of different clustering algorithms, DBSCAN gave the result with the highest accuracy, and the KNN classification algorithm was used for crime prediction. Hence, this system helps law enforcement agencies achieve accurate and improved crime analysis. In [12], a comparison of the classification algorithms Naïve Bayes and decision tree was performed with the data mining software WEKA. The dataset for this study was obtained from the US Census 1990. In [13], the pattern of road accidents in Ethiopia was studied after taking into consideration various factors like the driver, car and road conditions. The classification algorithms used were K-Nearest Neighbour, Decision Tree and Naive Bayes on a dataset containing around 18,000 data points. The prediction accuracy for all three methods was between 79% and 81%.
III. METHODOLOGY
Predictive modelling was used for making predictions, since it provides a way to build a model that has the capability to make predictions. This method consists of different machine learning algorithms that learn properties from the training data, which are then used for producing predictions. It is split into two major classes: regression and pattern classification. Regression models are based on analysis of the relationships between trends and variables in order to make predictions about continuous variables, whereas the job of classification is to assign a particular class label to a data value as the output of the prediction. Pattern classification is divided into supervised and unsupervised learning. In supervised learning, the class labels to be used for building classification models are already known. In unsupervised learning, these class labels are not known. This paper deals with supervised learning.
Fig. 1. Architecture
A. Data collection and Pre-processing
Data collection is a process in which information is gathered from many sources, which is later used to develop the machine learning models. The data should be stored in a way that makes sense for the problem.
Data pre-processing basically involves methods to remove infinite or null values from the data, which might affect the performance of the model. In this step the data set is converted into an understandable format that can be fed into machine learning models.
B. Model selection
1) Support Vector Machine: Support Vector Machine performs well for regression, time series prediction and classification problems. Its performance can be measured against that of a Recurrent Neural Network. SVM has been applied in predicting crime hotspots [16] and in predicting diseases such as diabetes and pre-diabetes, since it can model nonlinear relations in a coherent way. It performs well for time series forecasting. For a predetermined level of crime and a given data set, a subset of the crime data set is selected using a K-clustering algorithm, and a label is determined for each data point in the selected subset. Points where the crime rate is above the given rate are called hotspots, and points where it is below the given rate are called coldspots.
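The labelling step described above can be sketched in a few lines. This is only an illustration, not the authors' implementation: it takes a hotspot to be a location whose crime rate exceeds the given rate (the usual sense of the term), treats a rate exactly equal to the threshold as a coldspot (an assumption), and uses invented location names and rates. The function name `label_hotspots` is hypothetical.

```python
def label_hotspots(crime_rates, threshold):
    """Label each location 'hotspot' if its rate exceeds threshold, else 'coldspot'."""
    return {
        location: "hotspot" if rate > threshold else "coldspot"
        for location, rate in crime_rates.items()
    }

# Hypothetical crime rates for four grid cells (values invented for illustration).
crime_rates = {"cell_a": 12.5, "cell_b": 3.1, "cell_c": 7.9, "cell_d": 0.4}

labels = label_hotspots(crime_rates, threshold=5.0)
for location, label in labels.items():
    print(location, label)
```

The resulting labels are what a classifier such as an SVM would then be trained to predict from the features of each location.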
2) K-nearest Neighbour: It is used for finding the correlation between the test set and the training set. If a given test point is close to points in the training set, it is assigned their class label. The major limitation emerges when the training set has few data points. To address this, diverse techniques such as the K-NN algorithm have been used. This technique belongs to the supervised learning domain. It finds applications in data mining, intrusion detection and pattern recognition. Here the result is a class membership: an object is classified by a majority vote of its neighbours, with the object being assigned to the class most common among its k nearest neighbours.
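The voting rule can be made concrete with a minimal sketch. It assumes Euclidean distance and a toy data set whose features (hour of day, district index) and labels are invented for illustration; the function name `knn_predict` is hypothetical, and a library implementation would normally be used in practice.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # Order training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Count the labels of the k closest points and take the majority.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data: (hour of day, district index) -> crime category (invented values).
train_points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
train_labels = ["theft", "theft", "theft", "assault", "assault", "assault"]

prediction = knn_predict(train_points, train_labels, query=(2, 2), k=3)
print(prediction)
```

With few training points the k nearest neighbours may lie far from the query, which is one way to read the limitation noted above.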
Fig. 3. Example of a decision tree
removed. Two further features, Descript and Resolution, are considered redundant because they do not exist in the testing data, so they were removed as well.
calculated. For KNN, the parameters used were njobs and weights, with their values set to minus one and balanced, respectively. Similarly, for the decision tree the parameter used was class weight, whose value was set to balanced, and parameter tuning was done for the other models in the same way to get the best possible output in each case. For MLP, after trying different parameter settings, it was found to work best with 100 hidden layers and the adam solver. As is evident from the graph, SVC takes a lot of training time and hence cannot be considered. MLP, on the other hand, does not have good values for accuracy and F-beta score, so MLP is not a good option.
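To illustrate what a "balanced" class weight typically means, the sketch below computes weights inversely proportional to class frequency, using the formula n_samples / (n_classes * class_count). This is the heuristic scikit-learn documents for `class_weight="balanced"`; that the authors used scikit-learn is an assumption, since the library is not named here. The function name `balanced_class_weights` and the label counts are invented for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Invented, imbalanced label set: 6 thefts, 3 assaults, 1 robbery.
labels = ["theft"] * 6 + ["assault"] * 3 + ["robbery"]

weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(f"{cls}: {weight:.3f}")
```

Rare classes receive larger weights, so each class contributes equally to the total weight, which matters for crime data where some categories are much more frequent than others.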
\[ F\text{-beta score} = \frac{2 \cdot (rc \cdot q)}{rc + q} \]
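The formula can be checked on a small example. The sketch below assumes that rc denotes recall and q denotes precision; the symbols are not defined in this part of the text, so that reading is an assumption. Under it, the expression is the harmonic mean of the two, i.e. the F-beta score with beta = 1. The function name `f_score` and the label lists are invented for illustration.

```python
def f_score(true_labels, predicted_labels, positive):
    """Compute 2 * (rc * q) / (rc + q) for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    q = tp / (tp + fp) if tp + fp else 0.0   # precision (assumed meaning of q)
    rc = tp / (tp + fn) if tp + fn else 0.0  # recall (assumed meaning of rc)
    if rc + q == 0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    return (2 * (rc * q)) / (rc + q)

# Invented labels: 3 true positives, 1 false positive, 1 false negative.
true_labels = ["theft", "theft", "theft", "other", "other", "theft"]
predicted_labels = ["theft", "other", "theft", "other", "theft", "theft"]

score = f_score(true_labels, predicted_labels, positive="theft")
print(f"{score:.2f}")
```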
function much faster. The same is seen in the case of the San Francisco data set, in which neural networks performed better on the testing data set than the other algorithms. As proposed by [16], the problem lies in finding techniques that can efficiently analyse the growing crime data set. The accuracy of crime prediction basically depends on the crime data set used. If the training data set is very large, the model will be trained with very good accuracy, while if the training data set is small, only a small degree of training is attained. The prediction accuracy also depends on the dimension of the training data set. The model will give more accurate results if it is highly trained and will not give good results if it is not trained properly.
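One way to examine the dependence on training set size is a learning curve: train on progressively larger subsets and measure accuracy on a fixed test set. The sketch below is a toy illustration under stated assumptions, not the authors' experiment. It uses synthetic two-class Gaussian data, a 1-nearest-neighbour classifier, and invented sizes; all function names are hypothetical. Accuracy would usually be expected to rise with training size, but a single run need not be monotonic.

```python
import math
import random

def make_points(n, rng):
    """Draw n labelled 2-D points from two overlapping Gaussian blobs (synthetic)."""
    points = []
    for _ in range(n):
        label = rng.choice([0, 1])
        centre = (0.0, 0.0) if label == 0 else (2.0, 2.0)
        point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
        points.append((point, label))
    return points

def nearest_label(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]

def accuracy(train, test):
    """Fraction of test points whose nearest training neighbour has the same label."""
    hits = sum(1 for x, y in test if nearest_label(train, x) == y)
    return hits / len(test)

rng = random.Random(0)            # fixed seed so the run is repeatable
test_set = make_points(200, rng)  # held-out data, never used for training
pool = make_points(400, rng)      # training pool to draw subsets from

sizes = [5, 20, 80, 320]
curve = {n: accuracy(pool[:n], test_set) for n in sizes}
for n, acc in curve.items():
    print(f"train size {n:>3}: accuracy {acc:.2f}")
```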
Fig. 6. Training time for different models
Fig. 7. F-score for different models
VI. CONCLUSION
Crime prediction is one of the current trends in society. It intends to reduce crime occurrences by predicting which type of crime may occur in the future. Here, crime analysis and prediction are performed with the help of various approaches, among them KNN, Artificial Neural Network, Decision Trees, Extra Trees and Support Vector Machine. From the results obtained, the training time of SVM is very high, so it should be avoided for this dataset. The accuracy of MLP is very low, so MLP does not work well for this dataset. For this data set, the Decision Tree, KNN and Extra Trees classifiers work best, with optimal training time and good accuracy. However, which model works best depends entirely on the dataset being used.
[6] Shiju Sathyadevan, Devan M. S., and Surya S. Gangadharan, “Crime Analysis and Prediction Using Data Mining,” in First International Conference on Networks & Soft Computing (ICNSC), 2014.
[7] Sunil Yadav, Meet Timbadia, Ajit Yadav, Rohit Vishwakarma, and Nikhilesh Yadav, “Crime pattern detection, analysis and prediction,” in International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2017.
[8] Amanpreet Singh, Narina Thakur, and Aakanksha Sharma, “A review of supervised machine learning algorithms,” in 3rd International Conference on Computing for Sustainable Global Development, 2016.
[9] Bin Li, Yajuan Guo, Yi Wu, Jinming Chen, Yubo Yuan, and Xiaoyi Zhang, “An unsupervised learning algorithm for the classification of the protection device in the fault diagnosis system,” in China International Conference on Electricity Distribution (CICED), 2014.
[10] Varshitha D N, Vidyashree K P, Aishwarya P, Janya T S, K R Dhananjay Gupta, and Sahana R, “Paper on Different Approaches for Crime Prediction system,” International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, 2017.
[11] R. Iqbal, M. A. A. Murad, A. Mustapha, P. H. Shariat Panahy, and N. Khanahmadliravi, “An experimental study of classification algorithms for crime prediction,” Indian J. of Sci. and Technol., vol. 6, no. 3, pp. 4219-4225, Mar. 2013.
[12] A. Malathi and S. Santhosh Baboo, “An Enhanced Algorithm to Predict a Future Crime using Data Mining,” International Journal of Computer Applications (0975-8887), Vol. 21, No. 1, May 2011.
[13] T. Beshah and S. Hill, “Mining road traffic accident data to improve safety: role of road-related factors on accident severity in Ethiopia,” Proc. of Artificial Intell. for Develop. (AID 2010), pp. 14-19, 2010.
[14] K.B.S. Al-Janabi, “A Proposed Framework for Analyzing Crime Data
Set using Decision Tree and Simple K-Means Mining Algorithm,” in
Journal of Kufa for Mathematics and Computer, Vol. 1, No. 3, 2011,
pp. 8-24.
[15] A. Malathi, S.S. Baboo, “An Enhanced Algorithm to Predict a Future
Crime using Data Mining,” in International Journal of Computer Appli-
cations, Vol. 21, 2011, pp. 1-6.
[16] C.H. Yu, “Crime Forecasting Using Data Mining Techniques,” in 11th
International Conference on Data Mining Workshop, 2011, pp. 779-786.