IDS in Telecommunication Network Using PCA
ABSTRACT
Data security has become a very serious part of any organizational information system. Internet threats have become more intelligent and can deceive basic security solutions such as firewalls and antivirus scanners. To enhance the overall security of the network, an additional security layer such as an intrusion detection system (IDS) has to be added. An anomaly detection IDS is a type of IDS that can differentiate between normal and abnormal behaviour in the monitored data. This paper proposes two types of IDS on the NSL-KDD data set: one can be used as a network intrusion detection system (NIDS), with an overall success rate of 0.9161 and a high detection rate of 0.9288, and the other can be used as a host intrusion detection system (HIDS), with an overall success rate of 0.8493 and a very high detection rate of 0.9628.
KEY WORDS
IDS, NIDS, HIDS, data mining, anomaly detection.
1. INTRODUCTION
In the age of the information technology revolution, telecommunication networks have developed from circuit-switched networks to packet-switched networks and have since evolved enormously towards all-IP based networks. Because of these developments, applications and services such as data and voice are transferred on top of the IP protocol [1]. Data transmission speeds in both uplink and downlink have increased considerably from the second generation (2G) to the third generation (3G) of radio access networks, and the development of subscriber devices has blurred the boundary between computers and mobile phones. With a smart phone, the subscriber can do almost everything and can dispense with a basic personal computer, which means that the full content of the Internet is now in the hands of every smart phone owner. As the technologies in communication networks have progressed, they have also raised new unwanted possibilities: risks and threats that previously applied only to fixed networks are now feasible in radio access networks. Security systems have to become more intelligent because threats are becoming more advanced, and basic security measures such as firewalls and antivirus scanners cannot keep pace with the ever-growing number of intelligent attacks from the Internet. A solution to enhance the overall security of the networks is to add an additional security layer by
using intrusion detection systems (IDS). An intrusion detection system is designed to complement other security measures that are based on attack prevention [2]. Alonso-Betanzos et al. [3] state that the aim of the IDS is to inform the system administrator of any suspicious activities and to recommend specific actions to prevent or stop the intrusion. There are two types of intrusion detection: signature-based and anomaly-based. The signature-based (misuse) detection method uses patterns of well-known attacks to identify intrusions [4]. Anomaly-based intrusion detection monitors network traffic and compares it against established normal usage patterns to determine whether the current state of the network is anomalous; anomalous traffic can be considered an intrusion attempt. In other words, misuse detection uses well-defined patterns, known as signatures, of the attacks, while anomaly-based detection builds a normal profile and detects anomalous traffic when the deviation from the normal model reaches a preset threshold [5]. Anomaly-based intrusion detection depends on feature selection. Good selection of features maintains detection accuracy while speeding up the calculations; therefore, any reduction in the number of features used for detection will improve the overall performance of the IDS. If there are no useless features, focusing on the most important ones is expected to improve the execution speed of the IDS without significantly affecting detection accuracy, whereas incorrect selection of features may reduce both the speed of operation and the detection accuracy [6]. The aim of this paper is to improve the intrusion detection system by using Principal Component Analysis (PCA) as a dimension reduction technique. The paper compares two different feature selections, of 6 and 10 features; one of these feature selections can be used in a Network Intrusion Detection System (NIDS) and the other in a Host Intrusion Detection System (HIDS).
2. RELATED WORK
Chakraborty [7] reported that the existence of irrelevant and redundant features generally degrades the performance of the machine learning part of the work, and showed that good selection of the feature set results in better classification performance. Sung et al. [8] demonstrated that eliminating these unimportant and irrelevant features does not reduce the performance of the IDS. Chebrolu et al. [9] reported that an important advantage of combining redundant and complementary classifiers is increased accuracy and better overall generalization; they also identified important input features for building IDSs that are computationally efficient and effective. Their work evaluates three feature selection algorithms: (1) Bayesian networks, (2) Classification and Regression Trees (CART), and (3) an ensemble of Bayesian networks and CART.
Sung and Mukkamala [8] explored SVMs and neural networks that can rank features with respect to their importance. They used SVMs and neural networks to detect specific kinds of attacks such as Probing, DoS, Remote-to-Local, and User-to-Root, and showed that the elimination of less important and irrelevant features does not reduce the performance of the IDS. Chebrolu et al. [9] suggested a CART-BN approach, in which CART performs better for Normal, Probe, and U2R traffic while the ensemble approach performs better for R2L and DoS. Meanwhile, Abraham et al. [10] showed that an ensemble of decision trees was suitable for Normal traffic, LGP for Probe, DoS, and R2L, and a fuzzy classifier was good for R2L attacks. Abraham et al. [11] demonstrated the ability of their suggested ensemble structure to model lightweight distributed IDSs. Gyanchandani et al. [12] improved the performance of the C4.5 classifier over the NSL-KDD dataset using different classifier combination techniques such as bagging, boosting, and stacking. Zargar et al. [2] showed that dimension reduction and identification of effective network features for category-based selection can reduce the processing time of an intrusion detection system while maintaining the detection accuracy within an acceptable range.
Since each feature contributes equally to the calculation of the Euclidean distance, this distance is undesirable when the features are measured on different scales or have very different variability: the effect of features with high variability or large measurement scales would dominate that of features with less variability or smaller scales. As an alternative, a measure of variability can be incorporated into the distance metric directly. One such metric is the well-known Mahalanobis distance

$d^2(x, y) = (x - y)' S^{-1} (x - y)$   (2)

where $S$ is the sample covariance matrix.
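As a concrete illustration of equation (2), the following is a minimal NumPy sketch; the paper's own implementation was written in Matlab, so the function and variable names here are illustrative only.

```python
import numpy as np

def mahalanobis_sq(x, y, S):
    """Squared Mahalanobis distance d^2(x, y) = (x - y)' S^{-1} (x - y)."""
    diff = x - y
    # Solving S @ v = diff is numerically safer than forming the explicit inverse.
    return float(diff @ np.linalg.solve(S, diff))

# Example: distance of one observation from the mean of a normal sample.
rng = np.random.default_rng(0)
normal_sample = rng.normal(size=(500, 4))    # 500 observations, 4 features
S = np.cov(normal_sample, rowvar=False)      # sample covariance matrix
x = normal_sample[0]
mean = normal_sample.mean(axis=0)
print(mahalanobis_sq(x, mean, S))
```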
The principal components $y_1, y_2, \ldots, y_p$ are particular linear combinations of the $p$ random variables $\{X_1, X_2, \ldots, X_p\}$ with three important properties. First, the principal components are uncorrelated. Second, the first principal component has the highest variance, the second principal component has the second highest variance, and so on. Third, the total variation in all the principal components combined equals the total variation in the original variables $\{X_1, X_2, \ldots, X_p\}$. New variables with these properties are easily obtained from an eigenanalysis of the covariance matrix or the correlation matrix of $\{X_1, X_2, \ldots, X_p\}$ [14].

Let the original data $X$ be an $n \times p$ data matrix of $n$ observations on each of $p$ variables $(X_1, X_2, \ldots, X_p)$ and let $R$ be the $p \times p$ sample correlation matrix of $X_1, X_2, \ldots, X_p$. If $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ are the $p$ eigenvalue-eigenvector pairs of $R$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, then the $i$th sample principal component of an observation vector $x = (x_1, x_2, \ldots, x_p)'$ is

$y_i = e_i' z = e_{i1} z_1 + e_{i2} z_2 + e_{i3} z_3 + \cdots + e_{ip} z_p, \quad i = 1, 2, \ldots, p$   (3)

where $e_i = (e_{i1}, e_{i2}, \ldots, e_{ip})'$ is the $i$th eigenvector and $z = (z_1, z_2, \ldots, z_p)'$ is the vector of standardized observations defined as

$z_k = \dfrac{x_k - \bar{x}_k}{s_k}, \quad k = 1, 2, \ldots, p$   (4)

where $\bar{x}_k$ is the sample mean and $s_k$ the sample standard deviation of the variable $X_k$. The $i$th principal component has sample variance $\lambda_i$, and the sample covariance (or correlation) of any pair of distinct principal components is zero, so PCA produces a set of uncorrelated variables and the total variance of a sample is the sum of the variances accounted for by the principal components. The correlation between the $i$th principal component $y_i$ and the $k$th variable $x_k$ is

$r_{y_i, x_k} = \dfrac{e_{ik}\sqrt{\lambda_i}}{s_k}$   (5)

The principal components of the sample correlation matrix have the same properties as principal components from a sample covariance matrix. Since all principal components are uncorrelated, the total variance in all of the principal components is

$\lambda_1 + \lambda_2 + \cdots + \lambda_p = p$   (6)
The principal components produced by the covariance matrix differ from those produced by the correlation matrix: when some features take much larger values than others, they receive larger weights in the eigenanalysis. Since the NSL-KDD data set contains features with widely varying scales and ranges, the correlation matrix is used in this work.
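A simplified NumPy sketch of the offline eigenanalysis described by equations (3)-(6) is given below; it is not the paper's Matlab code, and all function and variable names are illustrative assumptions.

```python
import numpy as np

def pca_from_correlation(X):
    """Eigenanalysis of the sample correlation matrix of X (n observations x p features).

    Returns the eigenvalues (sorted descending), the eigenvectors (as columns),
    and the standardized data Z, so that the principal component scores are Z @ E.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                   # standardized observations, eq. (4)
    R = np.corrcoef(X, rowvar=False)       # p x p sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]      # sort by eigenvalue, descending
    return eigvals[order], eigvecs[:, order], Z

# The total variance of the standardized data equals p, eq. (6):
X = np.random.default_rng(1).normal(size=(200, 6))
lam, E, Z = pca_from_correlation(X)
print(lam.sum())     # approximately 6 (= p)
scores = Z @ E       # y_i = e_i' z for every observation, eq. (3)
```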
For anomaly detection, PCA is used in two ways. First, the total variation of the data sample is spread along all of the principal components. Second, the data sample can be represented by the axes of the eigenvectors of the principal components. These axes are taken to describe normal behaviour when the data sample is a training set of normal network connections; if a point lies far outside these axes, the corresponding connection exhibits abnormal behaviour. Outliers measured using the Mahalanobis distance are presumed to be anomalous network connections: any network connection with a distance greater than a threshold value $t$ is considered an outlier, and in this work any outlier represents an attack. Consider the sample principal components $y_1, y_2, \ldots, y_p$ of an observation $x$, where

$y_i = e_i' z, \quad i = 1, 2, \ldots, p$ and $z_k = \dfrac{x_k - \bar{x}_k}{s_k}, \quad k = 1, 2, \ldots, p$
The principal component score of an observation is the sum of the squares of its principal components, each divided by the corresponding eigenvalue:

$\sum_{i=1}^{p} \dfrac{y_i^2}{\lambda_i} = \dfrac{y_1^2}{\lambda_1} + \dfrac{y_2^2}{\lambda_2} + \cdots + \dfrac{y_p^2}{\lambda_p}$   (7)
This sum equals the Mahalanobis distance of the observation $x$ from the mean of the normal sample data set [15]. Anomaly detection methods, whether based on outlier detection, statistical models, or association rule mining, need an offline training (learning) phase. PCA has two clearly separated phases, offline training and online detection, and this separation is an advantage for hardware implementation. Another advantage of PCA is feature reduction: as shown in our experiments, PCA effectively reduces the number of processed features from 41 to 10 or 6.

The outline of the steps involved in PCA is shown in Figure 1. In the offline phase, the training data are taken as input and the mean vector of the sample is calculated. Ideally, these data are a snapshot of connection activity in a real network environment, and they should contain only normal connections. Second, the correlation matrix is calculated from the training data; the correlation matrix normalizes all of the data through the standard deviations. Next, eigenanalysis is performed on the correlation matrix to create independent orthonormal eigenvalue-eigenvector pairs; these pairs form the set of principal components used in the online analysis. Finally, the principal components are sorted by eigenvalue in descending order, the eigenvalue being a relative measure of the variance of its corresponding eigenvector.

Because a dimensionality reduction method such as PCA extracts the most significant principal components, only a subset of the principal components is needed to classify new data. In addition to using the most significant principal components ($q$) to find intrusions, we have found that it is also helpful to look for intrusions along a number of the least significant components ($r$). The major principal component score is calculated from the most significant principal components and the minor principal component score from the least significant ones. The major principal component score (MajC) is used to detect severe deviations with large values of the original features; such observations follow the correlation structure of the sample data. The minor principal component score (MinC) is used to detect attacks that do not follow the same correlation model. In this work, two thresholds are therefore needed to detect attacks. If the principal
components are sorted in descending order, then $q$ indexes a subset of the components with the highest eigenvalues and $r$ a subset of the components with the smallest eigenvalues. The MajC threshold is denoted $t_M$ and the MinC threshold is denoted $t_m$. An observation $x$ is classified as an attack if

$\sum_{i=1}^{q} \dfrac{y_i^2}{\lambda_i} > t_M \quad \text{or} \quad \sum_{i=p-r+1}^{p} \dfrac{y_i^2}{\lambda_i} > t_m$   (8)
The online portion takes the major and minor principal components and maps the online data into the eigenspace of those principal components.
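The detection rule of equations (7)-(8) can be sketched as follows, assuming the eigenvalue-eigenvector pairs come from the offline phase (sorted in descending eigenvalue order) and that the thresholds have already been chosen, for example from percentiles of the scores of the normal training connections; all names are illustrative, not taken from the paper's Matlab program.

```python
import numpy as np

def pc_score_terms(z, eigvals, eigvecs):
    """Terms y_i^2 / lambda_i of equation (7) for one standardized observation z."""
    y = eigvecs.T @ z          # principal component scores y_i = e_i' z
    return y ** 2 / eigvals    # summing all terms gives the Mahalanobis distance

def is_attack(z, eigvals, eigvecs, q, r, t_major, t_minor):
    """Equation (8): attack if MajC > t_M or MinC > t_m (eigvals sorted descending)."""
    terms = pc_score_terms(z, eigvals, eigvecs)
    majc = terms[:q].sum()                        # q most significant components
    minc = terms[-r:].sum() if r > 0 else 0.0     # r least significant components
    return bool(majc > t_major or minc > t_minor)
```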
To address the known shortcomings of the KDD'99 data set, the NSL-KDD dataset was developed [18]. Redundant records were removed from the KDD train and test sets, and one copy of each repeated record was kept.
Recall, the percentage of the total relevant documents in a database that are retrieved by the search, is computed as

$\text{recall} = \dfrac{TP}{TP + FN}$   (11)

Precision, the percentage of relevant documents in relation to the number of documents retrieved, is calculated as

$\text{precision} = \dfrac{TP}{TP + FP}$   (12)
The overall success rate is the number of correct classifications divided by the total number of classifications:

$\text{success rate} = \dfrac{TP + TN}{TP + TN + FP + FN}$   (13)

$\text{error rate} = 1 - \text{success rate}$   (14)
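In terms of the confusion-matrix counts, equations (11)-(14) can be computed as in the short sketch below; the function and argument names (tp, fp, tn, fn) are assumed, not taken from the paper.

```python
def performance_measures(tp, fp, tn, fn):
    """Recall, precision, overall success rate and error rate from confusion-matrix counts."""
    recall = tp / (tp + fn)                          # eq. (11): detection rate of a class
    precision = tp / (tp + fp)                       # eq. (12)
    success_rate = (tp + tn) / (tp + tn + fp + fn)   # eq. (13): correct / total classifications
    error_rate = 1.0 - success_rate                  # eq. (14)
    return recall, precision, success_rate, error_rate
```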
Table 1. Selected features

Feature name | Description | Type
Duration | Number of seconds of the connection | Continuous (1)
Protocol type | Type of the protocol, e.g. tcp, udp, icmp | Discrete (1)
Service | Network service on the destination, e.g. http, telnet, https | Discrete (1)
Src-bytes | Number of data bytes from source to destination | Continuous (1)
Dst-bytes | Number of data bytes from destination to source | Continuous (1)
Flag | Normal or error status of the connection | Discrete (1)
Count | Number of connections from the same source as the current connection in the past two seconds | Continuous (3)
Srv-count | Number of connections to the same service as the current connection in the past two seconds from the same source | Continuous (3)
Dst-host-count | Number of connections to the same host as the current connection in the past two seconds | Continuous (3)
Dst-host-srv-count | Number of connections to the same service as the current connection in the past two seconds to the same host | Continuous (3)
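The ten features above correspond to standard KDD/NSL-KDD attribute names, so they can be picked out of the 41 attributes by name, as in the hedged pandas sketch below. The NSL-KDD files carry no header row, so the full column list (passed as `all_columns`) and the exact name spellings are assumptions based on the usual KDD'99 naming, not something specified in the paper.

```python
import pandas as pd

# Assumed NSL-KDD attribute names for the ten selected features of Table 1.
SELECTED_FEATURES = [
    "duration", "protocol_type", "service", "src_bytes", "dst_bytes",
    "flag", "count", "srv_count", "dst_host_count", "dst_host_srv_count",
]

def load_selected_features(path, all_columns):
    """Load an NSL-KDD file (no header row) and keep only the ten selected features."""
    records = pd.read_csv(path, names=all_columns, header=None)
    return records[SELECTED_FEATURES]
```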
We used a Matlab program to design our IDS. Based on [22], we suggest using $q$ major components that explain about 50-70 percent of the total variation in the standardized features. When the original features are uncorrelated, each principal component from the correlation matrix has an eigenvalue equal to 1, so the minor components are taken as those whose variances (eigenvalues) are less than 0.20, which indicates some relationships among the features ($r$). In the first step we selected 6 features and used q = 3, r = 0; in the second step we added 4 features and used q = 3, r = 2 (a sketch of this selection rule is given below). In a multiclass prediction, the result on a test set is often displayed as a two-dimensional confusion matrix with a row and a column for each class. Each matrix element shows the number of test examples whose actual class is the row and whose predicted class is the column. Good results correspond to large numbers down the main diagonal and small, ideally zero, off-diagonal elements. The confusion matrix is shown in Table 2, and the performance measures are shown in Table 3 and Table 4.
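The component-selection rule described above (enough major components to explain roughly 50-70 percent of the total variance, and minor components with eigenvalues below 0.20) could be automated as in the following sketch; the thresholds are the ones quoted from [22], while the function and argument names are illustrative.

```python
import numpy as np

def choose_q_r(eigvals, variance_target=0.5, minor_cutoff=0.20):
    """Pick q major and r minor components from eigenvalues sorted in descending order."""
    explained = np.cumsum(eigvals) / eigvals.sum()
    q = int(np.searchsorted(explained, variance_target) + 1)  # smallest q reaching the target
    r = int(np.sum(eigvals < minor_cutoff))                   # eigenvalues below the cutoff
    return q, r
```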
Table 2 Confusion Matrix
Attacks | Exist | Detection from step (1) | Detection from step (2)
R2L | 209 | 28 | 32
U2R | 11 | 1 | 2
Table 3. Performance measures, step (1)

Measure | Normal class | Anomaly class
TP rate (recall) | 0.9050 | 0.9288
FP rate | 0.0712 | 0.0949
Precision | 0.9357 | 0.8952
Overall success rate: 0.9161; error rate: 0.0839

Table 4. Performance measures, step (2)

Measure | Normal class | Anomaly class
TP rate (recall) | 0.7503 | 0.9628
FP rate | 0.0372 | 0.2496
Precision | 0.9584 | 0.7719
Overall success rate: 0.8493; error rate: 0.1507
Both recall and precision have good values in the two steps; step (1) can be used as a NIDS, while step (2), which has the higher detection rate, can be used as a HIDS.
ACKNOWLEDGEMENTS
Thanks to everyone who helped me in carrying out this work to the fullest.

REFERENCES
[1] Kumar, A., Maurya, H. C. and Misra, R. (April 2013). A Research Paper on Hybrid Intrusion Detection System. International Journal of Engineering and Advanced Technology (IJEAT), Vol. 2, Issue 4, ISSN: 2249-895.
[2] Zargar, G. R. (October 2012). Category Based Intrusion Detection Using PCA. International Journal of Information Security, 3, pp. 259-271.
[3] Amparo, A. B., Noelia, S. M., Félix, M. C., Juan, A. S. and Beatriz, P. S. (25-27 April 2007). Classification of Computer Intrusions Using Functional Networks: A Comparative Study. Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Bruges, pp. 579-584.
[4] Ilgun, K., Kemmerer, R. A. and Porras, P. A. (1995). State Transition Analysis: A Rule-Based Intrusion Detection Approach. IEEE Transactions on Software Engineering, Vol. 21, No. 3, pp. 181-199.
[5] Guyon, I. and Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, Vol. 3, pp. 1157-1182.
[6] Chou, T. S., Yen, K. K. and Luo, J. (2008). Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms. International Journal of Computational Intelligence, Vol. 4, No. 3, pp. 196-208.
[7] Chakraborty, B. (2005). Feature Subset Selection by Neuro-Rough Hybridization. Lecture Notes in Computer Science (LNCS), Springer, Heidelberg.
[8] Sung, A. H. and Mukkamala, S. (2003). Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks. Proceedings of the International Symposium on Applications and the Internet (SAINT), pp. 209-216.
[9] Chebrolu, S., Abraham, A. and Thomas, J. (2005). Feature Deduction and Ensemble Design of Intrusion Detection Systems. Computers and Security, Elsevier Science, Vol. 24, No. 4, pp. 295-307.
[10] Abraham, A. and Jain, R. (2004). Soft Computing Models for Network Intrusion Detection Systems. Springer, Heidelberg.
[11] Abraham, A., Grosan, C. and Vide, C. M. (2007). Evolutionary Design of Intrusion Detection Programs. International Journal of Network Security, Vol. 4, No. 3, pp. 328-339.
[12] Gyanchandani, M., Yadav, R. N. and Rana, J. L. (December 2010). Intrusion Detection Using C4.5: Performance Enhancement by Classifier Combination. International Journal on Signal and Image Processing, Vol. 1, No. 3.
[13] Boutsidis, C., Mahoney, M. W. and Drineas, P. (2008). Unsupervised Feature Selection for Principal Components Analysis. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, pp. 61-69.
[14] Jolliffe, I. T. (2002). Principal Component Analysis. 2nd Ed. Springer-Verlag, New York.
[15] Jobson, J. D. (1992). Applied Multivariate Data Analysis, Volume II: Categorical and Multivariate Methods. Springer-Verlag, New York.
[16] Stolfo, J., Fan, W., Lee, W., Prodromidis, A. and Chan, P. K. (2000). Cost-Based Modeling and Evaluation for Data Mining with Application to Fraud and Intrusion Detection. DARPA Information Survivability Conference.
[17] The KDD Archive. KDD99 cup dataset, 1999: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[18] Tavallaee, M., Bagheri, E., Lu, W. and Ghorbani, A. (2009). A Detailed Analysis of the KDD CUP 99 Data Set. Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA).
[19] Srinivasulu, P., Nagaraju, D., Ramesh Kumar, P. and Nageswara Rao, K. (June 2009). Classifying the Network Intrusion Attacks Using Data Mining Classification Methods and Their Performance Comparison. International Journal of Computer Science and Network Security, Vol. 9, No. 6, pp. 11-18.
[20] Shyu, M., Chen, S., Sarinnapakorn, K. and Chang, L. (2003). A Novel Anomaly Detection Scheme Based on Principal Component Classifier. Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM'03), pp. 172-179.
[21] The NSL-KDD Data Set: http://nsl.cs.unb.ca/NSL-KDD/
[22] Shyu, M., Chen, S., Sarinnapakorn, K. and Chang, L. (2003). A Novel Anomaly Detection Scheme Based on Principal Component Classifier. IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with ICDM'03, pp. 171-179.
AUTHORS

Mohamed Faisal received the B.Sc. degree from Assiut University in 2010. After working as a network security engineer (from 2011) in the information network at Sohag University and as a research assistant in the Department of Electrical Engineering at Sohag University (from 2011), he has been a demonstrator at MUST University since 2012. He finished his preliminary Master's in June 2012 in the Department of Electrical Engineering at Assiut University.

Tarik Kamal received the B.Sc. and M.Sc. degrees from Assiut University in 1975 and 1980, respectively, and the Dr. Eng. degree from France in 1986. After working as a demonstrator (from 1975) and as an assistant lecturer (from 1981), he has been a lecturer in the Department of Electrical Engineering at Assiut University since 1987. His research interests include signal processing, image processing and communication networks. He is a supervisor of the information network at Assiut University.

Abdel-Fattah Mahmoud received the B.Sc. and M.Sc. degrees from Assiut University in 1976 and 1981, respectively, and the Dr. Eng. degree from Maryland University in 1990. After working as a demonstrator (from 1978) and assistant lecturer (from 1981) at Assiut University, visiting professor in the Department of Mechanical Engineering, University of Texas, United States of America (from September 1991 to August 1993), associate professor (from 1995) at Assiut University, visiting professor in the Department of Electrical Engineering, Kanazawa University, Japan (from April 1996 to April 1997), and visiting professor at the University of Technology in Malaysia (from February 2006 to March 2006), he has been a professor in the Department of Electrical Engineering, Assiut University since 2000. He has been the dean of the Engineering College, Assiut University since 2011.