Proceedings of the APAN – Research Workshop 2016


ISBN 978-4-9905448-6-7

Fingerprinting Attack on Tor Anonymity using Deep Learning
Kota Abe and Shigeki Goto

Abstract—Tor is free software that enables anonymous communication. It defends users against traffic analysis and network surveillance. It is also useful for confidential business activities and state security. At the same time, anonymized protocols have been used to access criminal websites such as those dealing with illegal drugs. This paper proposes a new method for launching a fingerprinting attack to analyze Tor traffic in order to detect users who access illegal websites. Our new method is based on Stacked Denoising Autoencoder, a deep-learning technology. Our evaluation results show 0.88 accuracy in a closed-world test. In an open-world test, the true positive rate is 0.86 and the false positive rate is 0.02.

Index Terms—Network Security, Tor, Fingerprinting Attack, Deep Learning, Autoencoder

I. INTRODUCTION

The Onion Router (Tor) is free software that enables anonymous communication [1, 2]. It defends users against traffic analysis and network surveillance. It is also useful for confidential business activities and state security. At the same time, anonymized protocols have been used to access criminal websites such as those dealing with illegal drugs. There is a need to develop a method that can identify websites even when anonymized protocols are used.

This paper proposes a new method for launching a fingerprinting attack to analyze Tor traffic in order to detect users who access illegal websites. Using a fingerprinting attack, we can identify a website that a user accesses on the basis of traffic features such as packet length, number of packets, and time. We can analyze this information from captured packets regardless of encryption. Our new method for fingerprinting attacks is based on Stacked Denoising Autoencoder (SDAE), a deep-learning technology. Our evaluation results show 0.88 accuracy in a closed-world test. In an open-world test, the true positive rate (TPR) and false positive rate (FPR) are 0.86 and 0.02, respectively.

The remainder of this paper is organized as follows. Section II explains the technical background. Section III describes related work. Our new method is proposed in Section IV. Section V shows the evaluation results. Section VI concludes the paper.

Kota Abe and Shigeki Goto are with the Department of Computer Science and Engineering, Waseda University, Shinjuku, Tokyo 169-8555, Japan. E-mail: see [Link]

II. TECHNICAL BACKGROUND

A. Tor Anonymity

Tor [1, 2] is a popular anonymized protocol. Figure 1 shows an example of a Tor configuration. At the initial setting, there are three nodes between a user and a web server, as shown in Figure 1. Tor traffic data is encrypted using Transport Layer Security (TLS) between a user and each Tor node. Thus, Tor nodes do not know the original plain data, with one exception: the closest node to the web server can read the original data without encryption. In a Tor configuration, each node knows only the Internet Protocol (IP) addresses of the adjacent nodes that are directly connected to it.

In the Tor protocol, content data is encapsulated into a series of cells, each with a fixed length of 512 bytes. It is difficult to estimate the original content only from the packet length.

Fig. 1. Configuration of Tor (the user, the victim of the fingerprinting attack, reaches the web server through Tor Servers 1, 2, and 3).

B. Fingerprinting Attacks on a Website

1) Fingerprinting
A website fingerprinting attack aims to detect a website even if the traffic is encrypted using Tor or a virtual private network (VPN). We cannot specify the website by inspecting the encrypted payload. However, we can utilize the packet information, such as packet length, number of packets, and time. In a fingerprinting attack, we can specify a website by providing the packet information.

There are two methods for capturing traffic data in Tor. In the first method, an attacker (analyzer) prepares an entry node

of Tor and captures the traffic through this node. However, the Tor protocol selects nodes at random. It is unlikely that a specific victim connects to the attacker's node. In the second method, an attacker (analyzer) is a network operator, such as an Internet service provider (ISP). He or she can capture traffic packets between a victim and the entry node of Tor. This is a realistic scenario. This paper proposes a new approach using the second method.

2) Closed- and Open-World Tests
There are two evaluation schemes for fingerprinting attacks. The first scheme is a closed-world test. It conducts a test in which a victim can access only a limited number of websites, which the attacker attempts to detect. For example, an attacker might prepare 100 monitored sites and investigate the features of these 100 websites. The victim can access only these 100 websites.

The second scheme is an open-world test. In such a test, a victim can freely access any website on the Internet. The attacker must be able to determine whether a website is monitored or non-monitored. If it is a monitored website, the attacker must also be able to determine which website among the 100 monitored sites it is. This paper uses both evaluation schemes, closed and open.

C. SDAE

Deep learning is an attractive method in machine learning. It is called deep because it utilizes a multiple-layered neural network. An autoencoder is a deep-learning technique. This paper uses SDAE.

An autoencoder is a neural network that consists of input, hidden, and output layers. Figure 2 shows an example of an autoencoder. It calculates the weights on the directed edges in Figure 2 by learning from input data. One specific feature of an autoencoder is that the input data (vector) and the output data (vector) must be equal.

Fig. 2. Structure of an Autoencoder (input layer 𝒙, hidden layer 𝒉, and output layer 𝒚, connected by weights 𝑾 and 𝑾′).

An autoencoder is represented by a mathematical formula. In formula (1), the input layer is represented as a vector 𝒙, the output of the hidden layer as a vector 𝒉, and the weights from the input layer to the hidden layer as a matrix 𝑾 and vector 𝒃. The vector 𝒃 represents bias terms. We also define an activation function 𝑓. Data propagation from the input layer to the hidden layer is calculated using formula (1).

𝒉 = 𝑓(𝑾𝒙 + 𝒃)   (1)

Similarly, we define the output from the output layer as a vector 𝒚, and the weights from the hidden layer to the output layer are represented as a matrix 𝑾′ and vector 𝒃′. The vector 𝒃′ consists of bias terms. We also define an activation function 𝑓′. Data propagation from the hidden layer to the output layer is calculated using formula (2).

𝒚 = 𝑓′(𝑾′𝒉 + 𝒃′)   (2)

The autoencoder determines the weights 𝑾 and 𝑾′ that equalize the input 𝒙 and output 𝒚. The weights are calculated using formula (3), which minimizes the difference between the input data {𝒙𝒊, …} and the output 𝒚.

min over 𝑾, 𝒃, 𝑾′, 𝒃′ of  Σᵢ ‖𝒙𝒊 − 𝑓′(𝑾′𝑓(𝑾𝒙𝒊 + 𝒃) + 𝒃′)‖₂²   (3)

Using an autoencoder, we can decrease the dimensions of data vectors: the dimension of 𝒉 is less than that of 𝒙 or 𝒚. The output vector 𝒉 of the hidden layer is used as a feature vector in machine learning.

We can combine multiple autoencoders by using the hidden layer of one autoencoder as the input of a second autoencoder. This type of autoencoder is called a Stacked Autoencoder (SAE). Figure 3 shows an example.

Fig. 3. Structure of a Stacked Autoencoder (the hidden layer of Autoencoder 1 serves as the input of Autoencoder 2).

It can be meaningful to add noise to an input vector. This type of autoencoder is called a denoising autoencoder (DAE). By adding noise data, an autoencoder can avoid overlearning (overfitting), in which formula (3) is minimized only for the training data. Noise is sometimes useful to generalize the training data. A DAE can attain higher accuracy.
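The forward pass of formulas (1) and (2), the reconstruction objective of formula (3), and the input corruption used by a DAE can be illustrated with a minimal numpy sketch. The dimensions, random weights, and sigmoid activations below are illustrative only, and the actual minimization of formula (3) (training) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; the paper's first layer uses nvis = 5000, nhid = 500.
nvis, nhid = 8, 3

# Weights and biases of formulas (1) and (2), randomly initialized here.
W = rng.normal(scale=0.1, size=(nhid, nvis))       # input -> hidden
b = np.zeros(nhid)
W_prime = rng.normal(scale=0.1, size=(nvis, nhid)) # hidden -> output
b_prime = np.zeros(nvis)

def sigmoid(z):
    # One possible choice for the activation functions f and f'.
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # Formula (1): h = f(Wx + b)
    return sigmoid(W @ x + b)

def decode(h):
    # Formula (2): y = f'(W'h + b')
    return sigmoid(W_prime @ h + b_prime)

def reconstruction_loss(xs):
    # Formula (3): sum of squared reconstruction errors over the inputs x_i.
    # Training would adjust W, b, W', b' to minimize this quantity.
    return sum(np.sum((x - decode(encode(x))) ** 2) for x in xs)

def corrupt(x, noise_level=0.1):
    # Denoising variant: randomly zero out input elements; a DAE is trained
    # to reconstruct the clean x from this noisy copy.
    mask = rng.random(x.shape) >= noise_level
    return x * mask

x = rng.random(nvis)
h = encode(x)   # compressed feature vector (dimension nhid < nvis)
y = decode(h)   # reconstruction of x
```

The hidden vector `h` is what a stacked architecture passes on as the input of the next autoencoder.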
We can further combine multiple DAEs, similarly to SAEs. This type of autoencoder is called SDAE. This paper uses SDAEs. We use the Pylearn2 software [3] as a deep-learning tool.

III. RELATED WORK

A. Optimal String Alignment Distance (OSAD)

In 2013, Wang and Goldberg [4] conducted a fingerprinting attack using OSAD. In their method, a sequence of Tor cells is treated as a string. If two instances of a cell string are captured for the same site, the distance between the two instances is small. If they are captured for two different sites, the distance between the two instances is large. Wang and Goldberg used OSAD in an algorithm to calculate the distance.

Wang and Goldberg used this distance as the kernel matrix in a support vector machine (SVM). They defined the distance and the kernel by formulas (4) and (5), respectively. s1 and s2 are two strings, and the distance between s1 and s2 is D(s1, s2).

D′(s1, s2) = D(s1, s2) / min(|s1|, |s2|)   (4)

K(s1, s2) = e^(−D′(s1, s2)²)   (5)

When D′ = 0, the two strings are equal, and K becomes one. When the distance between two strings is large, K becomes small. When D → ∞, the limit of K becomes zero. Therefore, we can use K as the kernel matrix of an SVM. Wang and Goldberg used the one-against-one method in their SVM. This method is used for multi-class classification by repeating two-class classifications and performing majority voting.

B. k-Nearest Neighbor Algorithm (k-NN)

In 2014, Wang et al. [5] proposed another fingerprinting attack using the k-nearest neighbor (k-NN) algorithm. In their new method, they extract features from captured packets:
 General features (total transmission size, total transmission time, and numbers of incoming and outgoing packets)
 Packet ordering
 Concentration of outgoing packets
 Bursts
Some features are more meaningful than others. Then, they determine the weights of the features. Finally, they classify test data using the k-NN method with the features and weights.

IV. NEW METHOD

A. Dataset for Learning and Evaluation

This paper uses the same dataset as that of Wang [6] in our evaluation experiment. This dataset contains 100 sites as monitored websites and 9,000 sites as non-monitored sites. Monitored sites are used in the closed-world test. Non-monitored sites are used in the open-world test. Each monitored site has 90 instances (cell traces), and each non-monitored site has one instance. Monitored sites consist of porn sites, BitTorrent trackers' sites, and sites that have religious or political contents. Access to these sites is blocked in China, the United Kingdom, and Saudi Arabia. Non-monitored sites come from Alexa's list [7], which covers ordinary popular web pages.

In Figure 4, the first column records when a cell is captured. The timestamp unit is seconds, and the time at which the first cell is sent is 0.0. The second column indicates the direction of a cell. When a cell is sent from a victim (target) to a Tor node, it is represented as 1. When a cell is sent from a Tor node to a victim, it is represented as −1. This time sequence starts when the web page begins loading and ends when the last cell is sent.

0.0 1
0.0 1
0.116133928299 1
0.499715805054 -1
0.499715805054 -1
0.782404899597 -1
0.969846963882 -1
0.969846963882 -1
0.969846963882 -1
0.969846963882 -1
Fig. 4. Example of dataset.

We count the number of cells in a packet. Since the size of a cell is fixed at 512 bytes, the number of cells is counted by dividing the packet length by 600. We use not 512 but 600 because we consider inter-cell headers and the overhead [10]. Tor sends cells for flow control at regular intervals. Such a control cell is called a SENDME cell. SENDME cells are not useful in fingerprinting attacks, so we exclude them from the dataset.

B. Proposed Method

First, an attacker (analyzer) collects training data for machine learning. The attacker accesses the websites he or she wants to monitor through Tor and then captures the traffic data repeatedly, e.g., 100 times. The attacker also collects traffic data from a large number of other websites. These data are used for the open-world test. Since this paper uses Wang's dataset, we can omit the data collection phase.

Next, the attacker extracts Tor cells from the captured data. These are used as input to the autoencoder. Again, we can omit this phase, because we use the same dataset as that in Wang's method; Tor cells are already extracted. Then, we sort out the data to create an input vector for the autoencoder. This paper uses the direction of a cell as an element of an input vector. It is a simple method; we do not use other features. It should be noted here that input vectors must have a fixed length (dimension), while the original traffic data have a variety of lengths (dimensions) according to the traffic pattern. We truncate a sequence of cells if its length is greater than 5,000. If the number of cells is less than 5,000, we put 0 as dummy padding to create a vector of size 5,000. Figure 5 shows an example of input data corresponding to Figure 4.
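The vector construction described above (the direction sequence truncated or zero-padded to 5,000 elements) can be sketched as follows; the function name is illustrative, not from the paper:

```python
def to_input_vector(directions, size=5000):
    """Convert a sequence of Tor cell directions (+1 outgoing, -1 incoming)
    into a fixed-length vector: truncate beyond `size`, zero-pad if shorter."""
    v = list(directions[:size])      # truncate sequences longer than `size`
    v += [0] * (size - len(v))       # pad shorter sequences with dummy zeros
    return v

# Direction column of the trace in Figure 4:
trace = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
vec = to_input_vector(trace)         # 10 directions followed by 4,990 zeros
```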
1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, 1, 1, ..., 0, 0, 0, 0, 0
Fig. 5. Example of input data

After preparing the data, the attacker conducts training using the SDAE. In our experiment, we specifically use a multilayer perceptron (MLP) that has two layers of SDAEs and an output layer realized by a softmax function. The parameters of the SDAE and MLP will be shown in the next section (V). Before inputting the training data, we randomize the order of the training vectors in the data set. If a batch has many similar vectors, the efficiency of learning might be decreased.

The test data for evaluation is prepared similarly to the training data.

V. EVALUATION

A. Environment

Table 1 shows the experimental environment. We use Compute Unified Device Architecture (CUDA) [8] to accelerate the training using a graphics processing unit (GPU). Table 1 shows the machine specification, which includes a GeForce GTX 750 Ti graphics card by NVIDIA.

TABLE 1
ENVIRONMENT OF EXPERIMENT
OS   Ubuntu 14.04.3 LTS
CPU  Intel Core i7-4790
RAM  32 GB
GPU  NVIDIA GeForce GTX 750 Ti

B. Closed-World Test

1) Overview
In the closed-world test, the dataset contains 100 monitored sites, with each site containing 90 cell instances. Seventy-two instances are used for training data, and 18 instances for test data. This closed-world test is a multi-class classification. We labeled the monitored websites as class 0 to class 99.

2) Layer
We used an MLP with two layers of SDAEs and with the output layer realized by a softmax function. The Pylearn2 parameters are shown in Tables 2 and 3. nvis and nhid are the dimensions of the input and hidden layers of the autoencoder, respectively. The learning rate is a coefficient used during the weight-training phase.

TABLE 2
PARAMETERS OF SDAE (CLOSED-WORLD TEST)
Parameter      First Layer  Second Layer
nvis           5000         500
nhid           500          125
learning_rate  0.001        0.001
batch_size     50           50

TABLE 3
PARAMETERS OF MLP (CLOSED-WORLD TEST)
Parameter      MLP
nvis           5000
n_classes      100
learning_rate  0.005
batch_size     200

3) Results
We conducted a series of closed-world tests while changing the number of learning sessions (max_epoch) in steps of five. The values of max_epoch in each DAE and the output layer are the same. Figure 6 shows the accuracy of the closed-world tests. The highest accuracy of 0.88 is attained when the number of learning sessions is 50.

Fig. 6. Relation between max_epoch and accuracy in the closed-world test.

We also conduct a series of closed-world tests by changing the dimensions nvis and nhid. We fix the max_epoch value as 50. The results are shown in Table 4. There is no major change in the accuracy due to the dimension parameters of the hidden layers of the SDAE. The maximum accuracy is attained when the nhid values of the first and second layers are 1,000 and 500, respectively. When the nhid values are 500 for the first layer and 125 for the second, the results are similar.

TABLE 4
RESULTS WHEN CHANGING NVIS AND NHID (CLOSED-WORLD TEST)
Accuracy [%]         1st-layer nhid
2nd-layer nhid   250    500    750    1000
125              86.4   88.1   86.9   86.9
250              87.1   87.2   87.6   87.6
500              -      87.3   87.2   88.2
750              -      -      87.9   87.3
1000             -      -      -      87.6

4) Execution time
Table 5 shows the execution time when max_epoch is set as 50. In this test phase, the autoencoder can use the weights that are already learned, so the test time is very short. The training time is also short, because the autoencoder does not need to perform multiple layers of backpropagation.

TABLE 5
EXECUTION TIME (CLOSED-WORLD TEST)
Process            Description                                       Time [s]
Data Transmission  Time to convert train and test data to            124.4
                   Pylearn2 format.
Learning Time      Time to train using 7,200 training instances.     163.0
Test Time          Time to test 1,800 test instances.                3.0

5) Three-layer SDAE
The above results are obtained for the closed-world test by
the two-layer SDAE. It is worthwhile to investigate the performance of a three-layer SDAE.

We conduct another closed-world test using a three-layer SDAE. Table 6 shows the parameters of the new SDAE. The parameters of the MLP are the same as those in Table 3.

TABLE 6
PARAMETERS OF THREE-LAYER SDAE (CLOSED-WORLD TEST)
Parameter      1st Layer  2nd Layer  3rd Layer
nvis           5000       750        500
nhid           750        500        250
learning_rate  0.001      0.001      0.001
batch_size     50         50         50

Figure 7 shows the results while changing the number of learning sessions (max_epoch) in steps of ten. The maximum accuracy is 0.88, the same accuracy achieved by the two-layer SDAE. However, when three layers are used, the convergence of learning becomes slower than with the two-layer SDAE.

Fig. 7. Relation between max_epoch and accuracy in the closed-world test (three-layer SDAE).

The learning and test times also increase. For the three-layer SDAE, when max_epoch is 50, the learning time becomes 241.9 s and the testing time becomes 3.6 s.

C. Open-World Test

1) Overview
In the open-world test, we use the data not only of the 100 monitored sites, but also those of the 9,000 non-monitored sites. The data of the monitored sites is divided into 72 instances for training data and 18 instances for testing data. We use 1,800 instances of non-monitored sites as the testing data. A non-monitored website in the testing data never appears in the training data; the victim can access a new website that the attacker does not expect. We label the monitored websites as classes 0 to 99 and all the non-monitored websites as a single class 100.

2) Layer
In the open-world test, we use an MLP that has an input layer with a dimension of 5,000 and a two-layer DAE. The output layer is realized by a softmax function. The Pylearn2 parameters are shown in Tables 7 and 8.

TABLE 7
PARAMETERS OF SDAE (OPEN-WORLD TEST)
Parameter      First Layer  Second Layer
nvis           5000         500
nhid           500          125
learning_rate  0.001        0.001
batch_size     50           50
max_epoch      30           30

TABLE 8
PARAMETERS OF MLP (OPEN-WORLD TEST)
Parameter      MLP
nvis           5000
n_classes      101
learning_rate  0.005
batch_size     200
max_epoch      50

3) Results
We investigate the TPR, i.e., the rate at which monitored websites are classified correctly, and the FPR, i.e., the rate at which a non-monitored website is classified as a monitored site. The TPR is shown in Figure 8, and the FPR is shown in Figure 9.

In Figure 8, when the number of training data instances of non-monitored sites is larger, the TPR is lower. The maximum TPR is 0.87, when the number of training data instances of non-monitored sites is 1,000, and the minimum TPR is 0.86, when the number is 7,000. In Figure 9, as with the TPR, when the number of training data instances of non-monitored sites is larger, the FPR is lower. The minimum FPR is 0.02, when the number of training data instances of non-monitored sites is 7,000.

Fig. 8. Relation between the number of training data of non-monitored sites and TPR in the open-world test.

Fig. 9. Relation between the number of training data of non-monitored sites and FPR in the open-world test.
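The TPR and FPR used in these results can be computed from the true and predicted class labels. The sketch below assumes the labeling described above (monitored sites are classes 0–99, all non-monitored traffic is class 100); reading "classified correctly" as an exact class match for the TPR is our interpretation:

```python
MONITORED = set(range(100))   # classes 0-99 are the monitored sites
NON_MONITORED = 100           # single class for all non-monitored sites

def tpr_fpr(true_labels, predicted_labels):
    """TPR: fraction of monitored instances assigned their correct monitored
    class. FPR: fraction of non-monitored instances assigned any monitored
    class."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t in MONITORED and p == t)
    monitored_total = sum(1 for t in true_labels if t in MONITORED)
    fp = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t == NON_MONITORED and p in MONITORED)
    non_monitored_total = sum(1 for t in true_labels if t == NON_MONITORED)
    return tp / monitored_total, fp / non_monitored_total

true = [0, 1, 2, 100, 100]
pred = [0, 1, 3, 100, 5]
tpr, fpr = tpr_fpr(true, pred)   # TPR = 2/3, FPR = 1/2
```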
There is a trade-off between TPR and FPR. It is better to have a high TPR value while keeping the FPR value low.

D. Comparison with Related Work

Wang et al. showed the results of the OSAD [4] and k-NN [5] methods using the same dataset. Table 9 shows the comparison between previously known methods and our method. In our proposed method, the accuracy in the closed-world test is 0.88, slightly lower than those of OSAD and k-NN. In the open-world test, our TPR (0.86) is higher than that of OSAD, and our FPR (0.02) is lower than that of OSAD. However, our FPR is higher than that of the k-NN method.

TABLE 9
COMPARISON WITH EXISTING METHODS
Method       Accuracy in Closed-  TPR in Open-  FPR in Open-
             World Test           World Test    World Test
Our Method   0.88                 0.86          0.02
OSAD Method  0.90                 0.83          0.06
k-NN Method  0.91                 0.85          0.006

VI. CONCLUSION

A. Summary

Here we propose a new method for fingerprinting attacks on Tor anonymity using SDAE. The input vector takes a very simple form, with elements 1, −1, or 0. The evaluation results show an accuracy of 0.88 in the closed-world test and TPR and FPR values of 0.86 and 0.02, respectively, in the open-world test. The advantage of our method is that it realizes high accuracy without manually selected features; the features are learned automatically by deep learning.

This paper shows that deep-learning technology can be applied to fingerprinting attacks on Tor communications with results comparable to those of existing techniques.

B. Future Research

It may be meaningful to combine our method with other methods proposed in related work. For example, the output of the SDAE can be used as features in Wang's method and used for training by k-NN.

There are also convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in deep learning. CNNs have been used in pattern recognition, and RNNs can handle time-series data. It may be possible to improve the accuracy of our method by applying these other neural network technologies as well.

ACKNOWLEDGEMENTS

A part of this work was supported by JSPS Grant-in-Aid for Scientific Research B, Grant Number 16H02832.

REFERENCES
[1] The Tor Project, "Tor Project: Anonymity Online," [Link] referred Jan. 20, 2016.
[2] Roger Dingledine, Nick Mathewson, and Paul Syverson, "Tor: the second-generation onion router," in Proceedings of the 13th USENIX Security Symposium, 2004, pp. 303–320.
[3] The LISA lab, "Welcome — Pylearn2 dev documentation," [Link] referred Jan. 20, 2016.
[4] Tao Wang and Ian Goldberg, "Improved website fingerprinting on Tor," in WPES '13: Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society, 2013, pp. 201–212.
[5] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg, "Effective attacks and provable defenses for website fingerprinting," in 23rd USENIX Security Symposium, 2014, pp. 143–157.
[6] Tao Wang, "Website Fingerprinting," [Link] referred Jan. 20, 2016.
[7] Alexa, "Alexa - Actionable Analytics for the Web," [Link] referred Jan. 20, 2016.
[8] NVIDIA, "Parallel Programming and Computing Platform — CUDA — NVIDIA," [Link] referred Jan. 20, 2016.
[9] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel, "Website fingerprinting in onion routing based anonymization networks," in WPES '11: Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, 2011, pp. 103–114.
[10] Xiang Cai, Xin Cheng Zhang, Brijesh Joshi, and Rob Johnson, "Touching from a distance: website fingerprinting attacks and defenses," in CCS '12: Proceedings of the 2012 ACM Conference on Computer and Communications Security, 2012, pp. 605–616.
[11] Hideki Asoh, Muneki Yasuda, Shiniti Maeda, Daisuke Okanohara, Takayuki Okatani, Yotaro Kubo, and Danushka Bollegala, "Deep Learning," Kindai kagaku sha, Tokyo, 2015.
[12] Takayuki Okatani and Masaki Saito, "Deep Learning," IPSJ SIG-CVIM: Computer Vision and Image Media, 2013, pp. 1–17.

Kota Abe received the B.S. degree in Computer Science and Engineering from Waseda University in March 2016. He is now a master's student at the Department of Computer Science and Communications Engineering, Waseda University. His research interest covers cyber security.

Shigeki Goto is a professor at the Department of Computer Science and Engineering, Waseda University, Japan. He received his B.S. and M.S. in Mathematics from the University of Tokyo. Prior to becoming a professor at Waseda University, he worked for NTT for many years. He also earned a Ph.D. in Information Engineering from the University of Tokyo. He is the president of JPNIC. He is a member of ACM and IEEE, and he was a trustee of the Internet Society from 1994 to 1997.
