Tor PDF
The core principle of Tor, "onion routing", was developed in the mid-1990s by United States Naval
Research Laboratory employees, mathematician Paul Syverson and computer scientists
Michael G. Reed and David Goldschlag, with the purpose of protecting U.S. intelligence
communications online. Onion routing was further developed by DARPA in 1997.
The pre-alpha version of Tor was released to the public in September 2002, and the Tor Project,
the nonprofit organisation that maintains Tor, was established in 2006.
Here is a quote from the paper titled “Tor: The Second-Generation Onion Router” on what Tor is in a
nutshell:
The core principle of Tor is onion routing, which is a technique for anonymous communication over a
public network. In onion routing, messages are encapsulated in several layers of encryption,
analogous to encapsulation in the OSI 7 layer model.
Imagine an onion being prepared for Tor. At each stage of the wrapping process, a new layer of
encryption is added to the onion. This is why it's called onion routing, because layers are added
at each stage.
The resulting onion (fully encapsulated message) is then transmitted through a series of nodes in a
network (called onion routers) with each node peeling away a layer of the ‘onion’ and therefore
uncovering the data’s next destination. When the final layer is decrypted you get the plaintext of the
original message.
Tor isn't anonymous, but it is encrypted. Now that we have a basic overview of Tor, let's start
exploring how each part of Tor works.
Clients choose a path through the network and build a circuit where each onion router in the path
knows the predecessor and the successor, but no other nodes in the circuit.
The original author (the question mark on the far left) remains anonymous, unless you're the first
node in the path, as you know who sent you the packet.
No one knows what data is being sent until it reaches the last node in the path, who knows the data
but doesn't know who sent it. The second to last node in the path doesn't know what the data is,
only the last node in the path does.
This has led to attacks whereby large organisations with expansive resources create Tor servers
which aim to be the first and last onion routers in a path. If the organisation can do this, they get to
know who sent the data and what they sent, effectively breaking Tor.
Something important to note is that TOR is used for illegal purposes as well as legal. You won't find
many people on TOR using it for Netflix. They might be selling / buying drugs, or worse. Trading
Magic the Gathering Cards.
It's incredibly hard to mount this first-and-last-node attack without being physically close to the
location of the organisation's servers. We'll explore this more later.
Each packet flows down the network in fixed-size cells. These cells have to be the same size so none
of the data going through the TOR network looks suspiciously big.
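To make the fixed-size idea concrete, here is a minimal sketch of chopping data into equal-sized cells. The 512-byte cell size is the figure given in the Tor design paper; the layout (a 2-byte length field followed by zero padding) and the names to_cells and from_cells are my own assumptions for illustration, not Tor's real cell format.

```python
CELL_SIZE = 512  # bytes, the cell size described in the Tor design paper


def to_cells(data: bytes, cell_size: int = CELL_SIZE) -> list[bytes]:
    """Split data into cells that are all exactly cell_size bytes long.

    Hypothetical layout: a 2-byte big-endian length field, then the payload,
    then zero padding up to the cell size.
    """
    payload_size = cell_size - 2
    cells = []
    for start in range(0, len(data), payload_size):
        chunk = data[start:start + payload_size]
        cell = len(chunk).to_bytes(2, "big") + chunk
        cells.append(cell.ljust(cell_size, b"\x00"))
    return cells


def from_cells(cells: list[bytes]) -> bytes:
    """Reassemble the original data by reading each cell's length field."""
    out = b""
    for cell in cells:
        length = int.from_bytes(cell[:2], "big")
        out += cell[2:2 + length]
    return out


short_cells = to_cells(b"hi")
long_cells = to_cells(b"x" * 1200)
print([len(c) for c in short_cells])
print([len(c) for c in long_cells])
```

A 2-byte message and a 1200-byte message both end up as cells of the same length, so an observer cannot tell them apart by looking at the size of a single cell.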
These cells are unwrapped by a symmetric key at each router and then the cell is relayed further
down the path. Let's detour into Symmetric-Key cryptography.
Symmetric-Key Encryption Algorithms, a detour
In Symmetric Key Encryption the same key is used by both Alice (sender) and Bob (receiver), as
opposed to public key encryption where these keys differ. Think of your front door: you have 1 key
that both locks and unlocks it. With public key cryptography, you have 2 keys. Famous examples of
symmetric key cryptography are AES and Caesar’s Cipher.
This makes it faster and easier to use than Public Key encryption, but it also causes problems.
The key has to be transferred between the two parties, and if a third party figures out the key,
either by attacking the transfer of the key or by some other method, they will be able to decrypt all
communications encrypted with this key.
Symmetric Key encryption is also much less computationally expensive than public key encryption,
which is useful in a network which is inherently slow. I've written extensively about public key
cryptography and symmetric key cryptography here: public key encryption
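As a tiny illustration of "the same key locks and unlocks", here is a sketch of Caesar's Cipher, one of the symmetric examples mentioned above. It is hopelessly insecure and only handles lowercase letters; the function name caesar and the sample message are made up for this example.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text: str, key: int) -> str:
    """Shift each lowercase letter by `key` places; other characters are untouched."""
    shift = key % 26
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.translate(str.maketrans(ALPHABET, shifted))


shared_key = 3  # Alice and Bob both know this number

ciphertext = caesar("meet me at the onion", shared_key)  # Alice locks with the key
plaintext = caesar(ciphertext, -shared_key)              # Bob unlocks with the same key

print(ciphertext)
print(plaintext)
```

Anyone who learns shared_key can read everything, which is exactly the weakness described above.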
TOR itself
“There is strength in numbers”
Tor needs a lot of users to create anonymity. If Tor was hard to use, new users wouldn't adopt it
so quickly, and if new users don't adopt it, Tor becomes less anonymous. By this reasoning it is easy
to see that usability isn't just a design choice of Tor but a security requirement.
If Tor isn't usable or designed nicely, it won't be used by many people. If it's not used by many
people, it's less anonymous.
Tor has had to make some design choices that may not improve security but improve usability with
the hopes that an improvement in usability is an improvement in security.
Tor is not secure against end to end attacks. An end to end attack is where an entity has control of
both the first and last node in a path, as stated earlier. This is a problem that cyber security experts
have yet to solve, so Tor does not have a solution to this problem.
Tor does not provide protocol normalisation like Privoxy or the Anonymizer, meaning that if senders
want anonymity from responders while using complex and variable protocols like HTTP, Tor must be
layered with a filtering proxy such as Privoxy to hide differences between clients and expunge
protocol features that leak identity.
In 2013 during the Final Exams period at Harvard a student tried to delay the exam by sending in a
fake bomb threat. The student used Tor and Guerrilla Mail (a service which allows people to make
disposable email addresses) to send the bomb threat to school officials.
The student was caught, even though he took precautions to make sure he wasn’t caught.
Guerrilla Mail sends an originating IP address header along with the email, so the receiver
knows where the original email came from. With Tor, the student expected the IP address to be
scrambled, but the authorities could tell it came from a Tor exit node (Tor publishes a list of all
nodes through its directory service). So the authorities simply looked for people who were accessing
Tor (within the university) at the time the email was sent.
Tor isn't an anonymising service, but it is a service that can encrypt all traffic from A to B (so long as
an end-to-end attack isn't performed). Tor is also incredibly slow, so using it for Netflix isn't a good
use case.
Now that we have a good handle on what TOR actually is, let's explore onion routing.
Given the network above, we are going to simulate what TOR does. Your computer is the one on the
far left, and you're sending a request to watch Stranger Things on Netflix (because what else is TOR
used for 😉). This path of nodes is called a circuit. Later on, we're going to look into how circuits are
made and how the encryption works. But for now we're trying to generalise how TOR works.
We start off with the message (we haven't sent it yet). We need to encrypt the message N times
(where N is how many nodes are in the path). We encrypt it using AES, a symmetric key
cryptosystem. The key is agreed using Diffie-Hellman. Don't worry, we'll discuss all of this later.
There are 4 nodes in the path (not counting your computer and Netflix), so we encrypt the message
4 times.
Our packet (onion) has 4 layers. Blue, purple, orange, and teal. Each colour represents one layer of
encryption.
We send the onion to the first node in our path. That node then removes the first layer of
encryption.
Each node in the path knows the key to decrypt its own layer (agreed via Diffie-Hellman). Node 1
removes the blue layer with its symmetric key (the one you both agreed on).
Node 1 knows you sent the message, but the message is still wrapped in 3 layers of encryption, so
it has no idea what the message is.
As it travels down the path, more and more layers are stripped away. The next node does not know
who sent the packet. All it knows is that Node 1 sent them the packet, and it's to be delivered to
Node 3.
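The wrapping and peeling described above can be sketched in a few lines. Python's standard library has no AES, so this uses a toy stream cipher built from SHA-256 as a stand-in; keystream_xor, node_keys and the message are all made up for illustration, and the sketch leaves out next-hop addresses and integrity checks.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (a stand-in for AES): XOR data with a SHA-256 keystream.

    Applying it twice with the same key gives back the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# One symmetric key per node. In Tor these would be agreed via Diffie-Hellman.
node_keys = [secrets.token_bytes(16) for _ in range(4)]

message = b"GET /stranger-things"

# The client wraps the message once per node. Node 4's layer goes on first,
# so Node 1's layer ends up on the outside.
onion = message
for key in reversed(node_keys):
    onion = keystream_xor(key, onion)

# Each node in turn strips its own layer and passes the rest along.
seen_by_node = []
for key in node_keys:
    onion = keystream_xor(key, onion)
    seen_by_node.append(onion)

print(seen_by_node[0] == message)   # what Node 1 holds is still encrypted
print(seen_by_node[-1] == message)  # Node 4 ends up with the plaintext
```

Only after the last key has been applied does the plaintext appear, which is why the earlier nodes learn nothing about the request.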
Now there's no way Amazon can find out you watch Netflix! Netflix sends back a part of Stranger
Things.
Node 4 adds its layer of encryption now. It doesn't know who originally made the request, all it
knows is that Node 3 sent the request to them so it sends the response message back to Node 3.
And so on for the next few nodes.
Now the packet is fully encrypted, the only one who still knows what the message contains is Node
4. The only one who knows who made the message is Node 1. Now that we have the fully encrypted
response back, we can use all the symmetric keys to decrypt it.
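The return trip works the other way round: each node adds a layer and the client removes them all. Here is a minimal sketch under the same assumptions as before (a toy SHA-256 stream cipher standing in for AES, made-up names and data).

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (a stand-in for AES); applying it twice undoes it."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Keys the client shares with Node 1, Node 2, Node 3 and Node 4, in that order.
node_keys = [secrets.token_bytes(16) for _ in range(4)]

response = b"stranger-things-chunk-001"

# Node 4 adds its layer first, then Nodes 3, 2 and 1 each add one on the way back.
packet = response
for key in reversed(node_keys):
    packet = keystream_xor(key, packet)
fully_wrapped = packet

# The client knows all four keys, so it strips every layer, Node 1's first.
for key in node_keys:
    packet = keystream_xor(key, packet)

print(packet == response)
```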
You might be thinking "I've seen snails 🐌 faster than this" and you would be right. This protocol
isn't designed for speed, but at the same time it has to care about speed.
The algorithm could be much slower, but much more secure (using entirely public key cryptography
instead of symmetric key cryptography) but the usability of the system matters. So yes, it's slow. No
it's not as slow as it could be. But it's all a balancing act here.
The encryption used is normally AES with the key being shared via Diffie-Hellman. I've written
another article about Diffie-Hellman here.
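For a rough feel of how two parties can agree on a key without ever sending it, here is a toy Diffie-Hellman exchange. The parameters (the Mersenne prime 2**127 - 1 and generator 3) are chosen only to keep the example short and are far too weak for real use; the hashing step that turns the shared secret into a 16-byte key is also my own assumption.

```python
import hashlib
import secrets

# Toy parameters for illustration only; real systems use much larger, vetted groups.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


client_private, client_public = keypair()
node_private, node_public = keypair()

# Only the public values cross the network. Each side combines the other's
# public value with its own private value.
client_secret = pow(node_public, client_private, P)
node_secret = pow(client_public, node_private, P)

# Hypothetical key derivation: hash the shared secret down to 16 bytes.
client_key = hashlib.sha256(client_secret.to_bytes(16, "big")).digest()[:16]
node_key = hashlib.sha256(node_secret.to_bytes(16, "big")).digest()[:16]

print(client_key == node_key)  # True
```

Both sides arrive at the same symmetric key, which can then be used for that node's layer of the onion.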
The paths TOR creates are called circuits. Let's explore how TOR chooses what nodes to use in a
circuit.
When Tor selects the exit node, it follows these principles:
• Does the client's torrc (the configuration file of Tor) have settings about which exit nodes
not to choose? If so, those nodes are skipped.
• Tor only chooses an exit relay which allows your traffic to exit the Tor network. Some exit
nodes only allow web traffic (HTTP on port 80, HTTPS on port 443), which is not useful when
someone wants to send email (SMTP on port 25).
• The exit node has to have the available capacity to support you. Tor tries to choose an exit
node which has enough resources available.
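The three checks above can be sketched as a simple filter. This is not how Tor's real selection code looks (real Tor weights relays by bandwidth, for one); the Relay class, its fields, pick_exit and the sample relays are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    nickname: str
    allowed_ports: frozenset  # ports this relay's exit policy accepts
    spare_bandwidth: int      # hypothetical units, e.g. KB/s


def pick_exit(relays, port, excluded, min_bandwidth, rng=random):
    """Pick a random exit relay that passes all three checks, or None."""
    candidates = [
        r for r in relays
        if r.nickname not in excluded            # torrc says not to use it
        and port in r.allowed_ports              # exit policy allows our traffic
        and r.spare_bandwidth >= min_bandwidth   # enough capacity for us
    ]
    return rng.choice(candidates) if candidates else None


relays = [
    Relay("web_only", frozenset({80, 443}), 500),
    Relay("mail_ok", frozenset({25, 80, 443}), 300),
    Relay("banned", frozenset({25, 80, 443}), 900),
    Relay("overloaded", frozenset({25}), 5),
]

# We want to send email (port 25) and our config excludes the relay "banned".
choice = pick_exit(relays, port=25, excluded={"banned"}, min_bandwidth=100)
print(choice.nickname)  # mail_ok
```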
All paths in a circuit obey these rules:
• We do not choose the same router twice for the same path.
If you choose the same node twice, it's guaranteed that the node will be either the guard node
(the node you enter at) or the exit node, both dangerous positions. In a three-hop circuit there is
a 1/3 chance of it being both the guard and the exit node (of the three ways to occupy two of the
three positions, only one is guard plus exit), which is even more dangerous. We want to avoid the
entry / exit attacks.
• We do not choose any router in the same family as another in the same path. (Two routers
are in the same family if each one lists the other in the “family” entries of its descriptor.)
Operators who run more than 1 TOR node can choose to signify their nodes as 'family'. This means
that the nodes have all the same parent (the operator of their network). This is again a
countermeasure against the entry / exit attacks, although operators do not have to declare family if
they wish. If they want to become a guard node (discussed soon) it is recommended to declare
family, although not required.
• We do not choose more than one router in a given /16 subnet.
Subnets define networks. IPv4 addresses are made up of 4 octets (32 bits in total). As an example,
one of Google's IP addresses (64.233.169.106) in binary is:
01000000.11101001.10101001.01101010
The first 16 bits (the /16 subnet) are 01000000.11101001, which means that once a node with this
IP address is in the path, Tor does not choose any other node whose address starts with the same
16 bits. Again, a countermeasure to the entry / exit attacks.
• We don’t choose any non-running or non-valid router unless we have been configured to do
so. By default, we are configured to allow non-valid routers in “middle” and “rendezvous”
positions.
Non-running means the node currently isn't online. You don't want to pick things that aren't online.
Non-valid means that some configuration in the node's torrc is wrong. You don't want to accept
strange configurations in case they are trying to hack or break something.
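Putting the path rules together, here is a rough sketch of a check that decides whether a candidate router may be added to a partly built path. The Router class, the position names and can_add are hypothetical; the /16 comparison uses Python's standard ipaddress module.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    nickname: str
    ip: str
    family: frozenset = frozenset()  # nicknames this router declares as family
    running: bool = True
    valid: bool = True


# Positions where non-valid routers are allowed by default.
ALLOW_INVALID_IN = {"middle", "rendezvous"}


def same_slash16(ip_a: str, ip_b: str) -> bool:
    """True if both addresses share their first 16 bits."""
    network = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in network


def same_family(a: Router, b: Router) -> bool:
    """Two routers are family only if each one lists the other."""
    return b.nickname in a.family and a.nickname in b.family


def can_add(path, candidate: Router, position: str) -> bool:
    if not candidate.running:
        return False
    if not candidate.valid and position not in ALLOW_INVALID_IN:
        return False
    for chosen in path:
        if chosen.nickname == candidate.nickname:   # same router twice
            return False
        if same_family(chosen, candidate):          # same operator
            return False
        if same_slash16(chosen.ip, candidate.ip):   # same /16 subnet
            return False
    return True


guard = Router("guard1", "64.233.169.106", family=frozenset({"sibling"}))
neighbour = Router("neighbour", "64.233.1.1")
sibling = Router("sibling", "198.51.100.1", family=frozenset({"guard1"}))
stranger = Router("stranger", "203.0.113.7")
odd = Router("odd", "192.0.2.9", valid=False)

print(can_add([guard], neighbour, "middle"))  # False, same /16 as the guard
print(can_add([guard], sibling, "exit"))      # False, same family
print(can_add([guard], stranger, "exit"))     # True
print(can_add([guard], odd, "middle"))        # True, non-valid is fine in the middle
print(can_add([guard], odd, "exit"))          # False
```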
A guard node is a privileged node because it sees the real IP of the user. It’s ‘expensive’ to become a
guard node (maintain a high uptime for weeks and have good bandwidth).
This is possible for large companies who have 99.9% uptime and high bandwidth (such as Netflix).
TOR has no way to stop a powerful adversary from registering a load of guard nodes. Right now, TOR
is configured to stick with a single guard node for 12 weeks at a time, so you choose 4 new guard
nodes a year.
This means that if you use TOR once to watch Amazon Prime Video, it is relatively unlikely for Netflix
to be your guard node. Of course, the more guard nodes Netflix creates the more likely it is.
Although, if Netflix knows you are connecting to the Tor network to watch Amazon Prime Video
then they will have to wait up to 12 weeks, until you next pick a new guard node, for a chance to
have their suspicions confirmed, unless they attack the guard node and take it over.
Becoming a guard node is relatively easy for a large organisation. Becoming the exit node is slightly
harder, but still possible. We have to assume that the large organisation has infinite computational
power to be able to do this. The solution is to make the attack highly expensive with a low rate of
success.
The more regular users of Tor, the harder it is for a large organisation to attack it. If Netflix controls
50/100 nodes in the network:
If suddenly 50 more normal user nodes join then that's 50/150, reducing the probability of Netflix
owning a guard node (and thus, a potential attack) and making it even more expensive.
There is strength in numbers within the TOR service.
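The arithmetic behind this is simple enough to check by hand, but here it is as a short sketch. It assumes nodes are picked uniformly at random and that the guard and exit are picked independently, which ignores bandwidth weighting and the path rules; the function name adversary_share is made up.

```python
from fractions import Fraction


def adversary_share(adversary_nodes: int, total_nodes: int) -> Fraction:
    """Chance that one uniformly chosen node belongs to the adversary."""
    return Fraction(adversary_nodes, total_nodes)


before = adversary_share(50, 100)  # Netflix runs 50 of 100 nodes
after = adversary_share(50, 150)   # 50 honest nodes join the network

print(before, after)        # 1/2 1/3

# Simplified chance of holding both the guard and the exit position,
# treating the two picks as independent.
print(before**2, after**2)  # 1/4 1/9
```

Netflix did nothing wrong here, yet its chance of sitting at both ends of a circuit dropped from 1 in 4 to 1 in 9 just because more people joined.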
When people talk about these websites they are talking about Tor Hidden Services.
These are a wild concept and honestly deserve an entire blogpost on their own. Hidden services are
servers, like any normal computer server.
Except in a Tor Hidden Service it is possible to communicate without the user and server
knowing who each other are.
The device (the question mark) knows that it wants to access Netflix, but it doesn't know anything
about the server and the server doesn't know anything about the device that's asked to access it.
This is quite confusing, but don't worry, I'm going to explain it all with cool diagrams. ✨
When a server is set up on Tor to act as a hidden service, the server sends a message to some
selected Onion Routers asking if they want to be an introduction point to the server. It is entirely up
to the server as to who gets chosen as an introduction point, although usually they ask 3 routers to
be their introduction points.
The introduction points know that they are going to be introducing people to the server.
The server will then create something called a hidden service descriptor which has a public key and
the IP address of each introduction point. It will then send this hidden service descriptor to a
distributed hash table which means that every onion router (not just the introduction points) will
hold some part of the information of the hidden service.
If you try to look up a hidden service, the onion router responsible for it in the hash table will give
you the full hidden service descriptor, including the addresses of the hidden service's introduction
points.
The key for this hash table is the onion address and the onion address is derived from the public key
of the server.
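As a sketch of "derived from the public key": the older 16-character onion addresses take the first 80 bits of the SHA-1 hash of the service's public key and encode them in base32. Newer onion addresses are longer and use a different derivation. The key bytes below are a placeholder rather than a real encoded key, and the function name is made up.

```python
import base64
import hashlib


def onion_address_v2(public_key_bytes: bytes) -> str:
    """Derive a 16-character onion address from (encoded) public key bytes."""
    digest = hashlib.sha1(public_key_bytes).digest()
    first_80_bits = digest[:10]
    name = base64.b32encode(first_80_bits).decode("ascii").lower()
    return name + ".onion"


placeholder_key = b"not a real encoded public key"  # hypothetical input
address = onion_address_v2(placeholder_key)

print(address)
print(len(address.split(".")[0]))  # 16
```

Because the address is a hash of the key, anyone holding the descriptor can check that the public key inside it really belongs to the address they asked for.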
The idea is that the onion address isn’t publicised over the whole Tor network but instead you find it
another way like from a friend telling you or on the internet (addresses ending in .onion).
The way that the distributed hash table is programmed means that the vast majority of the nodes
won't know what the descriptor is for a given key.
So almost every single onion router will have minimal knowledge about the hidden service unless
they explicitly want to find it.
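Here is a toy model of why most routers never see a given descriptor. Routers sit on a hash ring, and a descriptor is stored only on the few routers that follow the hash of its onion address. The ToyDHT class, the replica count of 3 and the router names are assumptions for illustration, not Tor's actual directory design.

```python
import hashlib
from bisect import bisect_right


def ring_position(value: str) -> int:
    """Map a string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, router_names, replicas=3):
        self.ring = sorted((ring_position(name), name) for name in router_names)
        self.replicas = replicas
        self.storage = {name: {} for name in router_names}

    def responsible(self, onion_address):
        """The routers that come right after the address on the ring."""
        positions = [pos for pos, _ in self.ring]
        start = bisect_right(positions, ring_position(onion_address))
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replicas)
        ]

    def publish(self, onion_address, descriptor):
        for name in self.responsible(onion_address):
            self.storage[name][onion_address] = descriptor

    def lookup(self, onion_address):
        for name in self.responsible(onion_address):
            if onion_address in self.storage[name]:
                return self.storage[name][onion_address]
        return None


dht = ToyDHT([f"router{i}" for i in range(20)])
descriptor = {"intro_points": ["router3", "router8", "router12"]}
dht.publish("exampleexample12.onion", descriptor)

holders = [name for name, held in dht.storage.items() if held]
print(len(holders), "of", len(dht.storage), "routers hold the descriptor")
print(dht.lookup("exampleexample12.onion") == descriptor)
```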
Let's say someone gave you the onion address. You request the descriptor off the hash table and you
get back the service's introduction points.
If you want to access an onion address, you first request the descriptor from the hash table.
The descriptor has, let’s say, 4 or 5 IP addresses of introduction points. You pick one at random;
let's say the top one.
You’re going to ask the introduction point to introduce you to the server, but instead of making a
connection directly to the server you pick a rendezvous point at random in the network from a
given set of Onion Routers.
You then make a circuit to that rendezvous point and send it a message asking if it can introduce
you to the server using the introduction point you just picked. You then send the rendezvous point
a one-time password (in this example, let's use 'Labrador').
The rendezvous point makes a circuit to the introduction point and sends it the word 'Labrador' and
its IP address.
The introduction point sends the message to the server and the server can choose to accept it or do
nothing.
If the server accepts the message it will then create a circuit to the rendezvous point.
The server sends the rendezvous point a message. The rendezvous point looks at both messages
from your computer and the server. It says "well, I've received a message from this computer saying
it wants to connect with this service and I’ve also received a message from the service asking if it can
connect to a computer, therefore they must want to talk to each other".
The rendezvous point will then act as another hop on the circuit and connect them.
1. A hidden service calculates its key pair (private and public key, asymmetric encryption).
2. Then the hidden service picks some relays as its introduction points.
3. It tells its public key to those introduction points over Tor circuits.
4. After that the hidden service creates a hidden service descriptor, containing its public key
and what its introduction points are.
5. The hidden service signs the hidden service descriptor with its private key.
6. It then uploads the hidden service descriptor to a distributed hash table (DHT).
7. Clients learn the .onion address from a hidden service out-of-band. (e.g. public website) (A
$hash.onion is a 16 character name derived from the service’s public key.)
8. After retrieving the .onion address the client connects to the DHT and asks for that $hash.
9. If it exists the client learns about the hidden service’s public key and its introduction points.
10. The client picks a relay at random, builds a circuit to it and tells it a one-time secret. The
picked relay acts as the rendezvous point.
11. The client creates an introduce message, containing the address of the rendezvous point and
the one-time secret, before encrypting the message with the hidden service’s public key.
12. The client sends its message over a Tor circuit to one of the introduction points, demanding
it to be forwarded to the hidden service.
13. The hidden service decrypts the introduce message with its private key to learn about the
rendezvous point and the one-time secret.
14. The hidden service creates a rendezvous message, containing the one-time secret and sends
it over a circuit to the rendezvous point.
15. The rendezvous point tells the client that a connection was established.
16. Client and hidden service talk to each other over this rendezvous point. All traffic is end-to-
end encrypted and the rendezvous point just relays it back and forth. Note that each of
them, client and hidden service, build a circuit to the rendezvous point; at three hops per
circuit this makes six hops in total.
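The matching step at the rendezvous point (steps 10 to 15 above) can be sketched like this. The RendezvousPoint class and its method names are made up, the 20-byte secret length is an assumption, and the circuits and the public key encryption of the introduce message are left out entirely.

```python
import secrets


class RendezvousPoint:
    """Toy relay that joins a client and a service presenting the same secret."""

    def __init__(self):
        self.waiting = {}  # one-time secret -> waiting client
        self.joined = []   # (client, service) pairs that have been connected

    def establish(self, one_time_secret: bytes, client: str) -> None:
        """Step 10: the client leaves its one-time secret with the relay."""
        self.waiting[one_time_secret] = client

    def rendezvous(self, one_time_secret: bytes, service: str) -> bool:
        """Step 14: the service presents the secret it learned from the client."""
        client = self.waiting.pop(one_time_secret, None)
        if client is None:
            return False  # unknown or already used secret
        self.joined.append((client, service))
        return True       # step 15: tell the client the connection is up


rp = RendezvousPoint()
one_time_secret = secrets.token_bytes(20)
rp.establish(one_time_secret, "client")

# Steps 11 to 13 (the introduce message via an introduction point) are omitted;
# assume the hidden service has now learned the secret.
connected = rp.rendezvous(one_time_secret, "hidden-service")
impostor = rp.rendezvous(secrets.token_bytes(20), "impostor")
replayed = rp.rendezvous(one_time_secret, "hidden-service")

print(connected, impostor, replayed)  # True False False
```

The rendezvous point only ever sees a random secret arriving from two circuits, so it can connect the two sides without learning who either of them is.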