Transport Layer - Part 3 - TCP
TCP: Overview: RFCs 793, 1122, 1323, 2018, 2581
TCP segment structure
TCP seq. numbers, ACKs
[Figure: an outgoing segment from the sender, showing the source port #, dest port #, sequence number, and acknowledgement number fields. A segment's sequence number is the byte-stream "number" of the first byte in its data.]
Sequence number example
In this example, the sequence number for Segment 1 is 1000 because it contains the first byte ("A") of the byte stream. The sequence number for Segment 2 is 1003 because it starts with the fourth byte ("D") of the byte stream, and so on. These sequence numbers are crucial for the receiver to reassemble the data in the correct order, and they are also used for flow control, error recovery, and other aspects of reliable data transmission in TCP.
Acknowledgements example
Suppose there are two devices communicating using TCP, a sender (S) and a receiver (R). The sender
has a byte stream:
Byte Stream: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Segment 1 (S to R): The sender divides the data into segments and sends the first segment:
Segment 1: Sequence Number 1000, containing "ABC"
Acknowledgment 1 (R to S): The receiver receives Segment 1 and sends an acknowledgment (ACK) back to the sender. In the ACK:
The ACK number is 1003, indicating the next expected sequence number.
The acknowledgment is cumulative, meaning it acknowledges all bytes up to but not including the byte with sequence number 1003.
This ACK indicates to the sender that the receiver has successfully received the bytes up to "C" (inclusive) in the byte stream. The sender can now
continue sending the next data segment.
Segment 2 (S to R): The sender sends the next segment:
Segment 2: Sequence Number 1003, containing "DEF"
Acknowledgment 2 (R to S): The receiver receives Segment 2 and sends another ACK:
The ACK number is 1006, indicating the next expected sequence number.
It's still a cumulative acknowledgment, so it acknowledges all bytes up to but not including the byte with sequence number 1006.
This ACK indicates that the receiver has successfully received the bytes up to "F" (inclusive) in the byte
stream. The process continues for subsequent segments. In this way, the receiver uses ACKs to inform
the sender of the sequence number of the next expected byte and to confirm the receipt of all bytes up to
a specific point in the byte stream. This ensures reliable and ordered data delivery in TCP.
6
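The numbering used in this example can be reproduced with a short sketch (illustrative Python; the helper names are ours, not part of TCP):

```python
def segment_seq_numbers(isn, payload_sizes):
    """Sequence number of each segment = byte-stream number of its
    first payload byte (isn is the initial sequence number)."""
    seqs, seq = [], isn
    for size in payload_sizes:
        seqs.append(seq)
        seq += size              # next segment starts right after this payload
    return seqs

def cumulative_ack(seq, payload_size):
    """Cumulative ACK for an in-order segment = next byte expected."""
    return seq + payload_size

# The example above: ISN 1000, segments "ABC" and "DEF" (3 bytes each)
print(segment_seq_numbers(1000, [3, 3]))   # [1000, 1003]
print(cumulative_ack(1000, 3))             # 1003
print(cumulative_ack(1003, 3))             # 1006
```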
Summary of Seq# & ACK#
Sequence numbers keep track of the bytes in a data stream being sent over a network connection. Each byte of data in a TCP stream is assigned a unique sequence number, and a segment's sequence number is essentially the "number" of the first byte of the segment's data within the byte stream. This allows TCP to order and reassemble the data correctly at the receiving end.
Acknowledgments (ACKs) are a critical part of the protocol's reliability mechanism. An acknowledgment in TCP serves to acknowledge the receipt of data and to inform the sender of the next sequence number expected from the other side. With cumulative ACKs, when the receiver acknowledges a specific sequence number, it implies that all bytes with lower sequence numbers have been received successfully.
TCP seq. numbers, ACKs
[Figure: Telnet echo example between Host A and Host B. The user types 'C'; Host A sends Seq=42, ACK=79, data='C'. Host B ACKs receipt of 'C' and echoes it back: Seq=79, ACK=43, data='C'. Host A then ACKs receipt of the echoed 'C': Seq=43, ACK=80.]
We assume the starting sequence numbers are 42 and 79 for the client and server, respectively. Recall that the
sequence number of a segment is the sequence number of the first byte in the data field. Thus, the first segment sent
from the client will have sequence number 42; the first segment sent from the server will have sequence number 79.
Recall that the acknowledgment number is the sequence number of the next byte of data that the host is waiting for.
After the TCP connection is established but before any data is sent, the client is waiting for byte 79 and the server is
waiting for byte 42.
As shown in the figure, three segments are sent. The first segment is sent from the client to the server, containing the 1-byte ASCII representation of the letter 'C' in its data field. This first segment also has 42 in its sequence number field. Also, because the client has not yet received any data from the server, this first segment will have 79 in its acknowledgment number field.
The second segment is sent from the server to the client. It serves a dual purpose. First, it provides an acknowledgment of the data the server has received. By putting 43 in the acknowledgment field, the server is telling the client that it has successfully received everything up through byte 42 and is now waiting for bytes 43 onward. The second purpose of this segment is to echo back the letter 'C'. Thus, the second segment has the ASCII representation of 'C' in its data field. This second segment has sequence number 79, the initial sequence number of the server-to-client data flow of this TCP connection, as this is the very first byte of data that the server is sending. Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment.
The third segment is sent from the client to the server. Its sole purpose is to acknowledge the data it has received from the server. (Recall that the second segment contained data, the letter 'C', from the server to the client.) This segment has an empty data field (that is, the acknowledgment is not being piggybacked with any client-to-server data). The segment has 80 in the acknowledgment number field because the client has received the stream of bytes up through byte sequence number 79 and is now waiting for bytes 80 onward.
Consider the figure below, in which a TCP sender and receiver communicate over a connection in which the sender→receiver segments may be lost. The TCP sender sends an initial window of 3 segments. Suppose the initial value of the sender→receiver sequence number is 307 and the first 3 segments each contain 476 bytes. The delay between the sender and receiver is 7 time units, and so the first segment arrives at the receiver at t=8. As shown in the figure below, 1 of the 3 segments is lost between the sender and receiver.
1. Sequence numbers sent: 307, 783, 1259
2. ACKs received: 783, X, 783
Consider the figure below, in which a TCP sender and receiver communicate over a connection in which the sender→receiver segments may be lost. The TCP sender sends an initial window of 4 segments. Suppose the initial value of the sender→receiver sequence number is 455 and the first 4 segments each contain 493 bytes. The delay between the sender and receiver is 7 time units, and so the first segment arrives at the receiver at t=8. As shown in the figure below, 2 of the 4 segments are lost between the sender and receiver.
1. Sequence numbers sent: 455, 948, 1441, 1934
2. ACKs received: 948, X, X,
TCP round trip time, timeout
Q: how to set TCP timeout value?
longer than RTT, but RTT varies
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss
Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt (ignore retransmissions)
SampleRTT will vary; we want the estimated RTT "smoother": average several recent measurements, not just the current SampleRTT
The TCP timeout value determines how long the sender waits for an acknowledgment (ACK) before retransmitting a segment. It's crucial to strike a balance between being too short and too long.
TCP round trip time, timeout
This is how TCP re-computes the estimated RTT each time a new SampleRTT is taken:
EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT
This is an exponential weighted moving average: α sets the influence of the most recent measurement on the estimated RTT, and the effect of a past sample decreases exponentially fast. Typical value: α = 0.125.
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr: measured RTTs (in milliseconds) between a host in Massachusetts and the Eurecom host, with the smoother EstimatedRTT overlaid.]
EstimatedRTT: an estimate of the RTT between the sender and receiver, continuously updated based on the measured SampleRTT values.
TCP round trip time, timeout
Given this value of the estimated RTT, TCP computes the timeout interval to be the estimated RTT plus a "safety margin". DevRTT represents the deviation (variance) in RTT values and helps account for variations in SampleRTT. The intuition is that if we are seeing a large variation in SampleRTT, that is, the RTT estimates are fluctuating a lot, then we'll want a larger safety margin. So TCP computes the timeout interval to be the estimated RTT plus 4 times a measure of deviation in the RTT:
TimeoutInterval = EstimatedRTT + 4 · DevRTT
The deviation in the RTT is computed as the EWMA of the difference between the most recently measured SampleRTT and the EstimatedRTT:
DevRTT = (1 − β) · DevRTT + β · |SampleRTT − EstimatedRTT|
* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Suppose that TCP's current estimated values for the round trip time (estimatedRTT) and deviation
in the RTT (DevRTT) are 230 msec and 25 msec, respectively. Suppose that the next three
measured values of the RTT are 390 msec, 280 msec, and 290 msec respectively.
1. What is the EstimatedRTT after the first RTT? EstimatedRTT₁ = 0.875 · EstimatedRTT₀ + 0.125 · SampleRTT₁
2. What is the RTT deviation for the first RTT? DevRTT₁ = 0.75 · DevRTT₀ + 0.25 · |SampleRTT₁ − EstimatedRTT₁|
3. What is the TCP timeout for the first RTT? TCP Timeout₁ = EstimatedRTT₁ + 4 · DevRTT₁
4. What is the EstimatedRTT after the second RTT?
5. What is the RTT deviation for the second RTT?
6. What is the TCP timeout for the second RTT?
7. What is the EstimatedRTT after the third RTT?
8. What is the RTT deviation for the third RTT?
9. What is the TCP timeout for the third RTT?
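The nine questions above can be answered by iterating the two formulas; here is a short Python sketch (our own helper, applying the formulas exactly as stated in the exercise, where the freshly updated EstimatedRTT is used in the DevRTT term):

```python
def rtt_update(est, dev, sample, alpha=0.125, beta=0.25):
    """One update step; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    est = (1 - alpha) * est + alpha * sample           # EWMA of the RTT
    dev = (1 - beta) * dev + beta * abs(sample - est)  # EWMA of the deviation
    return est, dev, est + 4 * dev                     # timeout = est + 4*dev

est, dev = 230.0, 25.0                                 # initial values (msec)
for sample in (390, 280, 290):
    est, dev, timeout = rtt_update(est, dev, sample)
    print(f"EstimatedRTT={est:.3f}  DevRTT={dev:.3f}  Timeout={timeout:.3f}")
# After the first sample: EstimatedRTT = 250.0, DevRTT = 53.75, Timeout = 465.0
```

The same loop answers the exercise below by starting from EstimatedRTT = 100 ms and DevRTT = 5 ms and feeding in the five samples.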
Suppose that the five measured SampleRTT values are 106 ms, 120 ms, 140 ms, 90 ms, and 115 ms. Compute the EstimatedRTT after each of these SampleRTT values is obtained, using a value of α = 0.125 and assuming that the value of EstimatedRTT was 100 ms just before the first of these five samples was obtained. Compute also the DevRTT after each sample is obtained, assuming a value of β = 0.25 and assuming the value of DevRTT was 5 ms just before the first of these five samples was obtained. Last, compute the TCP timeout interval after each of these samples is obtained.
TCP reliable data transfer
TCP creates an rdt service on top of IP's unreliable service:
pipelined segments
cumulative ACKs
single retransmission timer
retransmissions triggered by: timeout events, duplicate ACKs
Let's initially consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control and congestion control.
TCP sender events:
data rcvd from app: create segment with seq # (the seq # is the byte-stream number of the first data byte in the segment); start timer if not already running (think of the timer as being for the oldest unacked segment); expiration interval: TimeOutInterval.
timeout: retransmit the segment that caused the timeout; restart the timer.
ack rcvd: if the ack acknowledges previously unacked segments, update what is known to be ACKed; start the timer if there are still unacked segments.
These events and actions are part of TCP's congestion control and reliability mechanisms. They ensure that data is reliably delivered, retransmitted when necessary, and that the sender adjusts its behavior in response to network conditions and acknowledgments from the receiver. This combination of mechanisms allows TCP to provide robust and reliable communication over potentially unreliable network links.
TCP: retransmission scenarios
[Figure: two scenarios between Host A and Host B. Lost ACK: with SendBase=92, Host A sends Seq=92, 8 bytes of data; the ACK=100 is lost (X); after a timeout, A retransmits Seq=92, 8 bytes of data, and B ACKs again with ACK=100. Premature timeout: the cumulative ACK=120 eventually acknowledges both outstanding segments.]
TCP: retransmission scenarios
[Figure: cumulative ACK scenario. Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data). The first ACK (ACK=100) is lost, but the second ACK, a cumulative ACK, arrives at the sender before the timeout; the sender can then transmit a third segment, knowing that the first two have arrived.]
TCP ACK generation [RFC 1122, RFC 2581]
•When an in-order segment with the expected sequence number arrives, and all data up to the expected
sequence number has already been acknowledged, the receiver may delay sending an acknowledgment
(ACK).
•The receiver waits for a short period (up to 500 milliseconds) for the possible arrival of the next segment.
If no further segment arrives during this time, the receiver sends an ACK for the received segment.
•If an in-order segment with the expected sequence number arrives, and there is another segment with an
ACK pending (i.e., awaiting acknowledgment), the receiver sends an immediate single cumulative ACK.
•The cumulative ACK acknowledges both the in-order segment and the previously pending segment,
reducing the number of ACKs sent.
•When an out-of-order segment arrives with a sequence number higher than expected, indicating a gap in
the received data, the receiver sends an immediate duplicate ACK.
•The duplicate ACK indicates the sequence number of the next expected byte. This lets the sender know
that there is a gap in the received data.
•If a subsequent segment arrives that partially or completely fills the gap identified by the out-of-order
segment, the receiver sends an immediate ACK.
•This ACK acknowledges the received data and provides feedback to the sender about the updated
reception status.
21
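The recommendations above can be sketched as a toy receiver that reports which ACK action it would take (illustrative Python; the 500 ms delayed-ACK timer itself is omitted):

```python
class AckPolicyReceiver:
    """Sketch of the RFC 1122 / RFC 2581 ACK-generation recommendations."""

    def __init__(self, expected_seq):
        self.expected = expected_seq  # next in-order byte expected
        self.ack_pending = False      # an in-order segment awaiting a delayed ACK
        self.buffered = {}            # out-of-order segments: seq -> length

    def segment_arrives(self, seq, length):
        if seq == self.expected:
            self.expected += length
            filled_gap = False
            while self.expected in self.buffered:   # deliver buffered segments
                self.expected += self.buffered.pop(self.expected)
                filled_gap = True
            if filled_gap:
                self.ack_pending = False
                return ("immediate ACK", self.expected)   # segment filled a gap
            if self.ack_pending:
                self.ack_pending = False
                return ("cumulative ACK", self.expected)  # covers both segments
            self.ack_pending = True
            return ("delayed ACK", self.expected)         # wait up to 500 ms
        if seq > self.expected:
            self.buffered[seq] = length                   # out-of-order: buffer it
        return ("duplicate ACK", self.expected)           # signal the gap
```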
These receiver actions are part of TCP's flow and congestion control mechanisms. They
help in optimizing the communication by minimizing the number of ACKs sent, reducing
network congestion, and ensuring that the sender receives feedback about the status of its
transmitted data. The receiver's behavior helps maintain the reliability and efficiency of
TCP connections.
21
In this TCP lost ACK scenario, the first two messages get to the receiver and the 1st
message's ACK is dropped. However, the 2nd message's ACK gets through. Answer the
following questions:
a. What is the ACK at a?
b. What is the ACK at b?
c. What is the sequence number at c?
a. 122
b. 142
c. 142
22
Host A and B are communicating over a TCP connection, and Host B has already received from
A all bytes up through byte 126. Suppose Host A then sends two segments to Host B back-to-
back. The first and second segments contain 80 and 40 bytes of data, respectively. In the first
segment, the sequence number is 127, the source port number is 302, and the destination
port number is 80. Host B sends an acknowledgment whenever it receives a segment from
Host A.
a. In the second segment sent from Host A to B, what are the sequence number, source port
number, and destination port number?
b. If the first segment arrives before the second segment, in the acknowledgment of the first
arriving segment, what is the acknowledgment number, the source port number, and the
destination port number?
c. If the second segment arrives before the first segment, in the acknowledgment of the first
arriving segment, what is the acknowledgment number?
d. Suppose the two segments sent by A arrive in order at B. The first acknowledgment is lost
and the second acknowledgment arrives after the first time- out interval. Draw a timing
diagram, showing these segments and all other segments and acknowledgments sent.
(Assume there is no additional packet loss.) For each segment in your figure, provide the
sequence number and the number of bytes of data; for each acknowledgment that you add,
provide the acknowledgment number.
Given data:
•Host A and B are communicating over a TCP connection, and Host B has
already received from A all bytes up through byte 126.
•The first and second segments contain 80 and 40 bytes of data, respectively.
•The first segment of sequence number is 127.
•The source port number is 302.
•The destination port number is 80
a) Sequence number = sequence number of first segment + number of bytes of data in first segment = 127 + 80 = 207
So, sequence number = 207
Source port number = 302
Destination port number = 80
b) Acknowledgement number = 207
Source port number = 80
Destination port number = 302
c) Acknowledgement number = 127
23
TCP fast retransmit
time-out period often relatively long: long delay before resending lost packet.
detect lost segments via duplicate ACKs: sender often sends many segments back-to-back; if a segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #; likely that unacked segment was lost, so don't wait for timeout.
TCP fast retransmit
[Figure: fast retransmit between Host A and Host B: duplicate ACKs trigger retransmission of the missing segment before the timer expires.]
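Dup-ACK counting at the sender can be sketched as follows (illustrative Python; a real sender tracks much more state):

```python
class FastRetransmit:
    """On the 3rd duplicate ACK, resend the unACKed segment with the
    smallest sequence number instead of waiting for a timeout."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def ack_received(self, acknum, unacked_seqs):
        """Return a seq # to retransmit, or None."""
        if acknum == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:          # triple duplicate ACK
                self.dup_count = 0
                return min(unacked_seqs)     # smallest unACKed seq #
        else:
            self.last_ack, self.dup_count = acknum, 0
        return None
```

With unACKed segments 100 and 110 in flight, the fourth ACK=100 in a row (the original plus three duplicates) triggers retransmission of segment 100.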
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP (segment structure, reliable data transfer, flow control, connection management)
3.6 principles of congestion control
3.7 TCP congestion control
TCP flow control
[Figure: receiver protocol stack. From the sender, the TCP code places arriving data into TCP socket receiver buffers (OS); the application process removes data from the buffers.]
the application may remove data from TCP socket buffers slower than the TCP receiver is delivering (i.e., slower than the sender is sending)
flow control: the receiver controls the sender, so the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP flow control
TCP flow control is a critical mechanism that ensures that the sender does not overwhelm the receiver by
sending data too quickly. The receiver communicates its available buffer space to the sender through the
use of a "receive window" (rwnd) value included in TCP header fields in acknowledgment (ACK)
segments.
The receiver has a designated amount of buffer space available to hold incoming data, which is
determined by its receive buffer size.
The receiver maintains a buffer, often referred to as the "Receive Buffer" or "RcvBuffer," to store
incoming data.
The receiver advertises its available buffer space to the sender by including an rwnd value in the TCP
header of acknowledgment segments.
•The sender, upon receiving an acknowledgment (ACK) from the receiver, examines the rwnd value in
the ACK segment to determine the available buffer space at the receiver.
•The sender then limits the amount of unacknowledged ("in-flight") data it sends to the receiver based on
the receiver's rwnd value.
•The sender ensures that the amount of data in transit (in-flight data) does not exceed the receiver's
advertised receive window size.
28
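The buffer accounting above can be written out directly (an illustrative Python sketch; the variable names follow the RcvBuffer description, the functions themselves are our own):

```python
def receive_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """rwnd = spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """Sender keeps unACKed ("in-flight") data within the advertised rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd

# 4096-byte buffer, 1000 bytes received but not yet read by the application:
rwnd = receive_window(4096, 1000, 0)
print(rwnd)                                     # 3096 bytes spare
print(sender_may_send(2000, 1000, rwnd, 2096))  # True: exactly fills rwnd
print(sender_may_send(2000, 1000, rwnd, 2097))  # False: would overflow
```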
Connection Management
before exchanging data, sender/receiver "handshake":
agree to establish connection (each knowing the other is willing to establish the connection)
agree on connection parameters
[Figure: client and server protocol stacks (application, network) exchanging handshake messages.]
Agreeing to establish a connection
2-way handshake ("Let's talk" / "OK"): choose x; req_conn(x) → ESTAB; acc_conn(x) → ESTAB.
Q: will 2-way handshake always work in network?
variable delays
retransmitted messages (e.g. req_conn(x)) due to message loss
message reordering
can't "see" other side
Agreeing to establish a connection
2-way handshake failure scenarios:
[Figure, left: the client chooses x and sends req_conn(x); the acc_conn(x) is delayed, so the client retransmits req_conn(x). The connection completes, the client terminates, and the server forgets x; when the old retransmitted req_conn(x) then arrives, the server accepts it again: half open connection! (no client!)]
[Figure, right: as on the left, but the client also retransmits data(x+1). After the connection completes and the server forgets x, the old req_conn(x) and data(x+1) arrive; the server accepts the connection and the dup data is accepted!]
TCP 3-way handshake
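The slide's diagram is missing here, but the standard exchange can be sketched as data (illustrative Python; x and y are the client's and server's chosen initial sequence numbers, and a SYN consumes one sequence number):

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments of TCP connection setup as (flags, seq, ack)."""
    x, y = client_isn, server_isn
    return [
        ("SYN",     x,     None),   # client: SYNbit=1, Seq=x
        ("SYN+ACK", y,     x + 1),  # server: SYNbit=1, Seq=y, ACKnum=x+1
        ("ACK",     x + 1, y + 1),  # client: ACKbit=1, ACKnum=y+1 (may carry data)
    ]

for seg in three_way_handshake(42, 79):
    print(seg)
```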
TCP: closing a connection
client, server each close their side of connection
send TCP segment with FIN bit = 1
respond to received FIN with ACK
on receiving FIN, ACK can be combined with own
FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
[Figure: client state ESTAB; after clientSocket.close() the client sends its FIN. The server (ESTAB) ACKs and later sends its own FINbit=1, seq=y, entering LAST_ACK. The client replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT (it can no longer send data), waiting 2 * max segment lifetime before moving to CLOSED; the server moves to CLOSED on receiving the final ACK.]
• The connection is now in a "TIME_WAIT" state for a short duration to ensure that any
delayed segments from the network are handled properly.
• After the "TIME_WAIT" period expires, the connection is considered fully closed and can be
reused for new connections if needed.
Principles of congestion control
congestion, informally: "too many sources sending too much data too fast for network to handle"
different from flow control!
manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
[Figure: Host A and Host B each offer original data at rate λin; graphs of per-connection throughput λout (capped at R/2) and delay versus the sending rate.]
Without any mechanisms to manage congestion or prioritize traffic, the network experiences increasing
delays, reduced throughput, and potential packet loss as the arrival rate of packets approaches the link
capacity. This scenario demonstrates the importance of congestion control mechanisms and Quality of
Service (QoS) policies to manage and prioritize traffic in networks.
The rate at which Host A offers traffic to the router in this first scenario is thus λin bytes/sec. Host B
operates in a similar manner, and we assume for simplicity that it too is sending at a rate of λin bytes/sec.
Packets from Hosts A and B pass through a router and over a shared outgoing link of capacity R. The router
has buffers that allow it to store incoming packets when the packet-arrival rate exceeds the outgoing link’s
capacity.
Consider the performance of Host A's connection under this first scenario. The left graph plots the per-connection throughput (number of bytes per second at the receiver) as a function of the connection-sending rate. For a sending rate between 0 and R/2, the throughput at the receiver equals the sender's sending rate: everything sent by the sender is received at the receiver with a finite delay. When the sending rate is above R/2, however, the throughput is only R/2. This upper limit on throughput is a consequence of the sharing of link capacity between two connections. The link simply cannot deliver packets to a receiver at a steady-state rate that exceeds R/2. No matter how high Hosts A and B set their sending rates, they will each never see a throughput higher than R/2.
The right-hand graph shows the consequence of operating near link capacity. As the sending rate approaches R/2 (from the left), the average delay becomes larger and larger. When the sending rate exceeds R/2, the average number of queued packets in the router is unbounded, and the average delay between source and destination becomes infinite.
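The left-hand graph's relationship can be stated in one line (illustrative Python; R is the shared link capacity, split between the two connections):

```python
def per_connection_throughput(lam_in, R):
    """Scenario 1: two connections share a link of capacity R, so each
    connection's steady-state throughput saturates at R/2."""
    return min(lam_in, R / 2)

R = 1_000_000  # a 1 Mbps link, chosen only for illustration
print(per_connection_throughput(200_000, R))  # 200000: below R/2, all delivered
print(per_connection_throughput(900_000, R))  # 500000.0: capped at R/2
```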
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
application-layer input = application-layer output: λin = λout
transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A offers λin plus retransmissions λ'in into a finite-buffer router.]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge: sender sends only when router buffers available
[Graph: λout equals λin, up to R/2.]
Consider the unrealistic case that Host A is able to somehow determine whether or not a buffer is free in the router and thus sends a packet only when a buffer is free. In this case, no loss would occur, λ'in would be equal to λin, and the throughput of the connection λout would be equal to λin.
Causes/costs of congestion: scenario 2
idealization: known loss: packets can be lost, dropped at router due to full buffers; sender only resends if a packet is known to be lost
[Figure: Host A transmits to Host B through the router; a packet is dropped when there is no buffer space.]
Causes/costs of congestion: scenario 2
idealization: known loss: packets can be lost, dropped at router due to full buffers; sender only resends if a packet is known to be lost
[Figure: as above, but with free buffer space the retransmission gets through.]
Consider next the slightly more realistic case that the sender retransmits only when a packet is known for certain to be lost. Then if the offered load, λ'in (the rate of original data transmission plus retransmissions), equals R/2, the rate at which data are delivered to the receiver application is less than R/2.
Causes/costs of congestion: scenario 2
realistic: duplicates: packets can be lost, dropped at router due to full buffers; sender times out prematurely, sending two copies, both of which are delivered
[Figure: Host A's premature timeout sends a copy of a delayed packet; the offered load λ'in includes the duplicate, and both copies reach Host B (λout).]
The case that the sender may time out prematurely and retransmit a packet that
has been delayed in the queue but not yet lost. In this case, both the original data
packet and the retransmission may reach the receiver.
43
Causes/costs of congestion: scenario 2
realistic: duplicates: packets can be lost, dropped at router due to full buffers; sender times out prematurely, sending two copies, both of which are delivered
"costs" of congestion:
more work (retransmissions) for a given "goodput"
unneeded retransmissions: the link carries multiple copies of a packet, decreasing goodput
Causes/costs of congestion: scenario 3
four senders; multihop paths; timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue packets at the upper queue are dropped, and blue throughput → 0
[Figure: Hosts A, B, C, D send original data (λin) plus retransmitted data (λ'in) over multihop paths through routers with finite shared output link buffers.]
Consider the connection from Host A to Host C, passing through routers R1 and
R2. The A–C connection shares router R1 with the D–B connection and
shares router R2 with the B–D connection. For extremely small values of λin ,
buffer overflows are rare and the throughput approximately equals the offered
load. For slightly larger values of λin , the corresponding throughput is also
larger, since more original data is being transmitted into the network and
delivered to the destination, and overflows are still rare.
Consider router R2. The A–C traffic arriving to router R2 (being forwarded from
R1) can have an arrival rate at R2 that is at most R, the capacity of the link from
R1 to R2, regardless of the value of λin . If λin is extremely large for all
connections (including the B–D connection), then the arrival rate of B–D traffic
at R2 can be much larger than that of the A–C traffic. Because the A–C and B–D
traffic must compete at router R2 for the limited amount of buffer space, the
amount of A–C traffic that successfully gets through R2 (that is, is not lost due to
buffer overflow) becomes smaller and smaller as the offered load from B–D gets
larger and larger. In the limit, as the offered load approaches infinity, an empty
buffer at R2 is immediately filled by a B–D packet, and the throughput of the A–
C connection at R2 goes to zero. This, in turn, implies that the A–C end-to-end
throughput goes to zero in the limit of heavy traffic.
45
Causes/costs of congestion: scenario 3
[Graph: λout versus λ'in: throughput rises, then collapses toward zero as the offered load approaches C/2.]
The reason for the eventual decrease in throughput with increasing offered load is evident when one considers the amount of wasted work done by the network. In the high-traffic scenario outlined above, whenever a packet is dropped at a second-hop router, the work done by the first-hop router in forwarding a packet to the second-hop router ends up being "wasted."
Causes/costs of congestion: insights
[Graphs summarizing the scenarios: per-connection throughput λout saturates at R/2, delay grows without bound near capacity, and effective throughput falls as the offered load λ'in increases.]
loss/retransmission decreases effective throughput
buffering wasted for packets lost downstream
Approaches towards congestion control
Case study: ATM ABR congestion control
Case study: ATM ABR congestion control
With ATM ABR service, data cells are transmitted from a source to a destination
through a series of intermediate switches. Interspersed with the data cells are
resource-management cells (RM cells); these RM cells can be used to convey
congestion-related information among the hosts and switches. When an RM cell
arrives at a destination, it will be turned around and sent back to the sender
(possibly after the destination has modified the contents of the RM cell). It is also
possible for a switch to generate an RM cell itself and send this RM cell directly
to a source. RM cells can thus be used to provide both direct network feedback
and network feedback via the receiver.
A congested network switch can set the EFCI bit in a data cell to 1 to signal congestion to the destination host. RM cells have a congestion indication (CI) bit and a no increase (NI) bit that can be set by a switch.
Each RM cell also contains a 2-byte explicit rate (ER) field. A congested switch
may lower the value contained in the ER field in a passing RM cell.
50
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
additive increase: increase cwnd by 1 MSS every RTT until loss detected
multiplicative decrease: cut cwnd in half after loss
[Figure: sawtooth behavior of the congestion window size (cwnd) at the TCP sender over time: additively increase window size until loss occurs, then cut window in half.]
AIMD is a mechanism that helps TCP manage its sending rate to find an optimal balance between utilizing available bandwidth and avoiding network congestion.
TCP AIMD: more
multiplicative decrease detail: sending rate is
cut in half on loss detected by triple duplicate ACK (TCP Reno)
cut to 1 MSS (maximum segment size) when loss detected by timeout (TCP Tahoe)
Why AIMD? AIMD, a distributed, asynchronous algorithm, has been shown to:
• optimize congested flow rates network wide!
• have desirable stability properties
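The sawtooth can be simulated in a few lines (illustrative Python; cwnd in MSS units, one step per RTT, with loss events supplied externally):

```python
def aimd_trace(rtts, loss_at, cwnd=1):
    """Additive increase (+1 MSS per RTT) and multiplicative decrease
    (halve cwnd at each RTT index listed in loss_at)."""
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: cut in half
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace

print(aimd_trace(10, loss_at={5}))
# [1, 2, 3, 4, 5, 6, 3, 4, 5, 6]: the characteristic sawtooth
```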
TCP Congestion Control: details
sender sequence number space: [last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent]
sender limits transmission: LastByteSent − LastByteAcked ≤ cwnd
TCP sending rate: roughly, send cwnd bytes, wait RTT for ACKs, then send more bytes; rate ≈ cwnd / RTT bytes/sec
TCP regulates its sending rate based on the current network conditions and
congestion levels.
LastByteSent: This represents the sequence number of the last byte sent by the
sender.
LastByteAcked: This represents the sequence number of the last byte that has
been acknowledged by the receiver.
cwnd (Congestion Window) determines the maximum number of
unacknowledged bytes (segments) that can be in flight at any given time.
54
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B; one segment in the first RTT, two in the second, four in the third.]
TCP: detecting, reacting to loss
loss indicated by timeout: cwnd set to 1 MSS; window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs (TCP Reno): dup ACKs indicate network capable of delivering some segments; cwnd is cut in half, window then grows linearly
TCP Tahoe and TCP Reno are two congestion control algorithms.
TCP Tahoe: uses a simple approach known as "slow start" and "congestion avoidance" to manage network congestion.
Slow start - it begins by sending a small number of segments and doubles the sending rate every round-trip time (by incrementing cwnd for each ACK received). This phase continues until a predefined congestion threshold (ssthresh) is reached.
Congestion avoidance phase - it increases the sending rate linearly, by only one segment per round-trip time.
Because Tahoe drops cwnd back to 1 MSS on any loss, it can make inefficient use of network resources and lead to poor performance.
58
TCP Reno: uses "slow start & fast recovery" and "congestion avoidance" to handle congestion.
Slow start & fast recovery - it begins by sending a small number of segments and doubles the sending rate every round-trip time. When a packet loss is detected by triple duplicate ACKs, it cuts the sending rate in half and enters the fast recovery state.
Congestion avoidance - TCP Reno increases the sending rate linearly, by only one segment per round-trip time.
This avoids unnecessary rate reductions, leading to improved network efficiency by balancing responsiveness to congestion against efficient use of the available bandwidth.
59
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
variable ssthresh
on loss event, ssthresh is set to 1/2 of cwnd just before the loss event
3-60
60
[figure: slow start (exponential increase) followed by congestion avoidance (additive increase)]
61
TCP Congestion policy summary
1. Slow start phase: starts slow; the increment is exponential, up to the threshold.
2. Congestion avoidance phase: after reaching the threshold, the increment is by 1 MSS per RTT.
After a loss, the sender goes back to the slow start phase or the congestion avoidance phase, depending on how the loss was detected.
62
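The full policy above (slow start below ssthresh, congestion avoidance at or above it, Reno-style reactions to loss) can be sketched as one small simulator. This is a simplified model under my own naming, not an RFC-accurate state machine:

```python
def tcp_cwnd_trace(rounds, ssthresh, events):
    """Simplified cwnd trajectory per RTT.
    Below ssthresh: slow start (doubling, capped at ssthresh).
    At/above ssthresh: congestion avoidance (+1 MSS per RTT).
    events maps round -> 'timeout' or '3dupack' (Reno behaviour)."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        ev = events.get(r)
        if ev == "timeout":
            ssthresh, cwnd = cwnd // 2, 1    # restart slow start
        elif ev == "3dupack":
            ssthresh = cwnd // 2
            cwnd = ssthresh                  # halve, then grow linearly
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start
        else:
            cwnd += 1                        # congestion avoidance
    return trace

print(tcp_cwnd_trace(10, 8, {6: "3dupack"}))  # [1, 2, 4, 8, 9, 10, 11, 5, 6, 7]
```

The trace shows the characteristic shape: exponential ramp to the threshold, linear climb, halving at the triple-duplicate-ACK event, then another linear climb.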
Assuming TCP Reno is the protocol experiencing the behaviour shown below, answer the following:
a. In the Slow Start state, cwnd is initialized to a small value of 1 MSS (Maximum Segment Size: the maximum amount of data that can be placed in a segment) and rapidly increased by 1 MSS for each ACK received, i.e. cwnd = cwnd + MSS, with the duplicate ACK count (dupACKcount) set to 0. This results in an effective doubling of the sending rate per round, i.e. exponential growth. In this figure, this is happening between the first and sixth transmission rounds. Slow Start also kicks in again at the 23rd transmission round, setting cwnd = 1 MSS and incrementing up to the end of the data, at transmission round 26.
b. In the Congestion Avoidance state, the system assumes congestion is near once cwnd reaches ssthresh (the slow start threshold), which is set to half the value of cwnd at the last loss. Once cwnd >= ssthresh, congestion avoidance starts, at which point cwnd is increased at a more conservative, linear rate of MSS bytes per RTT, i.e. by MSS·(MSS/cwnd) per ACK, or cwnd = cwnd + [MSS·(MSS/cwnd)]. This happens between transmission rounds 6 and 16, and again in rounds [17, 22].
c. After the 16th transmission round, cwnd is decreased from ~42 MSS to ~24 MSS (0.5·cwnd + 3 MSS), about half of the cwnd before the event in question. If it were a timeout event, the sender would restart in the Slow Start state at 1 MSS and increase exponentially. However, this is clearly a triple duplicate ACK event, as it halves the value of cwnd, which then increases in a linear fashion. If there had been a timeout, the congestion window size would have dropped to 1.
d. After the 22nd transmission round, cwnd is reduced to its initial value, i.e. cwnd is set to 1 MSS and increased exponentially, indicating that the causal event is a timeout.
e. The threshold is initially 32, since it is at this window size that slow start stops and congestion avoidance begins.
f. The threshold is set to half the value of the congestion window when packet loss is detected. When loss is detected during transmission round 16, the congestion window size is 42. Hence the threshold is 21 during the 18th transmission round.
g. The threshold is set to half the value of the congestion window when packet loss is detected. When loss is detected during transmission round 22, the congestion window size is 29. Hence the threshold is 14 (taking the floor of 14.5) during the 24th transmission round.
h. During the 1st transmission round, packet 1 is sent; packets 2-3 are sent in the 2nd transmission round; packets 4-7 in the 3rd; packets 8-15 in the 4th; packets 16-31 in the 5th; packets 32-63 in the 6th; packets 64-96 in the 7th. Thus packet 70 is sent in the 7th transmission round.
63
i. The threshold will be set to half the value of the congestion window (8) at the time the loss occurred, and the congestion window will be set to the new threshold value. Thus the new values of both the threshold and the window will be 4.
j. The threshold would be 21, and the congestion window size 1.
Consider sending a large file from a host to another over a TCP connection that
has no loss.
a. Suppose TCP uses AIMD for its congestion control without slow start.
Assuming cwnd increases by 1 MSS every time a batch of ACKs is received and
assuming approximately constant round-trip times, how long does it take for
cwnd increase from 6 MSS to 12 MSS (assuming no loss events)?
b. What is the average throughput (in terms of MSS and RTT) for this
connection up through time = 6 RTT?
3-64
a)
Given: cwnd increases by 1 MSS every time a batch of ACKs is received (i.e. once per RTT), and round-trip times are approximately constant, so the transmission rate is cwnd bytes/RTT.
The steps taken for cwnd to increase from 6 MSS to 12 MSS:
• after 1 RTT, cwnd = 7 MSS
• after 2 RTTs, cwnd = 8 MSS
• after 3 RTTs, cwnd = 9 MSS
• after 4 RTTs, cwnd = 10 MSS
• after 5 RTTs, cwnd = 11 MSS
• after 6 RTTs, cwnd = 12 MSS
So it takes 6 RTTs.
b)
The connection runs up through time = 6 RTT, with window sizes 6, 7, 8, 9, 10, 11 MSS in successive RTTs.
Average throughput = (6 + 7 + 8 + 9 + 10 + 11)/6
= 8.5 MSS/RTT
64
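The worked answer can be verified with a couple of lines of arithmetic (the function name is mine):

```python
def aimd_growth(start_mss, target_mss):
    """RTTs needed to grow cwnd by 1 MSS per RTT from start to target,
    and the average throughput (MSS/RTT) over those RTTs."""
    rtts = target_mss - start_mss
    avg = sum(range(start_mss, target_mss)) / rtts  # windows 6..11 in this example
    return rtts, avg

print(aimd_growth(6, 12))  # (6, 8.5): 6 RTTs, average 8.5 MSS/RTT
```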
Host A sends a file consisting of 9 MSS-sized segments to a host B using TCP. Assume
that the 4th segment in the transmission is lost. Assume the retransmission timeout is
T, the one-way latency is d, and that T > 4*d. Ignore the transmission time of the
segments and of the acknowledgements. Also, assume the TCP three-way handshake
has completed, but no data has been transmitted.
Assume no fast retransmission or fast recovery. Draw the time diagram showing each
segment and acknowledgement until the entire file is transferred. Indicate on the
diagram all changes in the cwnd and ssthresh. How long does it take to transfer the
file?
NOTE:
• For Fast Recovery, assume that each duplicate acknowledgment increases cwnd by 1.
• For Fast Recovery, assume that, upon receiving a non-duplicate acknowledgment, cwnd drops back to ssthresh.
• If the value of cwnd is fractional, you should round it to the closest larger integer.
• The transfer time is the time interval, measured at source A, from the time the first segment is sent until the acknowledgement of
the last segment is received.
3-65
65
Consider the figure below, which plots the evolution of TCP's congestion window at the beginning of each time unit
(where the unit of time is equal to the RTT). In the abstract model for this problem, TCP sends a "flight" of packets
of size cwnd at the beginning of each time unit. The result of sending that flight of packets is that either (i) all packets
are ACKed at the end of the time unit, (ii) there is a timeout for the first packet, or (iii) there is a triple duplicate
ACK for the first packet. In this problem, you are asked to reconstruct the sequence of events (ACKs, losses) that
resulted in the evolution of TCP's cwnd shown below.
1. Give the times at which TCP is in slow start.
2. Give the times at which TCP is in congestion avoidance.
3. Give the times at which TCP is in fast recovery.
4. Give the times at which packets are lost via timeout.
5. Give the times at which packets are lost via triple ACK.
6. Give the times at which the value of ssthresh changes.
66
TCP throughput
avg. TCP throughput as a function of window size and RTT?
ignore slow start, assume there is always data to send
W: window size (measured in bytes) where loss occurs
avg. window size (# in-flight bytes) is 3/4 W, since cwnd oscillates between W/2 and W
avg. throughput is 3/4 W per RTT:

avg TCP throughput = (3/4) · W / RTT  bytes/sec

3-67
If you want to express the throughput in bits per second (bps), convert bytes to bits by multiplying by 8 (since there are 8 bits in 1 byte):
Average throughput (in bps) = (3/4) · W · 8 / RTT
67
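The formula can be wrapped in a small helper (a sketch with my own naming; W and RTT values below are illustrative):

```python
def avg_tcp_throughput(w_bytes, rtt_seconds, in_bits=False):
    """Average AIMD throughput: the window oscillates between W/2 and W,
    so the average window is 3W/4, sent once per RTT."""
    rate = 0.75 * w_bytes / rtt_seconds  # bytes/sec
    return rate * 8 if in_bits else rate

print(avg_tcp_throughput(100_000, 0.1))        # 750000.0 bytes/sec
print(avg_tcp_throughput(100_000, 0.1, True))  # 6000000.0 bits/sec
```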
TCP Futures: TCP over “long, fat pipes”
3-68
68
TCP Fairness
fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
3-69
69
Why is TCP fair?
two competing sessions:
additive increase gives a slope of 1 as throughput increases
multiplicative decrease decreases throughput proportionally
[figure: throughput of connection 1 vs connection 2, converging toward the equal-share line of link capacity R]
3-70
70
Fairness (more)
Fairness and UDP:
multimedia apps often do not use TCP
they do not want their rate throttled by congestion control
instead they use UDP: send audio/video at a constant rate, tolerate packet loss
Fairness and parallel TCP connections:
an application can open multiple parallel connections between two hosts
web browsers do this
e.g., link of rate R with 9 existing connections:
a new app asking for 1 TCP connection gets rate R/10
a new app asking for 11 TCP connections gets roughly R/2
3-71
71
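The parallel-connection example assumes every TCP connection over the bottleneck gets an equal share, which is the idealized fair-share model. Under that assumption (function name and the link rate value are mine):

```python
def new_app_rate(link_rate, existing, new_conns):
    """Aggregate rate for a new app opening new_conns parallel TCP
    connections on a link already carrying `existing` connections,
    assuming each connection gets an equal share."""
    return link_rate * new_conns / (existing + new_conns)

R = 10.0  # hypothetical link rate
print(new_app_rate(R, 9, 1))   # 1.0  -> R/10
print(new_app_rate(R, 9, 11))  # 5.5  -> 11R/20, roughly R/2
```

This is why opening many parallel connections lets one application grab far more than its "fair" per-host share, even though each individual connection behaves fairly.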