Transport Layer: Outline
Congestion Control 6
Principles of congestion control
congestion:
v informally: “too many sources sending too much
data too fast for network to handle”
v different from flow control!
v manifestations:
§ lost packets (buffer overflow at routers)
§ long delays (queueing in router buffers)
v a top-10 problem!
Congestion Control 7
Congestion
[Figure: a router and the router’s buffer.]
Congestion Control 8
Congestion
[Figure: a router with a full buffer sends arriving packets to the trash, saying “Ugh. I so can’t deal with this right now!”]
Congestion Control 10
Congestion Collapse
[Figure: a network path through Link A and then Link B.]
Congestion Control 11
Congestion Collapse
[Figure: sender S1, Link A, and Link B.]
One sender starts, but there’s still capacity at link A.
Congestion Control 12
Congestion Collapse
[Figure: a second sender, S2, appears alongside S1; Link A and Link B as before.]
Congestion Control 13
Congestion Collapse
[Figure: senders S1 and S2, Link A, and Link B.]
Unrelated traffic passes through and congests link B.
Congestion Control 14
Congestion Collapse
[Figure: senders S1 and S2, Link A, and Link B.]
S2’s traffic is being dropped at Link B, so it starts retransmitting on top of what it was sending.
(This is very bad. S2 is now sending lots of traffic over link A that has no hope of crossing link B.)
Congestion Control 15
Congestion Collapse
[Figure: senders S1 and S2, Link A, and Link B.]
Increased traffic from S2 causes Link A to become congested. S1 starts retransmitting.
Congestion Control 16
Congestion Collapse
[Figure: senders S1 and S2, Link A, and Link B.]
Congestion propagates backwards…
Congestion Control 17
Without congestion control
congestion:
v Increases delivery latency
v Increases loss rate
v Increases retransmissions, many unnecessary
v Wastes capacity on traffic that is never delivered
v Increases congestion, cycle continues …
Congestion Control 18
Cost of Congestion
[Figure: two plots against Load, one of Throughput and one of Delay, marking the knee, the cliff, packet loss, and congestion collapse.]
v Knee – point after which
§ Throughput increases slowly
§ Delay increases fast
v Cliff – point after which
§ Throughput starts to drop to zero (congestion collapse)
§ Delay approaches infinity
Congestion Control 19
Congestion Collapse
This happened to the Internet (then NSFnet) in 1986
v Rate dropped from a blazing 32 Kbps to 40 bps
v This happened on and off for two years
v In 1988, Van Jacobson published “Congestion
Avoidance and Control”
v The fix: senders voluntarily limit sending rate
Congestion Control 20
Approaches towards congestion control
Congestion Control 21
Self Study
Congestion Control 22
Self Study
Case study: ATM ABR congestion control
Congestion Control 24
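TCP’s Approach in a Nutshell
v TCP connection has window
§ Controls number of packets in flight
v Sending rate: $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence space showing the last byte ACKed, the bytes sent but not yet ACKed (“in-flight”), and the last byte sent, with cwnd bounding the in-flight bytes.]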
Congestion Control 26
CWND
Congestion Control 27
Two Basic Questions
Congestion Control 28
Quiz: What is a “congestion event”?
A: A segment loss
E: A + B + C
Congestion Control 29
Quiz: How should we set CWND?
v Packet loss
§ Fail-safe signal that TCP already has to detect
§ Complication: non-congestive loss (checksum errors)
Congestion Control 31
Not All Losses the Same
v Duplicate ACKs: isolated loss
§ dup ACKs indicate network capable of delivering
some segments
Congestion Control 32
Rate Adjustment
v Basic structure:
§ Upon receipt of ACK (of new data): increase rate
§ Upon detection of loss: decrease rate
Congestion Control 33
Bandwidth Discovery with Slow Start (SS)
v Goal: estimate available bandwidth
§ start slow (for safety)
§ but ramp up quickly (for efficiency)
v Consider
§ RTT = 100ms, MSS=1000bytes
§ Window size to fill 1Mbps of BW = 12.5 packets
§ Window size to fill 1Gbps = 12,500 packets
§ Either is possible!
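The two window sizes above follow from the bandwidth-delay product. A minimal Python sketch of that arithmetic; the function name `window_in_packets` and its parameter names are illustrative choices, not something from the slides:

```python
def window_in_packets(bandwidth_bps: float, rtt_ms: float, mss_bytes: int) -> float:
    """Packets that must be in flight to keep a link of the given rate full."""
    # Bandwidth-delay product: bits per second * seconds, converted to bytes.
    bytes_in_flight = bandwidth_bps * rtt_ms / 1000 / 8
    return bytes_in_flight / mss_bytes


# RTT = 100 ms and MSS = 1000 bytes, as on the slide.
print(window_in_packets(1e6, 100, 1000))  # 1 Mbps -> 12.5
print(window_in_packets(1e9, 100, 1000))  # 1 Gbps -> 12500.0
```

The three-orders-of-magnitude spread is why the sender cannot pick a fixed starting window and has to probe for it.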
Congestion Control 34
TCP Slow Start
v when connection begins, increase rate exponentially until first loss event:
§ initially cwnd = 1 MSS
§ double cwnd every RTT
§ Simpler implementation achieved by incrementing cwnd for every ACK received
v summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B; A sends one segment, then two segments, then four segments in successive RTTs.]
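To see why a per-ACK increment doubles the window every RTT, here is a small Python sketch. It assumes no loss and that every segment sent in one RTT is ACKed before the next RTT begins; `slow_start_rounds` is a hypothetical helper name.

```python
def slow_start_rounds(rounds: int, initial_cwnd: int = 1) -> list[int]:
    """Return cwnd (in MSS) at the start of each RTT, assuming no loss."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd  # one ACK comes back for each segment sent this RTT
        for _ in range(acks):
            cwnd += 1  # per-ACK increment of one MSS
        history.append(cwnd)
    return history


print(slow_start_rounds(4))  # [1, 2, 4, 8, 16]
```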
Congestion Control 35
Adjusting to Varying Bandwidth
v Slow start gave an estimate of available bandwidth
[Figure: Window against time t; exponential “slow start” growth until a loss.]
Congestion Control 38
Slow-Start vs. AIMD
v When does a sender stop Slow-Start and
start Additive Increase?
v Events
§ ACK (new data)
§ dupACK (duplicate ACK for old data)
§ Timeout
Congestion Control 40
Event: ACK (new data)
v If CWND < ssthresh
§ CWND += 1
• CWND packets per RTT
• Hence after one RTT with no drops: CWND = 2 x CWND
Congestion Control 41
Event: ACK (new data)
v If CWND < ssthresh (slow start phase)
§ CWND += 1
v Else (“Congestion Avoidance” phase, additive increase)
§ CWND = CWND + 1/CWND
• CWND packets per RTT
• Hence after one RTT with no drops: CWND = CWND + 1
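The per-ACK rule can be sketched in Python as follows. The window is counted in packets (MSS units) as on the slide; `on_new_ack` and `one_rtt` are hypothetical names, and `one_rtt` assumes one ACK arrives per packet that was in the window at the start of the RTT.

```python
def on_new_ack(cwnd: float, ssthresh: float) -> float:
    """Window update for one ACK of new data."""
    if cwnd < ssthresh:
        return cwnd + 1        # slow start: +1 per ACK
    return cwnd + 1 / cwnd     # congestion avoidance: about +1 per RTT


def one_rtt(cwnd: float, ssthresh: float) -> float:
    """Apply one RTT's worth of ACKs with no drops."""
    for _ in range(int(cwnd)):
        cwnd = on_new_ack(cwnd, ssthresh)
    return cwnd


print(one_rtt(4, ssthresh=16))  # slow start doubles the window: 8
print(one_rtt(8, ssthresh=8))   # congestion avoidance adds roughly 1
```

Because 1/CWND shrinks slightly as CWND grows within the RTT, the additive phase gains a little less than one full packet per RTT.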
Congestion Control 42
Event: TimeOut
v On Timeout
§ ssthresh ← CWND/2
§ CWND ← 1
Congestion Control 43
Event: dupACK
v dupACKcount ++
Congestion Control 44
Example
[Figure: Window over time, marking a fast retransmission, a timeout, and the level to which SSThresh is set.]
Congestion Control 46
Example (window in units of MSS, not bytes)
v Consider a TCP connection with:
§ CWND=10 packets (of size MSS, which is 100 bytes)
§ Last ACK was for byte # 101
• i.e., receiver expecting next packet to have seq. no. 101
Congestion Control 47
Timeline
v ACK 101 (due to 201) cwnd=10 dupACK#1 (no xmit)
v ACK 101 (due to 301) cwnd=10 dupACK#2 (no xmit)
v ACK 101 (due to 401) cwnd=10 dupACK#3 (no xmit)
v RETRANSMIT 101 ssthresh=5 cwnd= 5
v ACK 101 (due to 501) cwnd=5 + 1/5 (no xmit)
v ACK 101 (due to 601) cwnd=5 + 2/5 (no xmit)
v ACK 101 (due to 701) cwnd=5 + 3/5 (no xmit)
v ACK 101 (due to 801) cwnd=5 + 4/5 (no xmit)
v ACK 101 (due to 901) cwnd=5 + 5/5 (no xmit)
v ACK 101 (due to 1001) cwnd=6 + 1/5 (no xmit)
v ACK 1101 (due to 101) ← only now can we transmit new packets
v Plus no packets in flight so ACK “clocking” (to increase CWND) stalls for another RTT
Congestion Control 48
Solution: Fast Recovery
Idea: Grant the sender temporary “credit” for each dupACK
so as to keep packets in flight
v If dupACKcount = 3
§ ssthresh = cwnd/2
§ cwnd = ssthresh + 3
Congestion Control 49
Example
v Consider a TCP connection with:
§ CWND=10 packets (of size MSS = 100 bytes)
§ Last ACK was for byte # 101
• i.e., receiver expecting next packet to have seq. no. 101
Congestion Control 50
Timeline
v ACK 101 (due to 201) cwnd=10 dup#1
v ACK 101 (due to 301) cwnd=10 dup#2
v ACK 101 (due to 401) cwnd=10 dup#3
v REXMIT 101 ssthresh=5 cwnd= 8 (5+3)
v ACK 101 (due to 501) cwnd= 9 (no xmit)
v ACK 101 (due to 601) cwnd=10 (no xmit)
v ACK 101 (due to 701) cwnd=11 (xmit 1101)
v ACK 101 (due to 801) cwnd=12 (xmit 1201)
v ACK 101 (due to 901) cwnd=13 (xmit 1301)
v ACK 101 (due to 1001) cwnd=14 (xmit 1401)
v ACK 1101 (due to 101) cwnd = 5 (xmit 1501) ← exiting fast recovery
v Packets 1101-1401 already in flight
v ACK 1201 (due to 1101) cwnd = 5 + 1/5 ← back in congestion avoidance
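The timeline above can be reproduced with a short Python sketch. It assumes the sender may transmit a segment only when its sequence number is below last ACK + cwnd x MSS, and it folds the third dupACK and the retransmission into one row. The function name and the row layout are illustrative, not from the slides.

```python
MSS = 100  # bytes per segment, as in the example


def fast_recovery_timeline(cwnd=10, last_ack=101, dup_acks=9):
    """Return (event, cwnd, segment sent or None) rows for the example."""
    next_seq = last_ack + cwnd * MSS  # 1101: first byte not yet sent
    ack_after_repair = next_seq       # cumulative ACK once 101 is repaired
    ssthresh = cwnd // 2
    rows = []
    for n in range(1, dup_acks + 1):
        sent = None
        if n == 3:
            cwnd = ssthresh + 3       # halve, then credit the 3 dupACKs
            sent = last_ack           # fast retransmit of the missing segment
        elif n > 3:
            cwnd += 1                 # temporary credit for each further dupACK
            if next_seq < last_ack + cwnd * MSS:
                sent = next_seq
                next_seq += MSS
        rows.append((f"dupACK #{n}", cwnd, sent))
    # New ACK: leave fast recovery and deflate the window to ssthresh.
    cwnd = ssthresh
    last_ack = ack_after_repair
    sent = None
    if next_seq < last_ack + cwnd * MSS:
        sent = next_seq
    rows.append(("new ACK", cwnd, sent))
    return rows


for row in fast_recovery_timeline():
    print(row)
```

The rows show the window inflating from 8 to 14 and new segments 1101 through 1401 leaving while the loss is still being repaired, which is the point of the temporary credit.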
Congestion Control 51
Summary: TCP Congestion Control
[Figure: state machine with three states: slow start, congestion avoidance, fast recovery. Λ marks an empty event or action.]
v Initial transition (Λ)
§ cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
v Slow start
§ new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s), as allowed
§ duplicate ACK: dupACKcount++
§ cwnd > ssthresh: Λ; go to congestion avoidance
§ timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
§ dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
v Congestion avoidance
§ new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s), as allowed
§ duplicate ACK: dupACKcount++
§ timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
§ dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
v Fast recovery
§ duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s), as allowed
§ new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
§ timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
Congestion Control 52
TCP Flavours
v TCP-Tahoe
§ cwnd =1 on triple dup ACK & timeout
v TCP-Reno
§ cwnd =1 on timeout
§ cwnd = cwnd/2 on triple dup ACK
v TCP-newReno
§ TCP-Reno + improved fast recovery
v TCP-SACK (NOT COVERED IN THE COURSE)
§ incorporates selective acknowledgements
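The difference between Tahoe and Reno comes down to how each reacts to a triple duplicate ACK. A hedged Python sketch: `on_loss` is a hypothetical helper, windows are in MSS units, and setting ssthresh to cwnd/2 on every loss event is taken from the timeout and fast-recovery rules earlier in the deck.

```python
def on_loss(flavour: str, event: str, cwnd: float) -> tuple[float, float]:
    """Return (new_cwnd, new_ssthresh) in MSS after a loss event.

    flavour is "tahoe" or "reno"; event is "timeout" or "triple_dup_ack".
    """
    if flavour not in ("tahoe", "reno"):
        raise ValueError(f"unknown flavour: {flavour}")
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    ssthresh = cwnd / 2
    if event == "timeout" or flavour == "tahoe":
        return 1, ssthresh         # restart from one segment
    return ssthresh, ssthresh      # Reno: halve the window and keep going


print(on_loss("tahoe", "triple_dup_ack", 16))  # (1, 8.0)
print(on_loss("reno", "triple_dup_ack", 16))   # (8.0, 8.0)
print(on_loss("reno", "timeout", 16))          # (1, 8.0)
```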
Congestion Control 53
TCP/Reno: Big Picture
[Figure: CongWin against Time for TCP Reno, alternating slow start and congestion avoidance phases; loss events are marked TD (triple duplicate ACK) or TO (timeout), and the threshold is redrawn after each one.]
Congestion Control 54
Sample Problem on Congestion
Window Evolution
http://gaia.cs.umass.edu/kurose_ross/interactive/tcp_evolution.php
Congestion Control 55
TCP Fairness
fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Congestion Control 56
Why AIMD?
v Some rate adjustment options: Every RTT, we can
§ Multiplicative increase or decrease: CWND→ a*CWND
§ Additive increase or decrease: CWND→ CWND + b
v Four alternatives:
§ AIAD: gentle increase, gentle decrease
§ AIMD: gentle increase, drastic decrease
§ MIAD: drastic increase, gentle decrease
§ MIMD: drastic increase and decrease
Congestion Control 57
Simple Model of Congestion Control
v Two users
§ rates x1 and x2
[Figure: plot marked with the efficiency line and the fairness line.]
Congestion Control 58
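Example
[Figure: User 1: x1 against User 2: x2, each axis running to 1, with the fairness line and three marked points.]
v (0.5, 0.5): efficient (x1+x2=1) and fair
v (0.7, 0.5): congested (x1+x2=1.2)
v (0.2, 0.5): also marked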
Congestion Control 59
AIAD
v Increase: x + aI
v Decrease: x - aD
v Does not converge to fairness
[Figure: User 1: x1 against User 2: x2 with the fairness line and the efficiency line; a decrease moves (x1,x2) to (x1-aD, x2-aD) and the next increase moves it to (x1-aD+aI, x2-aD+aI).]
Congestion Control 60
AIAD Sharing Dynamics
[Figure: rates x1 (A to B) and x2 (D to E) plotted over time under AIAD.]
Congestion Control 61
MIMD
v Increase: x*bI
v Decrease: x*bD
v Does not converge to fairness
[Figure: User 1: x1 against User 2: x2 with the fairness line and the efficiency line; a decrease moves (x1,x2) to (bDx1, bDx2) and the next increase moves it to (bIbDx1, bIbDx2).]
Congestion Control 62
AIMD
v Increase: x+aI
v Decrease: x*bD
v Converges to fairness
[Figure: User 1: x1 against User 2: x2 with the fairness line and the efficiency line; a decrease moves (x1,x2) to (bDx1, bDx2) and the next increase moves it to (bDx1+aI, bDx2+aI).]
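A small simulation makes the convergence claim concrete. This Python sketch assumes both users see the same feedback each round (decrease when x1 + x2 exceeds the capacity, otherwise increase). The starting rates, the step size aI = aD = 0.01, and bD = 0.5 are arbitrary illustrative values.

```python
def simulate(x1, x2, increase, decrease, capacity=1.0, rounds=2000):
    """Run synchronized rate adjustments for two users; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:  # congested: both back off
            x1, x2 = decrease(x1), decrease(x2)
        else:                   # spare capacity: both speed up
            x1, x2 = increase(x1), increase(x2)
    return x1, x2


a_i, a_d, b_d = 0.01, 0.01, 0.5

aimd = simulate(0.1, 0.7, lambda x: x + a_i, lambda x: x * b_d)
aiad = simulate(0.1, 0.7, lambda x: x + a_i, lambda x: max(x - a_d, 0.0))

print("AIMD gap |x1 - x2|:", abs(aimd[0] - aimd[1]))  # shrinks toward 0
print("AIAD gap |x1 - x2|:", abs(aiad[0] - aiad[1]))  # stays near 0.6
```

Additive steps leave the gap between the two rates unchanged, while each multiplicative decrease shrinks it by the factor bD. That is why only the combination with a multiplicative decrease moves toward the fairness line.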
Congestion Control 63
AIMD Sharing Dynamics
[Figure: rates x1 (A to B) and x2 (D to E) plotted over time under AIMD; the plot is labelled 50 pkts/sec.]
Congestion Control 64
TCP AIMD
two competing sessions:
v additive increase gives slope of 1, as throughput increases
v multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput on the horizontal axis, running up to R.]
Congestion Control 65
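A Simple Model for TCP Throughput
[Figure: cwnd against time t (in RTTs), a sawtooth between Wmax/2 and Wmax with the loss events marked “Timeouts”; A is the area under one sawtooth cycle.]
Packet drop rate: $p = 1/A$, where $A = \frac{3}{8} W_{max}^2$
Throughput: $B = \frac{A}{\left(\frac{W_{max}}{2}\right) RTT} = \sqrt{\frac{3}{2}} \, \frac{1}{RTT \sqrt{p}}$
Congestion Control 67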
Implications (1): Different RTTs
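$\text{Throughput} = \sqrt{\frac{3}{2}} \, \frac{1}{RTT \sqrt{p}}$
[Figure: flows A1 to B1 and A2 to B2 share a bottleneck link; the two paths are labelled 100ms and 200ms.]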
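The throughput model and its RTT dependence can be checked numerically. In this Python sketch the function names are illustrative, throughput is in packets per second, and the values Wmax = 40 and p = 0.01 are made up for the check.

```python
import math


def throughput_from_loss(rtt_s: float, p: float) -> float:
    """Closed form: sqrt(3/2) / (RTT * sqrt(p)), in packets per second."""
    return math.sqrt(1.5) / (rtt_s * math.sqrt(p))


def throughput_from_wmax(wmax: float, rtt_s: float) -> float:
    """Sawtooth form: A packets per cycle, with a cycle lasting Wmax/2 RTTs."""
    a = 3 / 8 * wmax ** 2
    return a / ((wmax / 2) * rtt_s)


# The two forms agree when p = 1/A.
wmax, rtt = 40, 0.1
p = 1 / (3 / 8 * wmax ** 2)
print(throughput_from_wmax(wmax, rtt), throughput_from_loss(rtt, p))

# Same loss rate, RTTs of 100 ms and 200 ms: the short-RTT flow gets twice as much.
ratio = throughput_from_loss(0.1, 0.01) / throughput_from_loss(0.2, 0.01)
print(ratio)
```

Since throughput scales as 1/RTT, flows that share a bottleneck and see the same loss rate split the link in inverse proportion to their RTTs.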
Congestion Control 68
Implications (2): High Speed TCP
$\text{Throughput} = \sqrt{\frac{3}{2}} \, \frac{1}{RTT \sqrt{p}}$
Congestion Control 69
Adapting TCP to High Speed
v Other approaches?
§ Multiple simultaneous connections (hack but works
today)
§ Router-assisted approaches (will see shortly)
Congestion Control 70
Implications (3): Rate-based CC
$\text{Throughput} = \sqrt{\frac{3}{2}} \, \frac{1}{RTT \sqrt{p}}$
v TCP throughput is “choppy”
§ repeated swings between W/2 and W
Congestion Control 71
Implications (4): Loss not due to congestion?
Congestion Control 72
Implications: (5) How do short flows fare?
Congestion Control 73
Implications: (6) TCP fills up queues → long delays
Congestion Control 74
Implications: (7) Cheating
v Three easy ways to cheat
§ Increasing CWND faster than +1 MSS per RTT
Congestion Control 75
Increasing CWND Faster
[Figure: plot of rate y against rate x for two competing flows, with C marked.]
x increases by 2 per RTT
y increases by 1 per RTT
Limit rates:
x = 2y
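The limit x = 2y can be checked with a short simulation. This Python sketch assumes both flows halve their rate whenever their sum exceeds the capacity; the capacity of 100 and the equal starting rates are arbitrary illustrative values.

```python
def aimd_pair(inc_x: float, inc_y: float, capacity: float = 100.0,
              rtts: int = 10_000) -> tuple[float, float]:
    """Two AIMD flows with different additive increases; return final rates."""
    x, y = 1.0, 1.0
    for _ in range(rtts):
        if x + y > capacity:
            x, y = x / 2, y / 2              # both see the loss and halve
        else:
            x, y = x + inc_x, y + inc_y      # the cheater adds more per RTT
    return x, y


x, y = aimd_pair(inc_x=2, inc_y=1)
print(x / y)  # approaches 2
```

The quantity x - 2y is unchanged by an increase and halved by every decrease, so it decays to zero and the rates settle on the line x = 2y.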
Congestion Control 76
Implications: (7) Cheating
v Three easy ways to cheat
§ Increasing CWND faster than +1 MSS per RTT
§ Opening many connections
Congestion Control 77
Open Many Connections
[Figure: hosts A and B (rate x) and hosts D and E (rate y).]
Assume
• A starts 10 connections to B
• D starts 1 connection to E
• Each connection gets about the same throughput
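Under the stated assumption that every connection gets about the same throughput, the split follows directly. A minimal Python sketch, with the shared capacity normalised to 1 (an assumption, since the slide gives no capacity) and `shares` as a hypothetical helper name:

```python
from fractions import Fraction


def shares(conns_a: int, conns_d: int) -> tuple[Fraction, Fraction]:
    """Fraction of the shared capacity obtained by A and by D."""
    total = conns_a + conns_d
    return Fraction(conns_a, total), Fraction(conns_d, total)


a_share, d_share = shares(10, 1)
print(a_share, d_share)  # 10/11 1/11
```

Fairness is per connection, not per host, so a host can claim more of the link simply by opening more connections.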
Congestion Control 78
Implications: (7) Cheating
v Three easy ways to cheat
§ Increasing CWND faster than +1 MSS per RTT
§ Opening many connections
§ Using large initial CWND
Congestion Control 79
Implications: (8) CC intertwined
with reliability
v Mechanisms for CC and reliability are tightly coupled
§ CWND adjusted based on ACKs and timeouts
§ Cumulative ACKs and fast retransmit/recovery rules
v Complicates evolution
§ Consider changing from cumulative to selective ACKs
v A failure of modularity, not layering
Network Layer 83
Network Layer: outline
4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
§ datagram format
§ IPv4 addressing
§ ICMP
§ IPv6
4.5 routing algorithms
§ link state
§ distance vector
§ hierarchical routing
4.6 routing in the Internet
§ RIP
§ OSPF
§ BGP
4.7 broadcast and multicast routing
Network Layer 84
Some background
v 1968: DARPAnet/ARPAnet (precursor to Internet)
§ (Defense) Advanced Research Projects Agency Network
Network Layer 85
Internet Protocol Stack
• Application: Email, Web, …
• Transport: TCP, UDP, …
• Network: IP
• Link: Ethernet, WiFi, ATM, …
• Physical: copper, fiber, air, …
Internetworking
v Cerf & Kahn in 1974,
§ “A Protocol for Packet Network Intercommunication”
§ Foundation for the modern Internet
…
[Figure: protocol stacks (application, transport, network, data link, physical) at hosts and routers; the network layer delivers segments to transport.]
D: Do both in advance
Network Layer 89
Interplay between routing and forwarding
v routing algorithm determines end-end-path through network
[Figure: a router uses the value in the arriving packet’s header (0111) to choose among output links 1, 2, and 3.]
Network Layer 90
Connection setup
v 3rd important function in some network
architectures:
§ ATM, frame relay, X.25
v before datagrams flow, two end hosts and
intervening routers establish virtual
connection
§ routers get involved
v network vs transport layer connection
service:
§ network: between two hosts (may also involve
intervening routers in case of VCs)
§ transport: between two processes
Network Layer 91
Self Study
Network Layer 92
Self Study
Network Layer 93
Connection, connection-less service
v datagram network provides network-layer connectionless service
v virtual-circuit network provides network-layer connection service
v analogous to TCP/UDP connection-oriented / connectionless transport-layer services, but:
§ service: host-to-host
§ no choice: network provides one or the other
§ implementation: in network core
Network Layer 95
Virtual circuits
“source-to-dest path behaves much like telephone
circuit”
§ performance-wise
§ network actions along source-to-dest path
v call setup, teardown for each call before data can flow
v each packet carries VC identifier (not destination host
address)
v every router on source-dest path maintains “state” for
each passing connection
v link, router resources (bandwidth, buffers) may be
allocated to VC (dedicated resources = predictable
service)
Network Layer 96
VC implementation
a VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along
path
3. entries in forwarding tables in routers along path
v packet belonging to VC carries VC
number (rather than dest address)
v VC number can be changed on each link.
§ new VC number comes from forwarding table
Network Layer 97
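VC forwarding table
[Figure: Host A reaches Host B through routers R1 and R2; the links on the path are labelled with VC numbers 12, 22, and 32, and R1’s interfaces are numbered 1, 2, and 3.]
forwarding table in Router R1: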
Incoming interface Incoming VC # Outgoing interface Outgoing VC #
1 12 3 22
2 63 1 18
3 7 2 17
1 97 3 87
… … … …
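The lookup a VC router performs on this table can be sketched in Python. The dictionary holds the four rows shown for R1 (the elided rows stay elided); `forward` is a hypothetical helper name.

```python
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
R1_TABLE = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}


def forward(table: dict, in_interface: int, in_vc: int) -> tuple[int, int]:
    """Return the outgoing interface and the VC number to write into the packet."""
    return table[(in_interface, in_vc)]


out_interface, out_vc = forward(R1_TABLE, 1, 12)
print(out_interface, out_vc)  # 3 22
```

The key includes the incoming interface because VC numbers are only meaningful per link, which is also why the number is rewritten at each hop.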
[Figure: virtual-circuit signaling between two hosts’ protocol stacks (application, transport, network, data link, physical): 1. initiate call, 2. incoming call, 3. accept call, 4. call connected, 5. data flow begins, 6. receive data.]
Network Layer 99
IN PRACTICE Not on Exam
[Figure: two hosts’ protocol stacks (application, transport, network, data link, physical): 1. send datagrams, 2. receive datagrams.]
[Figure: a router chooses among output links 1, 2, and 3 using the IP destination address in the arriving packet’s header.]
otherwise 3
examples:
DA: 11001000 00010111 00010110 10100001 which interface?
DA: 11001000 00010111 00011000 10101010 which interface?
Network Layer 104
Datagram or VC network: why?
Internet (datagram)
v data exchange among computers
§ “elastic” service, no strict timing req.
v many link types
§ different characteristics
§ uniform service difficult
v “smart” end systems (computers)
§ can adapt, perform control, error recovery
§ simple inside network, complexity at “edge”
ATM (VC)
v evolved from telephony
v human conversation:
§ strict timing, reliability requirements
§ need for guaranteed service
v “dumb” end systems
§ telephones
§ complexity inside network
A. TCP
B. Internet
C. Virtual circuit network
D. UDP
E. A and C
Prefix Interface
1* A
11* B
111* C
Default D
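Assuming the table above is meant to be resolved by longest prefix matching (the question that goes with it is missing from the extract), a lookup can be sketched in Python as follows. Addresses are written as bit strings, and `lookup` is a hypothetical helper name.

```python
PREFIX_TABLE = {"1": "A", "11": "B", "111": "C"}
DEFAULT_INTERFACE = "D"


def lookup(address_bits: str) -> str:
    """Return the interface of the longest prefix that matches the address."""
    best = ""
    for prefix in PREFIX_TABLE:
        if address_bits.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    # No prefix matched: fall through to the default entry.
    return PREFIX_TABLE.get(best, DEFAULT_INTERFACE)


for addr in ("1000", "1101", "1110", "0110"):
    print(addr, "->", lookup(addr))
# 1000 -> A, 1101 -> B, 1110 -> C, 0110 -> D
```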
link layer
physical layer
v Designing interfaces
§ What task are you trying to perform?
§ What information do you need to accomplish it?
16-bit Identification | 3-bit Flags | 13-bit Fragment Offset
8-bit Time to Live (TTL) | 8-bit Protocol | 16-bit Header Checksum
Payload
v Protocol (8 bits)
§ Identifies the higher-level protocol
§ Important for demultiplexing at receiving host
v Most common examples
§ E.g., “6” for the Transmission Control Protocol (TCP)
§ E.g., “17” for the User Datagram Protocol (UDP)
[Figure: an IP header with protocol=6 is followed by a TCP header; an IP header with protocol=17 is followed by a UDP header.]
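Demultiplexing on the Protocol field can be sketched in Python. The sketch assumes an IPv4 header without options, where the Protocol byte sits at offset 9. `make_header` is a hypothetical test helper that leaves the checksum and addresses as zeros, so it does not produce a valid packet for real use.

```python
import struct

PROTOCOLS = {6: "TCP", 17: "UDP"}


def make_header(protocol: int, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4-style header (checksum and addresses zeroed)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,      # version 4, header length 5 words
        0,         # type of service
        20,        # total length: header only
        0,         # identification
        0,         # flags and fragment offset
        ttl,
        protocol,
        0,         # header checksum (not computed in this sketch)
        bytes(4),  # source address
        bytes(4),  # destination address
    )


def demux(packet: bytes) -> str:
    """Name the higher-level protocol that should receive the payload."""
    protocol = packet[9]
    return PROTOCOLS.get(protocol, f"unknown ({protocol})")


print(demux(make_header(6)))   # TCP
print(demux(make_header(17)))  # UDP
```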
v Loop: TTL
…
frame
§ different link types, different MTUs
v large IP datagram divided (“fragmented”) within net
§ one datagram becomes several datagrams
§ “reassembled” only at final destination
§ IP header bits used to identify, order related fragments
[Figure: fragmentation, in: one large datagram, out: 3 smaller datagrams; reassembly happens afterwards.]
Network Layer 125
IP fragmentation, reassembly
example: length=4000, ID=x, fragflag=0, offset=0
Applet:
http://media.pearsoncmg.com/aw/aw_kurose_network_2/applets/ip/ipfragmentation.html
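How a 4000-byte datagram splits can be worked through in Python. The MTU of 1500 bytes and the 20-byte header are assumptions, since the extract does not state them. The offset is counted in 8-byte units, as the 13-bit Fragment Offset field requires, and every fragment keeps the original ID.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list[dict]:
    """Split one datagram into fragments that fit the given MTU."""
    payload = total_length - header_len
    # Data per fragment must be a multiple of 8 bytes (except the last one).
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        size = min(max_data, payload - sent)
        more = 1 if sent + size < payload else 0  # fragflag: more fragments follow
        fragments.append(
            {"length": size + header_len, "fragflag": more, "offset": sent // 8}
        )
        sent += size
    return fragments


for frag in fragment(4000, mtu=1500):
    print(frag)
# {'length': 1500, 'fragflag': 1, 'offset': 0}
# {'length': 1500, 'fragflag': 1, 'offset': 185}
# {'length': 1040, 'fragflag': 0, 'offset': 370}
```

Under these assumptions the 3980 data bytes travel as 1480 + 1480 + 1020, giving three smaller datagrams.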
Network Layer 126
Quiz: How can we use this for evil?
A: Send fragments that overlap.