Uploaded by rishabh mishra
Copyright © All Rights Reserved

Transport Layer: Outline

3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion control
3.7 TCP congestion control

Principles of congestion control
congestion:
v informally: “too many sources sending too much
data too fast for network to handle”
v different from flow control!
v manifestations:
§ lost packets (buffer overflow at routers)
§ long delays (queueing in router buffers)
v a top-10 problem!

Congestion

[Figure: a router’s buffer filling up because the incoming rate is faster than the outgoing link can support.]

Congestion

[Figure: the router’s buffer is full (“Ugh. I so can’t deal with this right now!”), so excess arriving packets go to the trash. The incoming rate is faster than the outgoing link can support.]

Quiz: What’s the worst that can happen?

A: This is no problem. Senders just keep transmitting, and it’ll all work out.

B: There will be retransmissions, but the network will still perform without much trouble.

C: Retransmissions will become very frequent, causing a serious loss of efficiency.

D: The network will become completely unusable.

Congestion Collapse

[Figure: traffic crossing Link A and then Link B.]

Congestion Collapse

[Figure: sender S1 sends across Link A and then Link B.]

One sender (S1) starts, but there’s still capacity at link A.

Congestion Collapse

[Figure: senders S1 and S2 both send across Link A and then Link B.]

Another sender (S2) starts up. Link A is showing slight delay, but is still doing OK.

Congestion Collapse

[Figure: S1 and S2 send across Link A and Link B.]

Unrelated traffic passes through and congests link B.

Congestion Collapse

[Figure: S1 and S2 send across Link A and Link B.]

S2’s traffic is being dropped at Link B, so S2 starts retransmitting on top of what it was sending.

(This is very bad. S2 is now sending lots of traffic over link A that has no hope of crossing link B.)

Congestion Collapse

[Figure: S1 and S2 send across Link A and Link B.]

Increased traffic from S2 causes Link A to become congested. S1 starts retransmitting.

Congestion Collapse

[Figure: S1 and S2 send across Link A and Link B.]

Congestion propagates backwards…

Without congestion control
congestion:
v Increases delivery latency
v Increases loss rate
v Increases retransmissions, many unnecessary
v Wastes capacity on traffic that is never delivered
v Increases congestion, cycle continues …

Cost of Congestion

[Figure: throughput vs. load and delay vs. load, marking the knee, the cliff, packet loss, and congestion collapse.]

v Knee – point after which
§ Throughput increases slowly
§ Delay increases fast

v Cliff – point after which
§ Throughput starts to drop to zero (congestion collapse)
§ Delay approaches infinity

Congestion Collapse
This happened to the Internet (then NSFnet) in 1986
v Rate dropped from a blazing 32 Kbps to 40 bps
v This happened on and off for two years
v In 1988, Van Jacobson published “Congestion Avoidance and Control”
v The fix: senders voluntarily limit their sending rate

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
v no explicit feedback from network
v congestion inferred from end-system observed loss, delay
v approach taken by TCP

network-assisted congestion control:
v routers provide feedback to end systems
§ single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
§ explicit rate for sender to send at

Self Study

Case study: ATM ABR congestion control

ABR: available bit rate:
v “elastic service”
v if sender’s path “underloaded”:
§ sender should use available bandwidth
v if sender’s path congested:
§ sender throttled to minimum guaranteed rate

RM (resource management) cells:
v sent by sender, interspersed with data cells
v bits in RM cell set by switches (“network-assisted”)
§ NI bit: no increase in rate (mild congestion)
§ CI bit: congestion indication
v RM cells returned to sender by receiver, with bits intact

Self Study
Case study: ATM ABR congestion control

[Figure: RM cells interspersed with data cells.]

v two-byte ER (explicit rate) field in RM cell
§ congested switch may lower ER value in cell
§ sender’s send rate is thus the maximum supportable rate on the path
v EFCI bit in data cells: set to 1 in congested switch
§ if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

Transport Layer: Outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion control
3.7 TCP congestion control

TCP’s Approach in a Nutshell
v TCP connection has window
§ Controls number of packets in flight

v TCP sending rate:
§ roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

$$\text{rate} \approx \frac{cwnd}{RTT} \text{ bytes/sec}$$

[Figure: sender sequence number space: last byte ACKed, then cwnd bytes that are sent but not yet ACKed (“in-flight”), then last byte sent.]

v Vary window size to control sending rate

All These Windows…

v Congestion Window: CWND
§ How many bytes can be sent without overflowing routers
§ Computed by the sender using congestion control algorithm

v Flow control window: AdvertisedWindow (RWND)
§ How many bytes can be sent without overflowing receiver’s buffers
§ Determined by the receiver and reported to the sender

v Sender-side window = minimum{CWND, RWND}
• Assume for this lecture that RWND >> CWND

CWND

v This lecture will talk about CWND in units of MSS
§ (Recall MSS: Maximum Segment Size, the amount of payload data in a TCP packet)
§ This is only for pedagogical purposes

v Keep in mind that real implementations maintain CWND in bytes

Two Basic Questions

v How does the sender detect congestion?

v How does the sender adjust its sending rate?

Quiz: What is a “congestion event”?
A: A segment loss

B: Receiving duplicate acknowledgement(s)

C: A retransmission timeout firing

D: Some subset of A, B and C (what is the subset?)

E: A + B + C

Quiz: How should we set CWND?

A: We should keep raising it until a “congestion event”, then back off slightly until we notice no more events.

B: We should raise it until a “congestion event”, then go back to 1 and start raising it again.

C: We should raise it until a “congestion event”, then go back to median value and start raising it again.

D: We should send as fast as possible at all times.

Detecting Congestion
v Packet delays
§ Tricky: can be noisy as delay often varies considerably

v Routers tell end hosts when they’re congested
§ Requires explicit feedback

v Packet loss
§ Fail-safe signal that TCP already has to detect
§ Complication: non-congestive loss (checksum errors)

Not All Losses the Same
v Duplicate ACKs: isolated loss
§ dup ACKs indicate network capable of delivering
some segments

v Timeout: much more serious
§ Not enough dup ACKs
§ Must have suffered several losses

v Will adjust rate differently for each case

Rate Adjustment

v Basic structure:
§ Upon receipt of ACK (of new data): increase rate
§ Upon detection of loss: decrease rate

v How we increase/decrease the rate depends on the phase of congestion control we’re in:
§ Discovering available bottleneck bandwidth vs.
§ Adjusting to bandwidth variations

Bandwidth Discovery with Slow Start (SS)
v Goal: estimate available bandwidth
§ start slow (for safety)
§ but ramp up quickly (for efficiency)

v Consider
§ RTT = 100ms, MSS=1000bytes
§ Window size to fill 1Mbps of BW = 12.5 packets
§ Window size to fill 1Gbps = 12,500 packets
§ Either is possible!
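As a rough check of these numbers, the window needed to fill a link is the bandwidth-delay product divided by the segment size (rearranging rate ≈ cwnd / RTT). A minimal Python sketch; `window_packets` is a hypothetical helper, not part of any TCP implementation:

```python
def window_packets(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps (rate ~ cwnd / RTT)."""
    bdp_bytes = rate_bps / 8 * rtt_s      # bandwidth-delay product in bytes
    return bdp_bytes / mss_bytes


# RTT = 100 ms and MSS = 1000 bytes, as on the slide
print(window_packets(1e6, 0.1, 1000))    # 12.5
print(window_packets(1e9, 0.1, 1000))    # 12500.0
```

The same formula yields windows three orders of magnitude apart, which is why the sender has to discover the right value rather than guess it.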

TCP Slow Start
v when connection begins, increase rate exponentially until first loss event:
§ initially cwnd = 1 MSS
§ double cwnd every RTT
§ Simpler implementation achieved by incrementing cwnd for every ACK received
v summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B: one segment in the first RTT, two segments in the next, then four segments.]

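The per-ACK rule doubles cwnd every RTT because each of the cwnd segments sent in one RTT returns one ACK. A minimal sketch, assuming no losses and one ACK per segment (no delayed ACKs); `slow_start` is a hypothetical helper:

```python
def slow_start(rtts: int, init_cwnd: int = 1) -> list[int]:
    """cwnd (in MSS) at the start of each RTT when every ACK adds 1 MSS."""
    cwnd = init_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks_this_rtt = cwnd          # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1                 # +1 MSS per ACK ...
    return history                    # ... which doubles cwnd every RTT


print(slow_start(5))   # [1, 2, 4, 8, 16]
```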
Adjusting to Varying Bandwidth
v Slow start gave an estimate of available bandwidth

v Now, want to track variations in this available bandwidth, oscillating around its current value
§ Repeated probing (rate increase) and backoff (rate decrease)
§ Known as Congestion Avoidance (CA)

v TCP uses: “Additive Increase Multiplicative Decrease” (AIMD)
§ We’ll see why shortly…

AIMD
v approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
§ additive increase: increase cwnd by 1 MSS every RTT until loss detected
§ For each successful RTT, cwnd = cwnd + 1
§ Simple implementation: for each ACK, cwnd = cwnd + 1/cwnd
§ multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD sawtooth behavior, probing for bandwidth. The TCP sender’s congestion window size (cwnd) additively increases over time until loss occurs, then is cut in half.]
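The per-ACK form of additive increase can be checked directly: roughly cwnd ACKs arrive per RTT, each adding 1/cwnd. A minimal sketch in MSS units; the helper names and the rule that a loss happens whenever cwnd exceeds 20 MSS are hypothetical, chosen only to produce a sawtooth:

```python
def ca_one_rtt(cwnd: float) -> float:
    """One RTT of congestion avoidance: each of ~cwnd ACKs adds 1/cwnd MSS."""
    for _ in range(int(cwnd)):        # assume one ACK per segment in flight
        cwnd += 1 / cwnd
    return cwnd


def on_loss(cwnd: float) -> float:
    """Multiplicative decrease: halve cwnd, keeping at least 1 MSS."""
    return max(cwnd / 2, 1.0)


# Sawtooth, pretending (hypothetically) that a loss occurs whenever cwnd > 20 MSS
cwnd, trace = 10.0, []
for _ in range(40):
    trace.append(round(cwnd, 2))
    cwnd = on_loss(cwnd) if cwnd > 20 else ca_one_rtt(cwnd)
print(trace)
```

Because cwnd grows during the RTT, the per-ACK rule adds slightly less than 1 MSS per RTT (about 0.95 MSS starting from cwnd = 10), which is close enough to the "+1 per RTT" description.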


Leads to the TCP “Sawtooth”

[Figure: window vs. time t; exponential “slow start” at first, then a sawtooth with the window cut at each loss.]

Slow-Start vs. AIMD
v When does a sender stop Slow-Start and start Additive Increase?

v Introduce a “slow start threshold” (ssthresh)
§ Initialized to a large value
§ On timeout, ssthresh = CWND/2

v When CWND = ssthresh, sender switches from slow-start to AIMD-style increase

Implementation
v State at sender
§ CWND (initialized to a small constant)
§ ssthresh (initialized to a large constant)
§ [Also dupACKcount and timer, as before]

v Events
§ ACK (new data)
§ dupACK (duplicate ACK for old data)
§ Timeout

Event: ACK (new data)
v If CWND < ssthresh
§ CWND += 1
• CWND packets per RTT
• Hence after one RTT with no drops: CWND = 2xCWND

Event: ACK (new data)
v If CWND < ssthresh (“Slow start” phase)
§ CWND += 1

v Else (“Congestion Avoidance” phase, additive increase)
§ CWND = CWND + 1/CWND
• CWND packets per RTT
• Hence after one RTT with no drops: CWND = CWND + 1

Event: TimeOut

v On Timeout
§ ssthresh ← CWND/2
§ CWND ← 1

Event: dupACK

v dupACKcount ++

v If dupACKcount = 3 /* fast retransmit */
§ ssthresh = CWND/2
§ CWND = CWND/2

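The three event handlers can be collected into one small class. This is a sketch in MSS units under the slides’ simplifications (no fast recovery yet; retransmissions are only noted in comments). The class name `CongestionState`, the initial ssthresh of 64 MSS standing in for "a large constant", and resetting dupACKcount on a new ACK or timeout are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    """Sender state from the slides; cwnd and ssthresh are in MSS units."""
    cwnd: float = 1.0            # "small constant"
    ssthresh: float = 64.0       # "large constant" (64 MSS is an arbitrary choice)
    dup_ack_count: int = 0

    def on_new_ack(self) -> None:
        self.dup_ack_count = 0                 # assumption: reset when new data is ACKed
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                     # slow start
        else:
            self.cwnd += 1 / self.cwnd         # congestion avoidance

    def on_dup_ack(self) -> None:
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:            # fast retransmit
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.cwnd / 2
            # ...retransmit the missing segment here

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_ack_count = 0
        # ...retransmit the missing segment and restart the timer here


s = CongestionState(cwnd=10.0)
for _ in range(3):
    s.on_dup_ack()
print(s.cwnd, s.ssthresh)   # 5.0 5.0
s.on_timeout()
print(s.cwnd, s.ssthresh)   # 1.0 2.5
```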
Example

[Figure: window vs. time t, marking a fast retransmission, a timeout, and where SSThresh is set.]

Slow start in operation until it reaches half of previous CWND, i.e., SSTHRESH

Slow-start restart: Go back to CWND = 1 MSS, but take advantage of knowing the previous value of CWND

One Final Phase: Fast Recovery
v The problem: congestion avoidance too
slow in recovering from an isolated loss

Example (window in units of MSS, not bytes)
v Consider a TCP connection with:
§ CWND=10 packets (of size MSS, which is 100 bytes)
§ Last ACK was for byte # 101
• i.e., receiver expecting next packet to have seq. no. 101

v 10 packets [101, 201, 301,…, 1001] are in flight
§ Packet 101 is dropped
§ What ACKs do they generate?
§ And how does the sender respond?

Timeline
v ACK 101 (due to 201) cwnd=10 dupACK#1 (no xmit)
v ACK 101 (due to 301) cwnd=10 dupACK#2 (no xmit)
v ACK 101 (due to 401) cwnd=10 dupACK#3 (no xmit)
v RETRANSMIT 101 ssthresh=5 cwnd= 5
v ACK 101 (due to 501) cwnd=5 + 1/5 (no xmit)
v ACK 101 (due to 601) cwnd=5 + 2/5 (no xmit)
v ACK 101 (due to 701) cwnd=5 + 3/5 (no xmit)
v ACK 101 (due to 801) cwnd=5 + 4/5 (no xmit)
v ACK 101 (due to 901) cwnd=5 + 5/5 (no xmit)
v ACK 101 (due to 1001) cwnd=6 + 1/5 (no xmit)
v ACK 1101 (due to 101) ← only now can we transmit new packets
v Plus no packets in flight, so ACK “clocking” (to increase CWND) stalls for another RTT

Solution: Fast Recovery
Idea: Grant the sender temporary “credit” for each dupACK
so as to keep packets in flight

v If dupACKcount = 3
§ ssthresh = cwnd/2
§ cwnd = ssthresh + 3

v While in fast recovery
§ cwnd = cwnd + 1 for each additional duplicate ACK

v Exit fast recovery after receiving new ACK
§ set cwnd = ssthresh

Example
v Consider a TCP connection with:
§ CWND=10 packets (of size MSS = 100 bytes)
§ Last ACK was for byte # 101
• i.e., receiver expecting next packet to have seq. no. 101

v 10 packets [101, 201, 301,…, 1001] are in flight
§ Packet 101 is dropped

Timeline
v ACK 101 (due to 201) cwnd=10 dup#1
v ACK 101 (due to 301) cwnd=10 dup#2
v ACK 101 (due to 401) cwnd=10 dup#3
v REXMIT 101 ssthresh=5 cwnd= 8 (5+3)
v ACK 101 (due to 501) cwnd= 9 (no xmit)
v ACK 101 (due to 601) cwnd=10 (no xmit)
v ACK 101 (due to 701) cwnd=11 (xmit 1101)
v ACK 101 (due to 801) cwnd=12 (xmit 1201)
v ACK 101 (due to 901) cwnd=13 (xmit 1301)
v ACK 101 (due to 1001) cwnd=14 (xmit 1401)
v ACK 1101 (due to 101) cwnd = 5 (xmit 1501) ← exiting fast recovery
v Packets 1101-1401 already in flight
v ACK 1201 (due to 1101) cwnd = 5 + 1/5 ← back in congestion avoidance

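The fast-recovery rules can be replayed to reproduce the cwnd column of the timeline above. A minimal sketch in MSS units, assuming a single isolated loss; `fast_recovery_trace` is a hypothetical helper:

```python
def fast_recovery_trace(cwnd: float, dup_acks: int) -> list[float]:
    """cwnd (in MSS) after each duplicate ACK and after the final new ACK.

    Models one isolated loss: `dup_acks` duplicate ACKs arrive, then the
    retransmitted segment is ACKed (a "new ACK").
    """
    if dup_acks < 3:
        raise ValueError("fast retransmit needs at least 3 duplicate ACKs")
    ssthresh = cwnd
    trace = []
    for n in range(1, dup_acks + 1):
        if n == 3:                     # third dupACK: fast retransmit
            ssthresh = cwnd / 2
            cwnd = ssthresh + 3        # credit for the 3 segments that left the network
        elif n > 3:
            cwnd += 1                  # one more segment has left the network
        trace.append(cwnd)
    cwnd = ssthresh                    # new ACK: deflate and exit fast recovery
    trace.append(cwnd)
    return trace


# The slide's example: cwnd = 10, nine duplicate ACKs (triggered by 201..1001)
print(fast_recovery_trace(10, 9))
# [10, 10, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 5.0]
```

On the slide, new segments go out once the inflated cwnd exceeds the 10 segments still outstanding, which is why transmission resumes at cwnd = 11.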
Summary: TCP Congestion Control

[Figure: finite-state machine with states slow start, congestion avoidance, and fast recovery; its transitions are listed below.]

v Start (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
v slow start
§ new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
§ duplicate ACK: dupACKcount++
§ timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
§ cwnd > ssthresh (Λ): go to congestion avoidance
§ dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
v congestion avoidance
§ new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
§ duplicate ACK: dupACKcount++
§ timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
§ dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
v fast recovery
§ duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
§ new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
§ timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

TCP Flavours

v TCP-Tahoe
§ cwnd =1 on triple dup ACK & timeout
v TCP-Reno
§ cwnd =1 on timeout
§ cwnd = cwnd/2 on triple dup ACK
v TCP-newReno
§ TCP-Reno + improved fast recovery
v TCP-SACK (NOT COVERED IN THE COURSE)
§ incorporates selective acknowledgements
TCP/Reno: Big Picture

[Figure: CongWin vs. time, with the threshold marked after each event. Phases in order: slow start, congestion avoidance, congestion avoidance, congestion avoidance, slow start, congestion avoidance; the transitions are triple duplicate ACK (TD) and timeout (TO) events.]

TD: Triple duplicate acknowledgements
TO: Timeout

Sample Problem on Congestion Window Evolution
http://gaia.cs.umass.edu/kurose_ross/interactive/tcp_evolution.php

TCP Fairness
fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K

[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R.]

Why AIMD?
v Some rate adjustment options: Every RTT, we can
§ Multiplicative increase or decrease: CWND→ a*CWND
§ Additive increase or decrease: CWND→ CWND + b

v Four alternatives:
§ AIAD: gentle increase, gentle decrease
§ AIMD: gentle increase, drastic decrease
§ MIAD: drastic increase, gentle decrease
§ MIMD: drastic increase and decrease

Simple Model of Congestion Control

[Figure: user 1’s rate (x1) on the horizontal axis and user 2’s rate (x2) on the vertical axis, each from 0 to 1, with the efficiency line and the fairness line.]

v Two users
§ rates x1 and x2

v Congestion when x1+x2 > 1
v Unused capacity when x1+x2 < 1
v Fair when x1 = x2

Example

[Figure: points in the (x1, x2) plane relative to the efficiency line x1+x2=1 and the fairness line x1=x2.]
v (0.5, 0.5): efficient (x1+x2=1) and fair
v (0.7, 0.5): congested (x1+x2=1.2)
v (0.2, 0.5): inefficient (x1+x2=0.7)
v (0.7, 0.3): efficient (x1+x2=1) but not fair

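AIAD
v Increase: x + aI
v Decrease: x - aD
v Does not converge to fairness

[Figure: from (x1, x2), a decrease moves to (x1-aD, x2-aD) and the next increase to (x1-aD+aI, x2-aD+aI). Both coordinates always change by the same amount, so the point moves parallel to the fairness line and its distance from that line never shrinks.]
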
AIAD Sharing Dynamics

[Figure: flow x1 (A to B) and flow x2 (D to E) share a link; both rates are plotted (0 to 60) over time steps 1 to 487.]

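MIMD
v Increase: x*bI
v Decrease: x*bD
v Does not converge to fairness

[Figure: from (x1, x2), a decrease moves to (bDx1, bDx2) and the next increase to (bIbDx1, bIbDx2). Both coordinates are scaled by the same factor, so the point stays on the line through the origin and the ratio x1/x2 never changes.]
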
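AIMD
v Increase: x+aI
v Decrease: x*bD
v Converges to fairness

[Figure: from (x1, x2), a decrease moves to (bDx1, bDx2) and the next increase to (bDx1+aI, bDx2+aI). The decrease shrinks the gap x1 - x2 by the factor bD while the increase leaves it unchanged, so each cycle moves the point closer to the fairness line.]
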
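The phase-plane argument can be checked with a few lines of simulation. A minimal sketch of the two-user model (capacity 1, both users get the same congested/not-congested feedback each step); the constants aI = 0.01, aD = 0.05, bD = 0.5 and the starting point (0.1, 0.6) are arbitrary assumptions:

```python
from collections.abc import Callable


def simulate(x1: float, x2: float,
             increase: Callable[[float], float],
             decrease: Callable[[float], float],
             steps: int = 1000) -> tuple[float, float]:
    """Two users share capacity 1 and receive the same binary feedback each step."""
    for _ in range(steps):
        if x1 + x2 > 1:                       # congested: both decrease
            x1, x2 = decrease(x1), decrease(x2)
        else:                                 # spare capacity: both increase
            x1, x2 = increase(x1), increase(x2)
    return x1, x2


def additive_increase(x: float) -> float:
    return x + 0.01                           # aI = 0.01


def additive_decrease(x: float) -> float:
    return x - 0.05                           # aD = 0.05


def multiplicative_decrease(x: float) -> float:
    return x * 0.5                            # bD = 0.5


aiad = simulate(0.1, 0.6, additive_increase, additive_decrease)
aimd = simulate(0.1, 0.6, additive_increase, multiplicative_decrease)
print(f"AIAD gap |x1 - x2| = {abs(aiad[0] - aiad[1]):.3f}")   # stays at 0.500
print(f"AIMD gap |x1 - x2| = {abs(aimd[0] - aimd[1]):.3f}")   # shrinks to 0.000
```

Under AIAD the initial gap of 0.5 survives indefinitely; under AIMD every decrease halves it, so the two rates meet at the fair share.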
AIMD Sharing Dynamics

[Figure: flow x1 (A to B) and flow x2 (D to E) share a 50 pkts/sec link; both rates are plotted (0 to 60) over time steps 1 to 487. Rates equalize → fair share.]

TCP AIMD
two competing sessions:
v additive increase gives slope of 1, as throughput increases
v multiplicative decrease decreases throughput proportionally

[Figure: connection 1 throughput (horizontal axis, up to R) vs. connection 2 throughput (vertical axis, up to R), with the equal bandwidth share line. Alternating “congestion avoidance: additive increase” and “loss: decrease window by factor of 2” steps move the operating point toward the equal share.]

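A Simple Model for TCP Throughput

[Figure: cwnd vs. time t, a sawtooth between Wmax/2 and Wmax in steps of one RTT, with timeouts marked.]

v ½ Wmax RTTs between drops
v Avg. ¾ Wmax packets per RTT

A Simple Model for TCP Throughput

[Figure: the same sawtooth; A is the number of packets sent in one cycle between drops (the area under one tooth).]

Packet drop rate, $p = 1/A$, where $A = \frac{3}{8} W_{max}^2$ (½ Wmax RTTs × ¾ Wmax packets per RTT)

Throughput, $B = \dfrac{A}{\left(\frac{W_{max}}{2}\right) RTT} = \sqrt{\dfrac{3}{2}}\,\dfrac{1}{RTT\,\sqrt{p}}$ packets per second (multiply by MSS for bytes per second)
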
Implications (1): Different RTTs
$$\text{Throughput} = \sqrt{\frac{3}{2}}\,\frac{1}{RTT\,\sqrt{p}}$$

v Flows get throughput inversely proportional to RTT

v TCP unfair in the face of heterogeneous RTTs!

[Figure: flow A1 to B1 (RTT 100ms) and flow A2 to B2 (RTT 200ms) share a bottleneck link.]

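To see the inverse dependence on RTT numerically, here is a minimal Python sketch of the throughput formula. The formula gives segments per second; multiplying by MSS gives bytes per second. The helper name `tcp_throughput`, the 1500-byte MSS, and the loss rate p = 10^-4 shared by both flows are assumptions chosen for illustration:

```python
import math


def tcp_throughput(mss_bytes: float, rtt_s: float, p: float) -> float:
    """Simple-model TCP throughput in bytes/s: sqrt(3/2) * MSS / (RTT * sqrt(p))."""
    return math.sqrt(1.5) * mss_bytes / (rtt_s * math.sqrt(p))


# Sanity check against the sawtooth derivation: with Wmax = 40 segments,
# one cycle carries A = (3/8) * Wmax**2 = 600 segments and lasts Wmax/2 = 20 RTTs.
w_max, rtt = 40.0, 0.1
a = 3 / 8 * w_max ** 2
per_cycle_rate = a / ((w_max / 2) * rtt)          # 300 segments/s
assert math.isclose(tcp_throughput(1, rtt, 1 / a), per_cycle_rate)

# Two flows with the same loss rate but RTTs of 100 ms and 200 ms (as in the figure)
short = tcp_throughput(1500, 0.100, 1e-4)
long_ = tcp_throughput(1500, 0.200, 1e-4)
print(short / long_)   # 2.0: half the RTT, twice the throughput
```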
Implications (2): High Speed TCP
$$\text{Throughput} = \sqrt{\frac{3}{2}}\,\frac{1}{RTT\,\sqrt{p}}$$

v Assume RTT = 100ms, MSS=1500bytes

v What value of p is required to reach 100Gbps throughput?
§ $\approx 2 \times 10^{-12}$
v How long between drops?
§ ~ 16.6 hours
v How much data has been sent in this time?
§ ~ 6 petabits
v These are not practical numbers!

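The numbers above can be reproduced by inverting the throughput formula. A minimal Python sketch using only the slide’s assumptions (RTT = 100 ms, MSS = 1500 bytes, 100 Gbps) plus the simplification that one loss occurs every 1/p packets:

```python
RTT = 0.1            # seconds
MSS = 1500           # bytes
TARGET_BPS = 100e9   # 100 Gbit/s

pkts_per_s = TARGET_BPS / (8 * MSS)
# Invert Throughput = sqrt(3/2) / (RTT * sqrt(p)) (in packets/s) to solve for p
p = 1.5 / (RTT * pkts_per_s) ** 2

pkts_between_drops = 1 / p                 # on average one loss per 1/p packets
hours_between_drops = pkts_between_drops / pkts_per_s / 3600
petabits_between_drops = pkts_between_drops * MSS * 8 / 1e15

print(f"p = {p:.2e}")                                    # 2.16e-12
print(f"{hours_between_drops:.1f} hours between drops")  # 15.4
print(f"{petabits_between_drops:.1f} petabits")          # 5.6
```

Computed exactly, p ≈ 2.16 × 10^-12 gives about 15.4 hours and 5.6 petabits between drops; rounding p to 2 × 10^-12 first (5 × 10^11 packets per loss) gives the slide’s roughly 16.6 hours and 6 petabits.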
Adapting TCP to High Speed

§ Once past a threshold speed, increase CWND faster
§ A proposed standard [Floyd’03]: once speed is past some threshold, change equation to $p^{-0.8}$ rather than $p^{-0.5}$
§ Let the additive constant in AIMD depend on CWND

v Other approaches?
§ Multiple simultaneous connections (hack but works today)
§ Router-assisted approaches (will see shortly)

High Speed TCP: http://www.icir.org/floyd/hstcp.html

Implications (3): Rate-based CC

$$\text{Throughput} = \sqrt{\frac{3}{2}}\,\frac{1}{RTT\,\sqrt{p}}$$
v TCP throughput is “choppy”
§ repeated swings between W/2 and W

v Some apps would prefer sending at a steady rate
§ e.g., streaming apps

v A solution: “Equation-Based Congestion Control”
§ ditch TCP’s increase/decrease rules and just follow the equation
§ measure drop percentage p, and set rate accordingly

v Following the TCP equation ensures we’re “TCP friendly”
§ i.e., use no more than TCP does in similar setting

Implications (4): Loss not due to congestion?

v TCP will confuse corruption with congestion

v Flow will cut its rate
§ Throughput ~ 1/sqrt(p) where p is loss prob.
§ Applies even for non-congestion losses!

Implications: (5) How do short flows fare?

v 50% of flows have < 1500B to send; 80% < 100KB

v Implication (1): short flows never leave slow start!
§ short flows never attain their fair share

v Implication (2): too few packets to trigger dupACKs
§ Isolated loss may lead to timeouts
§ At typical timeout values of ~500ms, might severely impact latency

Implications: (6) TCP fills up queues → long delays

v A flow deliberately overshoots capacity, until it experiences a drop

v Means that delays are large for everyone
§ Consider a flow transferring a 10GB file sharing a bottleneck link with 10 flows transferring 100B

Implications: (7) Cheating
v Three easy ways to cheat
§ Increasing CWND faster than +1 MSS per RTT

Increasing CWND Faster

[Figure: rate of flow y vs. rate of flow x, bounded by capacity C.]

x increases by 2 per RTT
y increases by 1 per RTT

Limit rates: x = 2y

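The x = 2y limit follows from a short invariant: each RTT without loss adds 2 to x and 1 to y, leaving x − 2y unchanged, while each synchronized halving halves x − 2y, so it shrinks to zero. A minimal sketch, assuming both flows see every loss and halve together; `aimd_pair` and the capacity of 100 are hypothetical choices:

```python
def aimd_pair(x: float, y: float, inc_x: float, inc_y: float,
              capacity: float, rtts: int = 2000) -> tuple[float, float]:
    """Two AIMD flows; when x + y exceeds capacity, both see a loss and halve."""
    for _ in range(rtts):
        if x + y > capacity:
            x, y = x / 2, y / 2            # synchronized multiplicative decrease
        else:
            x, y = x + inc_x, y + inc_y    # per-RTT additive increase
    return x, y


# x cheats by adding 2 MSS per RTT; y follows the rules and adds 1
x, y = aimd_pair(10.0, 10.0, inc_x=2.0, inc_y=1.0, capacity=100.0)
print(round(x / y, 3))   # 2.0
```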
Implications: (7) Cheating
v Three easy ways to cheat
§ Increasing CWND faster than +1 MSS per RTT
§ Opening many connections

Open Many Connections

[Figure: A sends to B (rate x) and D sends to E (rate y) over a shared link.]

Assume
• A starts 10 connections to B
• D starts 1 connection to E
• Each connection gets about the same throughput

Then A gets 10 times more throughput than D

Implications: (7) Cheating
v Three easy ways to cheat
§ Increasing CWND faster than +1 MSS per RTT
§ Opening many connections
§ Using large initial CWND

v Why hasn’t the Internet suffered a congestion collapse yet?

Implications: (8) CC intertwined with reliability
v Mechanisms for CC and reliability are tightly coupled
§ CWND adjusted based on ACKs and timeouts
§ Cumulative ACKs and fast retransmit/recovery rules

v Complicates evolution
§ Consider changing from cumulative to selective ACKs
§ A failure of modularity, not layering

v Sometimes we want CC but not reliability
§ e.g., real-time applications
v Sometimes we want reliability but not CC (?)

Recap: TCP problems
v Misled by non-congestion losses
v Fills up queues leading to high delays
v Short flows complete before discovering available capacity
v AIMD impractical for high speed links
v Sawtooth discovery too choppy for some apps
v Unfair under heterogeneous RTTs
v Tight coupling with reliability mechanisms
v Endhosts can cheat

Could fix many of these with some help from routers!
§ Routers tell endpoints if they’re congested
§ Routers tell endpoints what rate to send at
§ Routers enforce fair sharing

Transport Layer: Summary
v principles behind transport layer services:
§ multiplexing, demultiplexing
§ reliable data transfer
§ flow control
§ congestion control
v instantiation, implementation in the Internet
§ UDP
§ TCP

next:
v leaving the network “edge” (application, transport layers)
v into the network “core”

Chapter 4: network layer
Our goals:
v understand principles behind network layer
services:
§ network layer service models
§ forwarding versus routing
§ how a router works
§ routing (path selection)
§ broadcast, multicast
v instantiation, implementation in the Internet

Network Layer: outline
4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
§ datagram format
§ IPv4 addressing
§ ICMP
§ IPv6
4.5 routing algorithms
§ link state
§ distance vector
§ hierarchical routing
4.6 routing in the Internet
§ RIP
§ OSPF
§ BGP
4.7 broadcast and multicast routing

Some background
v 1968: DARPAnet/ARPAnet (precursor to Internet)
§ (Defense) Advanced Research Projects Agency Network

v Mid 1970’s: new networks emerge
§ SATNet, Packet Radio, Ethernet
§ All “islands” to themselves – didn’t work together

v Big question: How to connect these networks?

Internet Protocol Stack
• Application: Email, Web, …
• Transport: TCP, UDP, …
• Network: IP
• Link: Ethernet, WiFi, ATM, …
• Physical: copper, fiber, air, …
• “Hourglass” model, “thin waist”, “narrow waist”

[Figure: hourglass with Email and Web at the top, TCP and UDP below, IP at the narrow waist, and Ethernet and ATM at the bottom.]

Internetworking
v Cerf & Kahn in 1974
§ “A Protocol for Packet Network Intercommunication”
§ Foundation for the modern Internet

v Routers forward packets from source to destination
§ May cross many separate networks along the way

v All packets use a common Internet Protocol
§ Any underlying data link protocol
§ Any higher layer transport protocol

Network layer
v transport segment from sending to receiving host
v on sending side encapsulates segments into datagrams
v on receiving side, delivers segments to transport layer
v network layer protocols in every host, router
v router examines header fields in all IP datagrams passing through it

[Figure: sending and receiving hosts run all five layers (application, transport, network, data link, physical); the routers along the path run only network, data link, and physical.]

Two key network-layer functions
v forwarding: move packets from router’s input to appropriate router output
v routing: determine route taken by packets from source to dest.
§ routing algorithms

analogy:
v routing: process of planning trip from source to dest
v forwarding: process of getting through single interchange

Quiz: When should a router perform routing? And forwarding?

A: Do both when a packet arrives

B: Route in advance, forward when a packet arrives

C: Forward in advance, route when a packet arrives

D: Do both in advance

E: Some other combination

Interplay between routing and forwarding

v routing algorithm determines end-end-path through network
v forwarding table determines local forwarding at this router

local forwarding table
header value   output link
0100           3
0101           2
0111           2
1001           1

[Figure: an arriving packet whose header value is 0111 leaves the router on output link 2 (links 1, 2, 3 shown).]

Connection setup
v 3rd important function in some network
architectures:
§ ATM, frame relay, X.25
v before datagrams flow, two end hosts and
intervening routers establish virtual
connection
§ routers get involved
v network vs transport layer connection
service:
§ network: between two hosts (may also involve intervening routers in case of VCs)
§ transport: between two processes

Self Study

Network service model
Q: What service model for “channel” transporting datagrams from sender to receiver?

example services for individual datagrams:
v guaranteed delivery
v guaranteed delivery with less than 40 msec delay

example services for a flow of datagrams:
v in-order datagram delivery
v guaranteed minimum bandwidth to flow
v restrictions on changes in inter-packet spacing

Self Study

Network layer service models:

Network        Service        Guarantees?                                   Congestion
Architecture   Model          Bandwidth            Loss   Order   Timing    feedback
Internet       best effort    none                 no     no      no        no (inferred via loss)
ATM            CBR            constant rate        yes    yes     yes       no congestion
ATM            VBR            guaranteed rate      yes    yes     yes       no congestion
ATM            ABR            guaranteed minimum   no     yes     no        yes
ATM            UBR            none                 no     yes     no        no

Network Layer: outline
4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
§ datagram format
§ IPv4 addressing
§ ICMP
§ IPv6
4.5 routing algorithms
§ link state
§ distance vector
§ hierarchical routing
4.6 routing in the Internet
§ RIP
§ OSPF
§ BGP
4.7 broadcast and multicast routing

Connection, connection-less service
v datagram network provides network-layer
connectionless service
v virtual-circuit network provides network-layer connection service
v analogous to TCP/UDP connection-oriented / connectionless transport-layer services, but:
§ service: host-to-host
§ no choice: network provides one or the other
§ implementation: in network core
Virtual circuits
“source-to-dest path behaves much like telephone
circuit”
§ performance-wise
§ network actions along source-to-dest path

v call setup, teardown for each call before data can flow
v each packet carries VC identifier (not destination host
address)
v every router on source-dest path maintains “state” for
each passing connection
v link, router resources (bandwidth, buffers) may be
allocated to VC (dedicated resources = predictable
service)
VC implementation
a VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along
path
3. entries in forwarding tables in routers along path
v packet belonging to VC carries VC
number (rather than dest address)
v VC number can be changed on each link.
§ new VC number comes from forwarding table

VC forwarding table

[Figure: Host A connects through routers R1 and R2 to Host B. The VC number is 12 on the link from Host A to R1, 22 on the link from R1 to R2, and 32 on the link from R2 to Host B; R1’s interfaces are numbered 1, 2, 3.]

forwarding table in Router R1:
Incoming interface   Incoming VC #   Outgoing interface   Outgoing VC #
1                    12              3                    22
2                    63              1                    18
3                    7               2                    17
1                    97              3                    87
…                    …               …                    …

VC routers maintain connection state information!

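A VC router’s forwarding step is a lookup keyed on (incoming interface, incoming VC #) that also rewrites the VC number for the next link. A minimal sketch using R1’s table; the names `R1_VC_TABLE` and `forward_vc` are hypothetical:

```python
# Router R1's VC forwarding table from the slide:
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
R1_VC_TABLE: dict[tuple[int, int], tuple[int, int]] = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}


def forward_vc(in_iface: int, in_vc: int) -> tuple[int, int]:
    """Return (outgoing interface, new VC #) for a packet arriving on a set-up VC."""
    try:
        return R1_VC_TABLE[(in_iface, in_vc)]
    except KeyError:
        # No entry means no VC was established through this router
        raise LookupError(f"no VC for interface {in_iface}, VC {in_vc}") from None


print(forward_vc(1, 12))   # (3, 22): Host A's packet leaves on interface 3 as VC 22
```

The table is exactly the per-connection state mentioned above: an entry must be installed during call setup before any packet on that VC can be forwarded.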
Virtual circuits: signaling protocols
v used to set up, maintain, and tear down VC
v used in ATM, frame-relay, X.25
v not widely used in today’s Internet

[Figure: call setup between two hosts’ protocol stacks: 1. initiate call, 2. incoming call, 3. accept call, 4. call connected, 5. data flow begins, 6. receive data.]

IN PRACTICE (Not on Exam)

Virtual Circuits in Action

v MPLS (Multi-protocol Label Switching): RFC 3031
§ Packets prefixed with “labels”
• A 20-bit label value
• a 3-bit Traffic Class field for Quality of Service (QoS) priority and ECN (Explicit Congestion Notification)
• a 1-bit bottom of stack flag
– If this is set, it signifies that the current label is the last in the stack
• 8-bit TTL field
§ Labels can be stacked on top of one another
§ Virtual circuit (label-switched path) established between Label Edge Routers (LERs)
§ Often used to set up Virtual Private Networks (VPN)

Article linked on the webpage for those interested in digging deeper

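The four fields listed above add up to one 32-bit label stack entry. A minimal sketch of packing and unpacking such an entry, assuming the field order label | Traffic Class | bottom-of-stack | TTL from most- to least-significant bit (the layout given in RFC 3032); `pack_mpls` and `unpack_mpls` are hypothetical helper names:

```python
def pack_mpls(label: int, tc: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Pack one 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= tc < 2**3 and 0 <= ttl < 2**8):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return word.to_bytes(4, "big")             # network byte order


def unpack_mpls(entry: bytes) -> tuple[int, int, bool, int]:
    """Inverse of pack_mpls: returns (label, tc, bottom_of_stack, ttl)."""
    word = int.from_bytes(entry, "big")
    return word >> 12, (word >> 9) & 0b111, bool((word >> 8) & 1), word & 0xFF


entry = pack_mpls(label=16, tc=0, bottom_of_stack=True, ttl=64)
print(entry.hex())          # 00010140
print(unpack_mpls(entry))   # (16, 0, True, 64)
```

A stack of labels is then several such 4-byte entries back to back, with the bottom-of-stack flag set only on the last one.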


Datagram networks
v no call setup at network layer
v routers: no state about end-to-end connections
§ no network-level concept of “connection”
v packets forwarded using destination host address

[Figure: protocol stacks of two hosts: 1. send datagrams, 2. receive datagrams, with no call setup in between.]



Datagram forwarding table
4 billion IP addresses, so rather than list individual destination address, list range of addresses (aggregate table entries)

local forwarding table
dest address       output link
address-range 1    3
address-range 2    2
address-range 3    2
address-range 4    1

[Figure: the routing algorithm fills the local forwarding table; the IP destination address in an arriving packet’s header selects output link 1, 2, or 3.]



Datagram forwarding table
Destination Address Range                                                          Link Interface
11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111    0
11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111    1
11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111    2
otherwise                                                                          3

Q: but what happens if ranges don’t divide up so nicely?

Longest prefix matching
longest prefix matching: when looking for forwarding table entry for given destination address, use longest address prefix that matches destination address.

Destination Address Range              Link interface
11001000 00010111 00010*** ********    0
11001000 00010111 00011000 ********    1
11001000 00010111 00011*** ********    2
otherwise                              3

examples:
DA: 11001000 00010111 00010110 10100001    which interface?
DA: 11001000 00010111 00011000 10101010    which interface?

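The table above can be rewritten in CIDR notation and searched with Python’s standard `ipaddress` module. A minimal sketch; the linear scan over all entries is only for clarity (a real router would use a specialized structure such as a trie or TCAM), and `TABLE`, `from_bits`, and `lookup` are hypothetical names:

```python
import ipaddress

# The slide's table, rewritten in CIDR notation:
#   11001000 00010111 00010*** ********  ->  200.23.16.0/21  (interface 0)
#   11001000 00010111 00011000 ********  ->  200.23.24.0/24  (interface 1)
#   11001000 00010111 00011*** ********  ->  200.23.24.0/21  (interface 2)
TABLE = [
    (ipaddress.ip_network("200.23.16.0/21"), 0),
    (ipaddress.ip_network("200.23.24.0/24"), 1),
    (ipaddress.ip_network("200.23.24.0/21"), 2),
]
DEFAULT_INTERFACE = 3


def from_bits(bits: str) -> str:
    """Convert '11001000 00010111 ...' to dotted-quad notation."""
    return str(ipaddress.ip_address(int(bits.replace(" ", ""), 2)))


def lookup(dest: str) -> int:
    """Return the interface of the longest matching prefix, else the default."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, iface) for net, iface in TABLE if addr in net]
    return max(matches)[1] if matches else DEFAULT_INTERFACE


for da in ("11001000 00010111 00010110 10100001",
           "11001000 00010111 00011000 10101010"):
    print(from_bits(da), "->", lookup(from_bits(da)))
# 200.23.22.161 -> 0
# 200.23.24.170 -> 1
```

The second address matches both the /24 entry and the /21 entry; the longer /24 prefix wins, so it leaves on interface 1 rather than 2.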
Datagram or VC network: why?
Internet (datagram)
v data exchange among computers
§ “elastic” service, no strict timing requirements
v many link types
§ different characteristics
§ uniform service difficult
v “smart” end systems (computers)
§ can adapt, perform control, error recovery
§ simple inside network, complexity at “edge”

ATM (VC)
v evolved from telephony
v human conversation:
§ strict timing, reliability requirements
§ need for guaranteed service
v “dumb” end systems
§ telephones
§ complexity inside network



Quiz: Connection state
v Which of the following relies on
connection state in routers in the network?
Pick one.

A. TCP
B. Internet
C. Virtual circuit network
D. UDP
E. A and C



Quiz: Longest prefix matching
v On which outgoing interface will a packet
destined to 11011001 be forwarded?

Prefix Interface
1* A
11* B
111* C
Default D



Network Layer: outline
4.1 introduction
4.2 virtual circuit and datagram networks
4.3 what’s inside a router
4.4 IP: Internet Protocol
§ datagram format
§ IPv4 addressing
§ ICMP
§ IPv6
4.5 routing algorithms
§ link state
§ distance vector
§ hierarchical routing
4.6 routing in the Internet
§ RIP
§ OSPF
§ BGP
4.7 broadcast and multicast routing



The Internet network layer
host, router network layer functions:

transport layer: TCP, UDP

network layer:
• routing protocols: path selection; RIP, OSPF, BGP (these fill in the forwarding table)
• forwarding table
• IP protocol: addressing conventions; datagram format; packet handling conventions
• ICMP protocol: error reporting; router “signaling”

link layer

physical layer



What Does Designing IP Involve?
v Syntax: format of packet
§ Nontrivial part: packet “header”
§ Rest is opaque payload (why opaque?)

[Figure: packet = Header | Opaque Payload]

v Semantics: meaning of header fields
§ Required processing
Packet Headers
v Think of the packet header as an interface
§ Only way of passing information from packet to router
v Designing interfaces
§ What task are you trying to perform?
§ What information do you need to accomplish it?
v Header reflects the information needed for basic tasks
What Tasks Do We Need to Do?
v Read packet correctly
v Get packet to the destination; get responses back to the source
v Carry data
v Tell host what to do with packet once it has arrived
v Specify any special network handling of the packet
v Deal with problems that arise along the path
IP Packet Structure

[Header layout, 32 bits per row]
4-bit Version | 4-bit Header Length | 8-bit Type of Service (TOS) | 16-bit Total Length (Bytes)
16-bit Identification | 3-bit Flags | 13-bit Fragment Offset
8-bit Time to Live (TTL) | 8-bit Protocol | 16-bit Header Checksum
32-bit Source IP Address
32-bit Destination IP Address
Options (if any)
Payload



20 Bytes of Standard Header, then Options

[Figure: the IP packet structure above; the fixed fields from Version through the 32-bit Destination IP Address form the 20-byte standard header, followed by Options (if any) and then the Payload]



Fields for Reading Packet Correctly

[Figure: the IP packet structure above, highlighting Version, Header Length, and Total Length]



Reading Packet Correctly

v Version number (4 bits)
§ Indicates the version of the IP protocol
§ Necessary to know what other fields to expect
§ Typically “4” (for IPv4), and sometimes “6” (for IPv6)
v Header length (4 bits)
§ Number of 32-bit words in the header
§ Typically “5” (for a 20-byte IPv4 header)
§ Can be more when IP options are used
v Total length (16 bits)
§ Number of bytes in the packet
§ Maximum size is 65,535 bytes (2^16 - 1)
§ … though underlying links may impose smaller limits
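A minimal sketch of how a receiver might read these three fields from raw header bytes. It assumes the input starts at the first byte of an IPv4 header; the function name is hypothetical. Because the header length field counts 32-bit words, it is multiplied by 4 (the 4-bit field tops out at 15, i.e. a 60-byte header).

```python
import struct


def read_ip_basics(header: bytes) -> tuple[int, int, int]:
    """Return (version, header length in bytes, total length in bytes)."""
    if len(header) < 20:
        raise ValueError("shorter than the 20-byte minimum IPv4 header")
    version = header[0] >> 4        # high 4 bits of byte 0
    ihl_words = header[0] & 0x0F    # low 4 bits: header length in 32-bit words
    (total_length,) = struct.unpack_from("!H", header, 2)  # bytes 2-3, big-endian
    return version, ihl_words * 4, total_length
```

A header beginning with the bytes 0x45 0x00 0x05 0xDC reads as version 4, a 20-byte header (5 words), and a total length of 1500 bytes.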



Fields for Reaching Destination and Back

[Figure: the IP packet structure above, highlighting the 32-bit Source IP Address and the 32-bit Destination IP Address]



Telling End-Host How to Handle Packet

[Figure: the IP packet structure above, highlighting the 8-bit Protocol field]



Telling End-Host How to Handle Packet

v Protocol (8 bits)
§ Identifies the higher-level protocol
§ Important for demultiplexing at receiving host

[Figure: protocol stack: L7 Application (SMTP, HTTP, DNS, NTP); L4 Transport (TCP, UDP); L3 Network (IP); L2 Data link (Ethernet, FDDI, PPP); L1 Physical (optical, copper, radio, PSTN)]



Telling End-Host How to Handle Packet

v Protocol (8 bits)
§ Identifies the higher-level protocol
§ Important for demultiplexing at receiving host
v Most common examples
§ E.g., “6” for the Transmission Control Protocol (TCP)
§ E.g., “17” for the User Datagram Protocol (UDP)

[Figure: with protocol=6, the IP header is followed by a TCP header; with protocol=17, the IP header is followed by a UDP header]
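A sketch of demultiplexing on the Protocol field. The handler functions are placeholders standing in for a host's TCP and UDP input code (they are not a real API). The Protocol field sits at byte 9 of the IPv4 header, and the header length field tells the host where the payload begins.

```python
def handle_tcp(segment: bytes) -> str:
    """Placeholder for the host's TCP input routine."""
    return f"TCP got {len(segment)} bytes"


def handle_udp(segment: bytes) -> str:
    """Placeholder for the host's UDP input routine."""
    return f"UDP got {len(segment)} bytes"


# Protocol field value -> handler (6 = TCP, 17 = UDP, as on the slide)
HANDLERS = {6: handle_tcp, 17: handle_udp}


def demultiplex(packet: bytes) -> str:
    """Hand the payload of an IPv4 packet to the right transport handler."""
    header_len = (packet[0] & 0x0F) * 4  # header length field is in 32-bit words
    protocol = packet[9]                 # 8-bit Protocol field
    handler = HANDLERS.get(protocol)
    if handler is None:
        return f"no handler for protocol {protocol}"
    return handler(packet[header_len:])
```

Using a table keyed by protocol number means adding another transport protocol only requires registering one more handler.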



Potential Problems
v Header Corrupted: Checksum

v Loop: TTL

v Packet too large: Fragmentation



Checksum, TTL and Fragmentation Fields

[Figure: the IP packet structure above, highlighting the 16-bit Header Checksum, the 8-bit Time to Live (TTL), and the fragmentation fields (16-bit Identification, 3-bit Flags, 13-bit Fragment Offset)]



Header Corruption (Checksum)
v Checksum (16 bits)
§ Particular form of checksum over the packet header
v If not correct, router discards the packet
§ So it doesn’t act on bogus information
v Checksum recalculated at every router
§ Why?
§ Why include TTL?
§ Why only header?
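A sketch of the header checksum, assuming the standard Internet checksum (one's-complement sum of 16-bit words, then complemented, as described in RFC 1071). `forward_hop` is a hypothetical helper mimicking one router hop: decrement TTL, then recompute the checksum over the header only. The sample header is an illustrative 20-byte IPv4 header with TTL 64.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words, complemented.

    Over a header whose checksum field already holds the correct value,
    this returns 0, which is how a router can verify an arriving header.
    """
    data = bytes(header)
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_hop(header: bytes) -> bytes:
    """Decrement TTL (byte 8) and recompute the checksum (bytes 10-11)."""
    h = bytearray(header)
    if h[8] == 0:
        raise ValueError("TTL already 0; the packet should have been dropped")
    h[8] -= 1
    h[10:12] = b"\x00\x00"  # checksum field counts as zero while computing
    h[10:12] = struct.pack("!H", ipv4_header_checksum(bytes(h)))
    return bytes(h)


# Illustrative header: TTL 64 (0x40), protocol 17, checksum 0xB861
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
```

For this sample, one hop lowers TTL from 64 to 63 and changes the checksum from 0xB861 to 0xB961.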
Preventing Loops (TTL)
v Forwarding loops cause packets to cycle for a looong time
§ As these accumulate, they eventually consume all capacity
v Time-to-Live (TTL) field (8 bits)
§ Decremented at each hop; packet discarded if it reaches 0
§ …and a “time exceeded” message is sent to the source
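A toy simulation of how TTL bounds the damage from a forwarding loop. The router names and the looping path are made up for illustration: the packet cycles through the same routers until one of them decrements TTL to 0 and drops it, which is where the “time exceeded” message would be generated.

```python
import itertools


def forward_until_dropped(ttl: int, loop: list[str]) -> tuple[int, str]:
    """Send a packet around a forwarding loop until its TTL runs out.

    Returns (hops taken, router that dropped it). `loop` lists routers
    that keep forwarding to each other; the names are hypothetical.
    """
    if ttl <= 0 or not loop:
        raise ValueError("need a positive TTL and at least one router")
    for hops, router in enumerate(itertools.cycle(loop), start=1):
        ttl -= 1  # each router decrements TTL
        if ttl == 0:
            return hops, router  # dropped here; "time exceeded" goes to the source
    raise AssertionError("unreachable: itertools.cycle never ends")
```

With TTL 64 and a three-router loop, the packet makes exactly 64 hops before it is discarded, no matter how long the loop persists.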



IP fragmentation, reassembly
v network links have MTU (max. transfer size): largest possible link-level frame
§ different link types, different MTUs
v large IP datagram divided (“fragmented”) within net
§ one datagram becomes several datagrams
§ “reassembled” only at final destination
§ IP header bits used to identify, order related fragments

[Figure: fragmentation: in: one large datagram, out: 3 smaller datagrams; reassembly at the final destination]
IP fragmentation, reassembly
example:
v 4000 byte datagram: length=4000 ID=x fragflag=0 offset=0
v MTU = 1500 bytes

one large datagram becomes several smaller datagrams:
length=1500 ID=x fragflag=1 offset=0     (1480 bytes in data field)
length=1500 ID=x fragflag=1 offset=185   (offset = 1480/8)
length=1040 ID=x fragflag=0 offset=370

Applet: http://media.pearsoncmg.com/aw/aw_kurose_network_2/applets/ip/ipfragmentation.html
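A sketch that reproduces the numbers in this example, assuming a 20-byte header with no options. Every fragment except the last must carry a multiple of 8 data bytes because the offset field counts 8-byte units. Each fragment would also copy the original ID (x), which is left out here; the function name is hypothetical.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list[tuple[int, int, int]]:
    """Split one datagram into fragments that fit the MTU.

    Returns (length, fragflag, offset) per fragment: length includes the
    header, fragflag is the "more fragments" bit, offset is in 8-byte units.
    """
    payload = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8  # largest multiple of 8 that fits
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = 1 if sent + data < payload else 0
        fragments.append((data + header_len, more, sent // 8))
        sent += data
    return fragments
```

`fragment(4000, 1500)` gives `[(1500, 1, 0), (1500, 1, 185), (1040, 0, 370)]`: 3980 data bytes split as 1480 + 1480 + 1020.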
Quiz: How can we use this for evil?
A: Send fragments that overlap.

B: Send many tiny fragments, none of which have offset 0.

C: Send fragments that, when reassembled, are bigger than the maximum IP datagram.

D: More than one of the above.

E: Nah, networks (and operating systems) are too robust for this to cause problems.
IP Fragmentation Attacks
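One way to see why these tricks matter is to write down the checks a careful reassembler might apply. This is a hypothetical sketch, not any operating system's actual code: fragments are given as (offset in 8-byte units, data length in bytes) for a single datagram ID, and the checks look for overlapping fragments, a datagram whose offset-0 fragment never arrives, and a reassembled size beyond 65,535 bytes.

```python
MAX_DATAGRAM = 65_535  # largest value the 16-bit Total Length field allows


def check_fragments(frags: list[tuple[int, int]], header_len: int = 20) -> list[str]:
    """Return the suspicious patterns found in one datagram's fragments."""
    if not frags:
        return ["no fragments"]
    problems = []
    if all(offset != 0 for offset, _ in frags):
        problems.append("no fragment with offset 0")  # datagram can never complete
    spans = sorted((offset * 8, offset * 8 + length) for offset, length in frags)
    covered_to = 0
    for start, end in spans:
        if start < covered_to:
            problems.append(f"overlap at byte {start}")
        covered_to = max(covered_to, end)
    if header_len + covered_to > MAX_DATAGRAM:
        problems.append(f"reassembled size {header_len + covered_to} exceeds {MAX_DATAGRAM}")
    return problems
```

The fragments from the 4000-byte example pass cleanly, while a fragment at offset 8189 carrying 1480 bytes would reassemble to 67,012 bytes, more than the Total Length field can describe.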



Fields for Special Handling

[Figure: the IP packet structure above, highlighting the 8-bit Type of Service (TOS) field and Options (if any)]



Special Handling
v “Type of Service” (8 bits), or “Differentiated Services Code Point (DSCP)” (now the upper 6 of those 8 bits)
§ Allows packets to be treated differently based on their needs
§ E.g., low delay for audio, high bandwidth for bulk transfer
§ Has been redefined several times; we will cover this later in class

v Options (not often used)
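A small sketch of how the 8-bit Type of Service byte is split under its current definition: the upper 6 bits carry the DSCP and the lower 2 bits are used for Explicit Congestion Notification (ECN), per RFC 2474 and RFC 3168. The helper names are hypothetical. DSCP 46 (Expedited Forwarding) is a value commonly used for low-delay traffic such as voice.

```python
def split_tos(tos: int) -> tuple[int, int]:
    """Split the 8-bit TOS byte into (DSCP, ECN)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("the TOS field is a single byte")
    return tos >> 2, tos & 0b11  # upper 6 bits = DSCP, lower 2 bits = ECN


def make_tos(dscp: int, ecn: int = 0) -> int:
    """Build the TOS byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not (0 <= dscp < 64 and 0 <= ecn < 4):
        raise ValueError("DSCP is 6 bits, ECN is 2 bits")
    return (dscp << 2) | ecn
```

So a packet marked with DSCP 46 and no ECN carries 0xB8 in the former TOS byte.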



IP datagram format
[Figure: IP datagram layout, 32 bits wide: ver | head. len | type of service | length; 16-bit identifier | flgs | fragment offset; time to live | upper layer | header checksum; 32 bit source IP address; 32 bit destination IP address; options (if any); data (variable length, typically a TCP or UDP segment)]
v ver: IP protocol version number
v head. len: header length (in 32-bit words)
v type of service: “type” of data
v length: total datagram length (bytes)
v 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
v time to live: max number of remaining hops (decremented at each router)
v upper layer: upper layer protocol to deliver payload to
v options (if any): e.g. timestamp, record route taken, specify list of routers to visit

how much overhead?
v 20 bytes of TCP
v 20 bytes of IP
v = 40 bytes + app layer overhead
