
Chapter 3
Transport Layer

Computer Networking: A Top-Down Approach Featuring the Internet, 3rd edition.
Jim Kurose, Keith Ross
Addison-Wesley, July 2004.

A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
• If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)
• If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR

All material copyright 1996-2004
J.F. Kurose and K.W. Ross, All Rights Reserved
Transport Layer 3-1
Chapter 3: Transport Layer
Our goals:
• understand principles behind transport layer services:
    • multiplexing/demultiplexing
    • reliable data transfer
    • flow control
    • congestion control
• learn about transport layer protocols in the Internet:
    • UDP: connectionless transport
    • TCP: connection-oriented transport
    • TCP congestion control


Chapter 3 outline
• 3.1 Transport-layer services
• 3.2 Multiplexing and demultiplexing
• 3.3 Connectionless transport: UDP
• 3.4 Principles of reliable data transfer
• 3.5 Connection-oriented transport: TCP
    • segment structure
    • reliable data transfer
    • flow control
    • connection management
• 3.6 Principles of congestion control
• 3.7 TCP congestion control

Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
    • send side: breaks app messages into segments, passes to network layer
    • rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
    • Internet: TCP and UDP
[Figure: logical end-end transport between two end systems (application, transport, network, data link, physical), across intermediate nodes that have only network, data link, and physical layers]


Transport vs. network layer
• network layer: logical communication between hosts
• transport layer: logical communication between processes
    • relies on, enhances, network layer services

Household analogy:
12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service


Internet transport-layer protocols
• reliable, in-order delivery (TCP)
    • congestion control
    • flow control
    • connection setup
• unreliable, unordered delivery: UDP
    • no-frills extension of “best-effort” IP
• services not available:
    • delay guarantees
    • bandwidth guarantees


Multiplexing/demultiplexing
• Demultiplexing at rcv host: delivering received segments to correct socket
• Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
[Figure: host 1, host 2, and host 3, each with application, transport, network, link, and physical layers; processes P1 to P4 attach to the transport layer through sockets]

How demultiplexing works
• host receives IP datagrams
    • each datagram has source IP address, destination IP address
    • each datagram carries 1 transport-layer segment
    • each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
    source port # | dest port #
    other header fields
    application data (message)


Connectionless demultiplexing
• Create sockets with port numbers (a port number is 16 bits, so it must not exceed 65535):
DatagramSocket mySocket1 = new DatagramSocket(9111);
DatagramSocket mySocket2 = new DatagramSocket(9222);
• UDP socket identified by two-tuple:
(dest IP address, dest port number)
• When host receives UDP segment:
    • checks destination port number in segment
    • directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers directed to same socket

Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: server (IP: C) exchanging UDP segments with clients at IP: A and IP: B]
• client IP: A to server: SP: 9157, DP: 6428
• server to client IP: A: SP: 6428, DP: 9157
• client IP: B to server: SP: 5775, DP: 6428
• server to client IP: B: SP: 6428, DP: 5775

SP provides “return address”


Connection-oriented demux
• TCP socket identified by 4-tuple:
    • source IP address
    • source port number
    • dest IP address
    • dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
    • each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
    • non-persistent HTTP will have different socket for each request
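
The two lookup rules above (UDP by two-tuple, TCP by four-tuple) can be sketched as dictionary lookups. This is only an illustration: the class `Demultiplexer`, its method names, and the socket labels are hypothetical and not part of any real socket API.

```python
class Demultiplexer:
    """Toy model of transport-layer demultiplexing (hypothetical helper)."""

    def __init__(self):
        self.udp_sockets = {}  # (dest_ip, dest_port) -> socket label
        self.tcp_sockets = {}  # (src_ip, src_port, dest_ip, dest_port) -> socket label

    def bind_udp(self, ip, port, sock):
        self.udp_sockets[(ip, port)] = sock

    def add_tcp_connection(self, src_ip, src_port, dest_ip, dest_port, sock):
        self.tcp_sockets[(src_ip, src_port, dest_ip, dest_port)] = sock

    def demux_udp(self, src_ip, src_port, dest_ip, dest_port):
        # UDP ignores the source fields when choosing the socket.
        return self.udp_sockets.get((dest_ip, dest_port))

    def demux_tcp(self, src_ip, src_port, dest_ip, dest_port):
        # TCP uses all four values.
        return self.tcp_sockets.get((src_ip, src_port, dest_ip, dest_port))


demux = Demultiplexer()
demux.bind_udp("C", 6428, "serverSocket")
demux.add_tcp_connection("A", 9157, "C", 80, "conn1")
demux.add_tcp_connection("B", 9157, "C", 80, "conn2")
demux.add_tcp_connection("B", 5775, "C", 80, "conn3")
```

With these entries, UDP segments from both clients reach the same socket, while the three TCP segments, which all carry destination port 80, reach three different sockets.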


Connection-oriented demux (cont)
[Figure: server (IP: C) with processes P4, P5, P6; clients at IP: A and IP: B send segments to port 80]
• from client IP: A: SP: 9157, DP: 80, S-IP: A, D-IP: C
• from client IP: B: SP: 9157, DP: 80, S-IP: B, D-IP: C
• from client IP: B: SP: 5775, DP: 80, S-IP: B, D-IP: C


Connection-oriented demux: Threaded Web Server
[Figure: server (IP: C) with a single process P4; clients at IP: A and IP: B send segments to port 80]
• from client IP: A: SP: 9157, DP: 80, S-IP: A, D-IP: C
• from client IP: B: SP: 9157, DP: 80, S-IP: B, D-IP: C
• from client IP: B: SP: 5775, DP: 80, S-IP: B, D-IP: C


UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
    • lost
    • delivered out of order to app
• connectionless:
    • no handshaking between UDP sender, receiver
    • each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired


UDP: more
• often used for streaming multimedia apps
    • loss tolerant
    • rate sensitive
• other UDP uses
    • DNS
    • SNMP
• reliable transfer over UDP: add reliability at application layer
    • application-specific error recovery!

UDP segment format (32 bits wide):
    source port # | dest port #
    length | checksum
    application data (message)
length: length, in bytes, of UDP segment, including header


UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
    • NO - error detected
    • YES - no error detected. But maybe errors nonetheless? More later ….


Internet Checksum Example
• Note
    • When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                  1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound      1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum               1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum          0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
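
The wraparound addition in this example can be checked with a short sketch. The function names are illustrative, and the code works on a list of 16-bit words instead of a real UDP segment (no pseudo-header, no byte packing).

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word & 0xFFFF
        total = (total & 0xFFFF) + (total >> 16)  # wraparound carry
    return total


def internet_checksum(words):
    """Checksum = bitwise complement of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


def looks_intact(words, checksum):
    """Receiver check: data words plus checksum must sum to all ones."""
    return ones_complement_sum16(list(words) + [checksum]) == 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
checksum = internet_checksum([a, b])
```

For the two words on the slide, `checksum` comes out as `0b0100010001000011`, matching the last row of the example; flipping any single bit of `a` or `b` makes `looks_intact` return `False`.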
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper layer

Reliable data transfer: getting started
We’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
    • but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver

FSM notation:
• state: when in this “state”, next state uniquely determined by next event
• each transition from state 1 to state 2 is labeled with the event causing the state transition and the actions taken on the state transition


Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
    • no bit errors
    • no loss of packets
• separate FSMs for sender, receiver:
    • sender sends data into underlying channel
    • receiver reads data from underlying channel

sender, state “Wait for call from above”:
    event: rdt_send(data)
        packet = make_pkt(data)
        udt_send(packet)

receiver, state “Wait for call from below”:
    event: rdt_rcv(packet)
        extract(packet,data)
        deliver_data(data)


Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
    • checksum to detect bit errors
• the question: how to recover from errors:
    • acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
    • negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
    • sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
    • error detection
    • receiver feedback: control msgs (ACK,NAK) rcvr->sender


rdt2.0: FSM specification
sender, state “Wait for call from above”:
    event: rdt_send(data)
        sndpkt = make_pkt(data, checksum)
        udt_send(sndpkt)
        next state: “Wait for ACK or NAK”

sender, state “Wait for ACK or NAK”:
    event: rdt_rcv(rcvpkt) && isNAK(rcvpkt)
        udt_send(sndpkt)
    event: rdt_rcv(rcvpkt) && isACK(rcvpkt)
        Λ (no action)
        next state: “Wait for call from above”

receiver, state “Wait for call from below”:
    event: rdt_rcv(rcvpkt) && corrupt(rcvpkt)
        udt_send(NAK)
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
        extract(rcvpkt,data)
        deliver_data(data)
        udt_send(ACK)


rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, following the error-free path]
• sender: rdt_send(data); sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
• receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
• sender: rdt_rcv(rcvpkt) && isACK(rcvpkt); back to “Wait for call from above”


rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, following the path taken when a packet is corrupted]
• sender: rdt_send(data); sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
• receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt); udt_send(NAK)
• sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt); udt_send(sndpkt)
• receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
• sender: rdt_rcv(rcvpkt) && isACK(rcvpkt); back to “Wait for call from above”


rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate

Handling duplicates:
• sender retransmits current pkt if ACK/NAK garbled
• sender adds sequence number to each pkt
• receiver discards (doesn’t deliver up) duplicate pkt

stop and wait
Sender sends one packet, then waits for receiver response


rdt2.1: sender, handles garbled ACK/NAKs
state “Wait for call 0 from above”:
    event: rdt_send(data)
        sndpkt = make_pkt(0, data, checksum)
        udt_send(sndpkt)
        next state: “Wait for ACK or NAK 0”

state “Wait for ACK or NAK 0”:
    event: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
        udt_send(sndpkt)
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
        Λ (no action)
        next state: “Wait for call 1 from above”

state “Wait for call 1 from above”:
    event: rdt_send(data)
        sndpkt = make_pkt(1, data, checksum)
        udt_send(sndpkt)
        next state: “Wait for ACK or NAK 1”

state “Wait for ACK or NAK 1”:
    event: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
        udt_send(sndpkt)
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
        Λ (no action)
        next state: “Wait for call 0 from above”


rdt2.1: receiver, handles garbled ACK/NAKs
state “Wait for 0 from below”:
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
        extract(rcvpkt,data)
        deliver_data(data)
        sndpkt = make_pkt(ACK, chksum)
        udt_send(sndpkt)
        next state: “Wait for 1 from below”
    event: rdt_rcv(rcvpkt) && corrupt(rcvpkt)
        sndpkt = make_pkt(NAK, chksum)
        udt_send(sndpkt)
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
        sndpkt = make_pkt(ACK, chksum)
        udt_send(sndpkt)

state “Wait for 1 from below”:
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
        extract(rcvpkt,data)
        deliver_data(data)
        sndpkt = make_pkt(ACK, chksum)
        udt_send(sndpkt)
        next state: “Wait for 0 from below”
    event: rdt_rcv(rcvpkt) && corrupt(rcvpkt)
        sndpkt = make_pkt(NAK, chksum)
        udt_send(sndpkt)
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
        sndpkt = make_pkt(ACK, chksum)
        udt_send(sndpkt)


rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
    • state must “remember” whether “current” pkt has 0 or 1 seq. #

Receiver:
• must check if received packet is duplicate
    • state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender


rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
    • receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt


rdt2.2: sender, receiver fragments
sender FSM fragment
state “Wait for call 0 from above”:
    event: rdt_send(data)
        sndpkt = make_pkt(0, data, checksum)
        udt_send(sndpkt)
        next state: “Wait for ACK 0”
state “Wait for ACK 0”:
    event: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
        udt_send(sndpkt)
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
        Λ (no action)
        next state: not shown in this fragment

receiver FSM fragment
state “Wait for 0 from below”:
    event: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) )
        udt_send(sndpkt)
transition into “Wait for 0 from below”:
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
        extract(rcvpkt,data)
        deliver_data(data)
        sndpkt = make_pkt(ACK1, chksum)
        udt_send(sndpkt)

rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
• checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
    • retransmission will be duplicate, but use of seq. #’s already handles this
    • receiver must specify seq # of pkt being ACKed
• requires countdown timer

rdt3.0 sender
state “Wait for call 0 from above”:
    event: rdt_send(data)
        sndpkt = make_pkt(0, data, checksum)
        udt_send(sndpkt)
        start_timer
        next state: “Wait for ACK0”
    event: rdt_rcv(rcvpkt)
        Λ (no action)

state “Wait for ACK0”:
    event: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
        Λ (no action)
    event: timeout
        udt_send(sndpkt)
        start_timer
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
        stop_timer
        next state: “Wait for call 1 from above”

state “Wait for call 1 from above”:
    event: rdt_send(data)
        sndpkt = make_pkt(1, data, checksum)
        udt_send(sndpkt)
        start_timer
        next state: “Wait for ACK1”
    event: rdt_rcv(rcvpkt)
        Λ (no action)

state “Wait for ACK1”:
    event: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
        Λ (no action)
    event: timeout
        udt_send(sndpkt)
        start_timer
    event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
        stop_timer
        next state: “Wait for call 0 from above”
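
The four-state sender can be sketched as a small event-driven class. This is a sketch under stated assumptions, not the slides' own code: the class name `Rdt30Sender` is hypothetical, the timer is modelled as a boolean flag with `timeout()` called by the test harness, corruption is passed in as a flag where the FSM would compute a checksum, and the two "Wait for call" states are folded into one flag plus the current sequence bit.

```python
class Rdt30Sender:
    """Alternating-bit (stop-and-wait) sender, modelled on the rdt3.0 FSM."""

    def __init__(self, udt_send):
        self.udt_send = udt_send      # callback standing in for the unreliable channel
        self.seq = 0                  # sequence bit of the current packet (0 or 1)
        self.waiting_for_ack = False  # False: "Wait for call", True: "Wait for ACK"
        self.sndpkt = None
        self.timer_running = False

    def rdt_send(self, data):
        if self.waiting_for_ack:
            return False              # stop-and-wait: one packet in flight at most
        self.sndpkt = (self.seq, data)
        self.udt_send(self.sndpkt)
        self.timer_running = True     # start_timer
        self.waiting_for_ack = True
        return True

    def timeout(self):
        if self.waiting_for_ack:
            self.udt_send(self.sndpkt)  # retransmit the current packet
            self.timer_running = True   # start_timer

    def rdt_rcv(self, ack_seq, corrupt=False):
        if not self.waiting_for_ack:
            return                    # stray ACK while waiting for a call: no action
        if corrupt or ack_seq != self.seq:
            return                    # garbled or wrong ACK: no action, rely on the timer
        self.timer_running = False    # stop_timer
        self.waiting_for_ack = False
        self.seq ^= 1                 # next packet uses the other sequence bit


sent = []
sender = Rdt30Sender(sent.append)
sender.rdt_send("a")
sender.rdt_rcv(1)      # ACK for the wrong sequence number is ignored
sender.timeout()       # timer expires, packet 0 is retransmitted
sender.rdt_rcv(0)      # correct ACK arrives
sender.rdt_send("b")
```

After this run `sent` holds packet 0 twice followed by packet 1, which is the retransmit-on-timeout behaviour the FSM describes.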


rdt3.0 in action
[Figures]


Performance of rdt3.0
• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

$$T_{transmit} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8\ \text{kb/pkt}}{10^{9}\ \text{b/sec}} = 8\ \text{microsec}$$

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{.008}{30.008} = 0.00027 \quad \text{(times in msec)}$$

    • U_sender: utilization – fraction of time sender busy sending
    • 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
    • network protocol limits use of physical resources!


rdt3.0: stop-and-wait operation
[Figure: sender/receiver timing diagram]
• sender: first packet bit transmitted, t = 0
• sender: last packet bit transmitted, t = L / R
• receiver: first packet bit arrives
• receiver: last packet bit arrives, send ACK
• sender: ACK arrives, send next packet, t = RTT + L / R

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{.008}{30.008} = 0.00027 \quad \text{(times in msec)}$$


Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
• range of sequence numbers must be increased
• buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization
[Figure: sender/receiver timing diagram]
• sender: first packet bit transmitted, t = 0
• sender: last bit transmitted, t = L / R
• receiver: first packet bit arrives
• receiver: last packet bit arrives, send ACK
• receiver: last bit of 2nd packet arrives, send ACK
• receiver: last bit of 3rd packet arrives, send ACK
• sender: ACK arrives, send next packet, t = RTT + L / R

Increase utilization by a factor of 3!

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{.024}{30.008} = 0.0008 \quad \text{(times in msec)}$$

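
The two utilization figures (0.00027 for stop-and-wait, 0.0008 with three packets in flight) can be reproduced with a few lines. The function name and parameters are illustrative; the inputs are the slide's own numbers (8000-bit packet, 1 Gbps link, 30 ms RTT).

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, pkts_in_flight=1):
    """Fraction of time the sender is busy sending, capped at 1.0."""
    t_transmit = packet_bits / rate_bps  # L / R, in seconds
    utilization = pkts_in_flight * t_transmit / (rtt_s + t_transmit)
    return min(utilization, 1.0)


L_BITS = 8000   # 1KB packet, 8 kb as on the slide
R_BPS = 1e9     # 1 Gbps link
RTT_S = 0.030   # 2 x 15 ms propagation delay

u_stop_and_wait = sender_utilization(L_BITS, R_BPS, RTT_S)
u_pipelined = sender_utilization(L_BITS, R_BPS, RTT_S, pkts_in_flight=3)
throughput_bytes_per_s = (L_BITS / 8) / (RTT_S + L_BITS / R_BPS)
```

`throughput_bytes_per_s` comes out at roughly 33,000 bytes per second, the "33kB/sec" quoted for stop-and-wait.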
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N, consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
    • may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window


GBN: sender extended FSM
initial actions:
    base=1
    nextseqnum=1
single state: Wait

event: rdt_send(data)
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

event: timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])

event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer

event: rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    Λ (no action)

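
The sender's three events translate almost line for line into a small class. This is a sketch under assumptions: `GbnSender` is a hypothetical name, the timer is a boolean flag with `timeout()` invoked by the caller, corruption is passed in as a flag, and `max()` guards against a stale ACK moving `base` backwards (a guard the FSM on the slide does not have).

```python
class GbnSender:
    """Go-Back-N sender with a window of N packets, modelled on the extended FSM."""

    def __init__(self, window_size, udt_send):
        self.N = window_size
        self.udt_send = udt_send   # callback standing in for the unreliable channel
        self.base = 1              # oldest unACKed sequence number
        self.nextseqnum = 1        # next sequence number to use
        self.sndpkt = {}           # buffered packets awaiting ACK
        self.timer_running = False

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            pkt = (self.nextseqnum, data)
            self.sndpkt[self.nextseqnum] = pkt
            self.udt_send(pkt)
            if self.base == self.nextseqnum:
                self.timer_running = True       # start_timer
            self.nextseqnum += 1
            return True
        return False                            # refuse_data: window is full

    def timeout(self):
        self.timer_running = True               # start_timer
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[seq])     # resend every unACKed packet

    def rdt_rcv(self, acknum, corrupt=False):
        if corrupt:
            return                              # no action on a corrupt ACK
        self.base = max(self.base, acknum + 1)  # cumulative ACK (max() is an added guard)
        for seq in [s for s in self.sndpkt if s < self.base]:
            del self.sndpkt[seq]
        self.timer_running = self.base != self.nextseqnum


sent = []
gbn = GbnSender(3, sent.append)
accepted = [gbn.rdt_send(d) for d in "abcd"]  # fourth call is refused
gbn.rdt_rcv(1)                                # ACK 1 slides the window to base = 2
gbn.timeout()                                 # packets 2 and 3 are resent
```

With a window of 3, the fourth `rdt_send` is refused, and the timeout resends exactly the packets from `base` to `nextseqnum-1`.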
GBN: receiver extended FSM
initial actions:
    expectedseqnum=1
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
single state: Wait

event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

event: default
    udt_send(sndpkt)

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
• may generate duplicate ACKs
• need only remember expectedseqnum
out-of-order pkt:
• discard (don’t buffer) -> no receiver buffering!
• Re-ACK pkt with highest in-order seq #


GBN in action
[Figure]


Selective Repeat
• receiver individually acknowledges all correctly received pkts
    • buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
    • sender timer for each unACKed pkt
• sender window
    • N consecutive seq #’s
    • again limits seq #s of sent, unACKed pkts


Selective repeat: sender, receiver windows
[Figure]


Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
• ACK(n)
otherwise:
• ignore


Selective repeat in action
[Figure]

Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
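
One way to explore the question is to model the worst case directly: the receiver has accepted a full window and advanced, but every ACK was lost, so the sender retransmits the old window. The sketch below checks whether those retransmissions fall inside the receiver's new window. The function name is hypothetical, and the well-known answer it illustrates (window size at most half the sequence-number space) is not stated on the slide itself.

```python
def sr_receiver_can_confuse(seq_space, window):
    """True if a retransmitted old packet can land in the receiver's new window."""
    # Sender still holds the first window, because all of its ACKs were lost.
    retransmitted = {seq % seq_space for seq in range(0, window)}
    # Receiver has advanced by one full window and now expects these numbers.
    expected_new = {seq % seq_space for seq in range(window, 2 * window)}
    return bool(retransmitted & expected_new)


dilemma = sr_receiver_can_confuse(seq_space=4, window=3)  # the slide's example
safe = sr_receiver_can_confuse(seq_space=4, window=2)
```

For the slide's example (4 sequence numbers, window 3) the sets overlap, so the receiver can mistake a duplicate for new data; with window 2 they do not.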
TCP: Overview    RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point:
    • one sender, one receiver
• reliable, in-order byte stream:
    • no “message boundaries”
• pipelined:
    • TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
    • bi-directional data flow in same connection
    • MSS: maximum segment size
• connection-oriented:
    • handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
    • sender will not overwhelm receiver
[Figure: application writes data through socket door into TCP send buffer; segment travels to TCP receive buffer; application reads data through socket door]


TCP segment structure
segment layout (32 bits wide):
    source port # | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | Receive window
    checksum | Urg data pnter
    Options (variable length)
    application data (variable length)

• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)


TCP seq. #’s and ACKs
Seq. #’s:
• byte stream “number” of first byte in segment’s data
ACKs:
• seq # of next byte expected from other side
• cumulative ACK
Q: how receiver handles out-of-order segments
• A: TCP spec doesn’t say, - up to implementor

simple telnet scenario (Host A, Host B, time running downward):
• User types ‘C’. A to B: Seq=42, ACK=79, data = ‘C’
• host ACKs receipt of ‘C’, echoes back ‘C’. B to A: Seq=79, ACK=43, data = ‘C’
• host ACKs receipt of echoed ‘C’. A to B: Seq=43, ACK=80


TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
    • but RTT varies
• too short: premature timeout
    • unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
    • ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
    • average several recent measurements, not just current SampleRTT


TCP Round Trip Time and Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125


Example RTT estimation:
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and Estimated RTT; y-axis: RTT (milliseconds), 100 to 350; x-axis: time (seconds), 1 to 106]


TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
    • large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT
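
The three formulas fit together as a small estimator. The sketch below uses the slide's typical α and β; the class name, seeding EstimatedRTT with the first sample, starting DevRTT at 0, and computing the deviation against the estimate from before the current update are all assumptions made for illustration, since the slides do not specify them.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, used to set the timeout interval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample  # assumption: seed with the first sample
        self.dev_rtt = 0.0                 # assumption: no deviation known yet

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the previous EstimatedRTT.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)  # milliseconds
timeout_ms = estimator.update(200.0)          # one outlier sample
```

A single 200 ms sample after a 100 ms baseline moves EstimatedRTT only to 112.5 ms, but the deviation term pushes the timeout well above it, which is the "safety margin" at work.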


TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
    • timeout events
    • duplicate acks
• Initially consider simplified TCP sender:
    • ignore duplicate acks
    • ignore flow control, congestion control


TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If acknowledges previously unacked segments
    • update what is known to be acked
    • start timer if there are outstanding segments


TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with
            smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }

} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked

TCP: retransmission scenarios
lost ACK scenario (Host A, Host B):
• A sends Seq=92, 8 bytes data
• B replies ACK=100; the ACK is lost (X)
• Seq=92 timeout at A: A resends Seq=92, 8 bytes data
• B replies ACK=100
• SendBase = 100

premature timeout (Host A, Host B):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout at A before ACK=100 arrives: A resends Seq=92, 8 bytes data
• ACKs arrive at A: SendBase = 100, then SendBase = 120
• B answers the retransmission with ACK=120; SendBase = 120

TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A, Host B):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100; the ACK is lost (X)
• B replies ACK=120, which reaches A before the timeout
• SendBase = 120


TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap


Fast Retransmit
• Time-out period often relatively long:
    • long delay before resending lost packet
• Detect lost segments via duplicate ACKs.
    • Sender often sends many segments back-to-back
    • If segment is lost, there will likely be many duplicate ACKs.
• If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
    • fast retransmit: resend segment before timer expires


Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
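
The simplified sender and the fast-retransmit rule can be combined into one runnable sketch. It rests on several assumptions: the class name and the `send_segment` callback are hypothetical, the timer is a boolean flag with `timeout()` called by the harness, sequence-number wraparound is ignored, and the duplicate-ACK counter is reset whenever new data is ACKed.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: cumulative ACKs, single timer, fast retransmit."""

    def __init__(self, initial_seq_num, send_segment):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.send_segment = send_segment  # hypothetical callback: (seq, data)
        self.unacked = {}                 # seq of first byte -> segment data
        self.timer_running = False
        self.dup_acks = 0

    def data_from_app(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True     # start timer
        self.send_segment(seq, data)
        self.next_seq_num += len(data)

    def timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # smallest not-yet-acknowledged seq #
            self.send_segment(seq, self.unacked[seq])
            self.timer_running = True

    def ack_received(self, y):
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        else:
            self.dup_acks += 1            # duplicate ACK for already ACKed data
            if self.dup_acks == 3 and y in self.unacked:
                self.send_segment(y, self.unacked[y])  # fast retransmit


log = []
tcp = SimplifiedTcpSender(92, lambda seq, data: log.append((seq, len(data))))
tcp.data_from_app(b"x" * 8)    # Seq=92, 8 bytes
tcp.data_from_app(b"y" * 20)   # Seq=100, 20 bytes
tcp.ack_received(100)          # first segment ACKed
for _ in range(3):
    tcp.ack_received(100)      # three duplicate ACKs
```

After the third duplicate ACK the segment starting at byte 100 is resent without waiting for the timer, so `log` ends with a second `(100, 20)` entry.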


TCP Flow Control
• receive side of TCP connection has a receive buffer:
• app process may be slow at reading from buffer

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

• speed-matching service: matching the send rate to the receiving app’s drain rate

TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer-[LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
    • guarantees receive buffer doesn’t overflow
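
The window arithmetic is small enough to check by hand, and a sketch makes the sender-side rule concrete. The function names and the sender-side parameters `last_byte_sent` and `last_byte_acked` are illustrative assumptions; only RcvBuffer, LastByteRcvd, LastByteRead, and RcvWindow appear on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, window, nbytes):
    """Sender keeps unACKed data, including the new bytes, within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok = sender_may_send(last_byte_sent=5000, last_byte_acked=4000, window=window, nbytes=1000)
too_much = sender_may_send(last_byte_sent=5000, last_byte_acked=4000, window=window, nbytes=1100)
```

With 2000 bytes buffered but unread in a 4096-byte buffer, the advertised window is 2096 bytes; a sender with 1000 bytes already in flight may send 1000 more but not 1100.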


TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
    • seq. #s
    • buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname","port number");
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
• specifies initial seq #
• no data
Step 2: server host receives SYN, replies with SYNACK segment
• server allocates buffers
• specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data


TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline: client close, FIN to server; ACK from server; server close, FIN to client; ACK from client; client timed wait, then closed]


TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
• Enters “timed wait” - will respond with ACK to received FINs
Step 4: server, receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline: client closing, FIN to server; ACK from server; server closing, FIN to client; ACK from client; client timed wait; both sides closed]


TCP Connection Management (cont)
[Figures: TCP server lifecycle; TCP client lifecycle]


Principles of Congestion Control

Congestion:
 informally: “too many sources sending too
much data too fast for network to handle”
 different from flow control!
 manifestations:
 lost packets (buffer overflow at routers)
 long delays (queueing in router buffers)
 a top-10 problem!



Causes/costs of congestion: scenario 1
 two senders, two receivers
 one router, infinite buffers
 no retransmission

 large delays when congested
 maximum achievable throughput

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through one router with unlimited shared output link buffers; receivers see throughput $\lambda_{out}$]


Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of lost packet

[Figure: Host A and Host B send through one router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: throughput at the receiver]


Causes/costs of congestion: scenario 2
 always: $\lambda_{in} = \lambda_{out}$ (goodput)
 “perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
 retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

[Figure: three plots of $\lambda_{out}$ against sending rate, horizontal axes running to R/2: a. $\lambda_{out}$ reaches R/2; b. $\lambda_{out}$ reaches R/3; c. $\lambda_{out}$ reaches R/4]

“costs” of congestion:
 more work (retrans) for given “goodput”
 unneeded retransmissions: link carries multiple copies of pkt
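The three plot ceilings (R/2, R/3, R/4) can be reproduced with one line of arithmetic. This is an illustrative sketch, not a queueing model: it assumes each sender offers the maximum load R/2, and the average number of transmissions per delivered packet (1, 1.5, and 2 for cases a, b, and c) is an assumption chosen to match the plotted values. The function name and the value of R are hypothetical.

```python
def goodput(offered_load, copies_per_packet):
    """Goodput when each delivered packet is sent copies_per_packet times on average."""
    return offered_load / copies_per_packet

R = 6.0           # hypothetical link capacity, arbitrary units
offered = R / 2   # each of the two senders can use at most half the shared link

# a: no retransmissions; b: some retransmissions; c: every packet sent twice
for case, copies in [("a", 1.0), ("b", 1.5), ("c", 2.0)]:
    print(case, goodput(offered, copies))
```

Case c is the "link carries multiple copies of pkt" cost in its starkest form: half of the link's work is spent on duplicates.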


Causes/costs of congestion: scenario 3
 four senders
 multihop paths
 timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

[Figure: Host A and Host B among the senders on multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: throughput]


Causes/costs of congestion: scenario 3

[Figure: Host A, Host B, and throughput $\lambda_{out}$]

Another “cost” of congestion:
 when packet dropped, any “upstream” transmission capacity used for that packet was wasted!


Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system observed loss, delay
 approach taken by TCP

Network-assisted congestion control:
 routers provide feedback to end systems
 single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
 explicit rate sender should send at


Case study: ATM ABR congestion control
ABR: available bit rate:
 “elastic service”
 if sender’s path “underloaded”:
 sender should use available bandwidth
 if sender’s path congested:
 sender throttled to minimum guaranteed rate

RM (resource management) cells:
 sent by sender, interspersed with data cells
 bits in RM cell set by switches (“network-assisted”)
 NI bit: no increase in rate (mild congestion)
 CI bit: congestion indication
 RM cells returned to sender by receiver, with bits intact


Case study: ATM ABR congestion control

 two-byte ER (explicit rate) field in RM cell
 congested switch may lower ER value in cell
 sender’s send rate thus set to the minimum rate supportable by the switches on the path
 EFCI bit in data cells: set to 1 in congested switch
 if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

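The ER and CI mechanics amount to a running minimum plus one flag. The sketch below is a simplification under stated assumptions: the function name, the parameter names, and the idea of passing each switch's supportable rate as a list are all hypothetical, and real RM cells carry more fields than shown here.

```python
def returned_rm_cell(initial_er, switch_rates, efci_seen):
    """Return (er, ci) as the sender sees them in the returned RM cell."""
    er = initial_er
    for supportable in switch_rates:
        # A congested switch may lower the ER value; it never raises it.
        er = min(er, supportable)
    # CI is set in the returned cell when the data cell preceding
    # the RM cell arrived with its EFCI bit set.
    ci = 1 if efci_seen else 0
    return er, ci

print(returned_rm_cell(initial_er=100, switch_rates=[80, 120, 60], efci_seen=False))
```

Because each switch can only lower the field, the value that comes back is the smallest rate any switch on the path was willing to support.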
TCP Congestion Control
 end-end control (no network assistance)
 sender limits transmission:
$LastByteSent - LastByteAcked \le CongWin$
 Roughly,
$rate = \frac{CongWin}{RTT}$ Bytes/sec
 CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
 loss event = timeout or 3 duplicate acks
 TCP sender reduces rate (CongWin) after loss event

three mechanisms:
 AIMD
 slow start
 conservative after timeout events

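The window constraint and the rough rate estimate can both be expressed directly. This is a minimal sketch: the function names and the byte counts in the example are hypothetical, and the rate is only the "roughly" approximation of one window per round trip.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, nbytes):
    """True if sending nbytes more keeps unacknowledged data within CongWin."""
    return (last_byte_sent + nbytes) - last_byte_acked <= cong_win

def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt

# 2000 bytes are already in flight against a 4000-byte window.
print(can_send(last_byte_sent=3000, last_byte_acked=1000, cong_win=4000, nbytes=2000))
print(approx_rate(cong_win=4000, rtt=0.2))
```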
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, showing the AIMD sawtooth]

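The sawtooth can be generated with a few lines. This is a sketch under simplifying assumptions: time advances in whole RTTs, the window is counted in MSS units, and the rounds in which a loss occurs are supplied by the caller (the `loss_rounds` parameter is hypothetical; a real sender detects losses rather than being told about them).

```python
def aimd(initial_window, loss_rounds, rounds, mss=1):
    """Return the congestion window at the start of each RTT."""
    window = initial_window
    history = []
    for rtt in range(rounds):
        history.append(window)
        if rtt in loss_rounds:
            window = max(mss, window // 2)   # multiplicative decrease
        else:
            window += mss                    # additive increase: 1 MSS per RTT
    return history

print(aimd(initial_window=8, loss_rounds={4, 9}, rounds=12))
```

The window climbs by one MSS per RTT and is halved in the round after each loss, never dropping below 1 MSS.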
TCP Slow Start
 When connection begins, CongWin = 1 MSS
 Example: MSS = 500 bytes & RTT = 200 msec
 initial rate = 20 kbps
 available bandwidth may be >> MSS/RTT
 desirable to quickly ramp up to respectable rate
 When connection begins, increase rate exponentially fast until first loss event


TCP Slow Start (more)
 When connection begins, increase rate exponentially until first loss event:
 double CongWin every RTT
 done by incrementing CongWin for every ACK received
 Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A/Host B timeline: one segment in the first RTT, two segments in the second, four segments in the third]
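Both slow-start claims, the 20 kbps initial rate and the per-RTT doubling, can be checked numerically. This is a minimal sketch with hypothetical function names; it assumes no loss and that every segment sent in an RTT is acknowledged individually before the next RTT begins.

```python
def initial_rate_bps(mss_bytes, rtt_sec):
    """Rate when CongWin = 1 MSS: one segment per round trip."""
    return mss_bytes * 8 / rtt_sec

def slow_start_windows(rounds, mss=1):
    """CongWin at the start of each RTT, growing by 1 MSS per ACK received."""
    cong_win = mss
    windows = []
    for _ in range(rounds):
        windows.append(cong_win)
        acks = cong_win // mss       # one ACK per segment sent this RTT
        for _ in range(acks):
            cong_win += mss          # per-ACK increment doubles the window per RTT
    return windows

print(initial_rate_bps(mss_bytes=500, rtt_sec=0.2))   # bits per second
print(slow_start_windows(rounds=4))
```

Adding one MSS per ACK looks additive, but because the number of ACKs per RTT equals the window size, the window doubles each round.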


Refinement
 After 3 dup ACKs:
 CongWin is cut in half
 window then grows linearly
 But after timeout event:
 CongWin instead set to 1 MSS;
 window then grows exponentially
 to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is “more alarming”


Refinement (more)
Q: When should the
exponential increase
switch to linear?
A: When CongWin gets
to 1/2 of its value
before timeout.

Implementation:
 Variable Threshold
 At loss event, Threshold
is set to 1/2 of CongWin
just before loss event



Summary: TCP Congestion Control
 When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially.
 When CongWin is above Threshold, sender is
in congestion-avoidance phase, window grows
linearly.
 When a triple duplicate ACK occurs, Threshold
set to CongWin/2 and CongWin set to
Threshold.
 When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control

| State | Event | TCP Sender Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS, If (CongWin > Threshold) set state to “Congestion Avoidance” | Resulting in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2, CongWin = Threshold, Set state to “Congestion Avoidance” | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| SS or CA | Timeout | Threshold = CongWin/2, CongWin = 1 MSS, Set state to “Slow Start” | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
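The sender table translates almost line for line into a small state machine. This is a toy sketch, not a TCP implementation: the class and method names are hypothetical, the initial Threshold is supplied by the caller, and resetting the duplicate-ACK counter on a new ACK is an assumption the table does not spell out.

```python
class TcpSender:
    """Toy congestion-control state machine; windows are in bytes."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # connection starts with CongWin = 1 MSS
        self.threshold = threshold   # hypothetical initial Threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0            # assumption: counter restarts on new data ACK
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # not below 1 MSS
            self.state = "CA"
            self.dup_acks = 0

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0

sender = TcpSender(mss=1000, threshold=4000)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```

A timeout sends the window back to 1 MSS and the state back to slow start, while a triple duplicate ACK only halves the window and stays in congestion avoidance.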


TCP throughput
 What’s the average throughput of TCP as a function of window size and RTT?
 Ignore slow start
 Let W be the window size when loss occurs.
 When window is W, throughput is W/RTT
 Just after loss, window drops to W/2, throughput to W/(2 RTT).
 Average throughput: 0.75 W/RTT
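The 0.75 factor follows from averaging over one sawtooth cycle. As a sketch, assuming the window (and so the throughput) climbs linearly from its value just after a loss back up to its value at the next loss, the average is the midpoint of the two endpoints:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
```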


TCP Futures
 Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
 Requires window size W = 83,333 in-flight segments
 Throughput in terms of loss rate:
$throughput = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
 ➜ $L = 2 \cdot 10^{-10}$ Wow
 New versions of TCP for high-speed needed!
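Both numbers in the example can be recomputed from the stated inputs. This is a minimal sketch with hypothetical function names; the loss rate comes from solving throughput = 1.22 MSS / (RTT sqrt(L)) for L, with MSS expressed in bits so the units match a throughput in bits per second.

```python
def required_window_segments(throughput_bps, rtt_sec, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_sec / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_sec, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_sec * throughput_bps)) ** 2

print(required_window_segments(10e9, 0.1, 1500))
print(required_loss_rate(10e9, 0.1, 1500))
```

The loss rate works out to roughly 2·10^-10, that is, about one lost segment in every five billion.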


TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]


Why is TCP fair?
Two competing sessions:
 Additive increase gives slope of 1, as throughput increases
 multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
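The convergence toward the equal-share line can be seen in a short simulation. This is an idealized sketch: it assumes both sessions see losses at the same time (whenever their combined rate exceeds the capacity) and have the same RTT; the function name and the starting rates are hypothetical.

```python
def share_link(x1, x2, capacity, rounds):
    """Two AIMD flows; both halve when their combined rate exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: additive increase
    return x1, x2

x1, x2 = share_link(1.0, 80.0, capacity=100.0, rounds=200)
print(x1, x2, abs(x1 - x2))
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates drift toward each other even from a very unequal start.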


Fairness (more)
Fairness and UDP
 Multimedia apps often do not use TCP
 do not want rate throttled by congestion control
 Instead use UDP:
 pump audio/video at constant rate, tolerate packet loss
 Research area: TCP friendly

Fairness and parallel TCP connections
 nothing prevents app from opening parallel connections between 2 hosts.
 Web browsers do this
 Example: link of rate R supporting 9 connections;
 new app asks for 1 TCP, gets rate R/10
 new app asks for 11 TCPs, gets about R/2 !
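The parallel-connection example is a one-line fraction. This sketch assumes the link is divided equally among all TCP connections, which is the fairness goal rather than a guarantee; the function name is hypothetical.

```python
from fractions import Fraction

def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    return Fraction(new_connections, existing_connections + new_connections)

print(app_share(9, 1))    # one connection among ten
print(app_share(9, 11))   # eleven connections among twenty
```

With 11 connections out of 20 the app gets 11/20 of R, slightly more than the R/2 quoted in the example.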


Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?

Ignoring congestion, delay is influenced by:
 TCP connection establishment
 data transmission delay
 slow start

Notation, assumptions:
 Assume one link between client and server of rate R
 S: MSS (bits)
 O: object size (bits)
 no retransmissions (no loss, no corruption)

Window size:
 First assume: fixed congestion window, W segments
 Then dynamic window, modeling slow start


Fixed congestion window (1)

First case:
WS/R > RTT + S/R: ACK for
first segment in window
returns before window’s
worth of data sent

delay = 2RTT + O/R



Fixed congestion window (2)

Second case:
 WS/R < RTT + S/R: wait for ACK after sending window’s worth of data

$delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]$

where $K = \lceil O/(WS) \rceil$ is the number of windows that cover the object
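The two fixed-window cases fit in one function. This is a minimal sketch under the slide's assumptions (one link of rate R, no loss); the function and parameter names are hypothetical, and K is taken to be the object size divided by the window size in bits, rounded up.

```python
import math

def fixed_window_delay(obj_bits, mss_bits, rate_bps, rtt, window_segments):
    """Delay to fetch one object with a fixed congestion window of W segments."""
    s_over_r = mss_bits / rate_bps
    window_time = window_segments * s_over_r        # WS/R
    base = 2 * rtt + obj_bits / rate_bps            # 2RTT + O/R
    if window_time >= rtt + s_over_r:
        return base       # first case: ACKs return before the window is used up
    # Second case: server stalls after each of the first K-1 windows.
    k = math.ceil(obj_bits / (window_segments * mss_bits))
    stall = s_over_r + rtt - window_time
    return base + (k - 1) * stall

# S/R = 1 s, RTT = 2 s, object of 8 segments
print(fixed_window_delay(8000, 1000, 1000, 2, window_segments=4))
print(fixed_window_delay(8000, 1000, 1000, 2, window_segments=2))
```

With W = 4 the window outlasts the round trip and the server never waits; with W = 2 it stalls for one second after each of the first three windows.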


TCP Delay Modeling: Slow Start (1)
Now suppose window grows according to slow start

Will show that the delay for one object is:

$Latency = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$

where P is the number of times TCP idles at server:

$P = \min\{Q, K-1\}$

- where Q is the number of times the server idles if the object were of infinite size.

- and K is the number of windows that cover the object.


TCP Delay Modeling: Slow Start (2)
Delay components:
• 2 RTT for connection estab and request
• O/R to transmit object
• time server idles due to slow start

Server idles: P = min{K-1,Q} times

Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1,Q} = 2

Server idles P=2 times

[Figure: client/server timing diagram: initiate TCP connection, request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered; time at client, time at server]


TCP Delay Modeling (3)

$\frac{S}{R} + RTT$ = time from when server starts to send segment until server receives acknowledgement

$2^{k-1}\frac{S}{R}$ = time to transmit the kth window

$\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$ = idle time after the kth window

$$
\begin{aligned}
delay &= \frac{O}{R} + 2RTT + \sum_{p=1}^{P} idleTime_p \\
      &= \frac{O}{R} + 2RTT + \sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right] \\
      &= \frac{O}{R} + 2RTT + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}
\end{aligned}
$$

[Figure: client/server timing diagram: initiate TCP connection, request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered; time at client, time at server]


TCP Delay Modeling (4)
Recall K = number of windows that cover object

How do we calculate K ?

$$
\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$

Calculation of Q, number of idles for infinite-size object, is similar (see HW).
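The slow-start latency formula, Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1)S/R with P = min{Q, K-1}, can be evaluated step by step. This is a sketch with hypothetical function names. K is found by searching for the smallest k with 2^k - 1 >= O/S, and Q is obtained by counting the windows whose idle time S/R + RTT - 2^(k-1) S/R is positive, which is one way to carry out the calculation left as homework; the link rate and RTT in the example are assumptions chosen so that Q = 2.

```python
import math

def windows_to_cover(obj_bits, mss_bits):
    """K: smallest k such that 2^k - 1 >= O/S."""
    segments = math.ceil(obj_bits / mss_bits)
    k = 1
    while 2 ** k - 1 < segments:
        k += 1
    return k

def idles_for_infinite_object(mss_bits, rate_bps, rtt):
    """Q: number of windows followed by a positive idle time."""
    s_over_r = mss_bits / rate_bps
    q = 0
    # Window q+1 takes 2^q * S/R to send; the server idles if the ACK
    # for its first segment (S/R + RTT) arrives after that.
    while s_over_r + rtt - 2 ** q * s_over_r > 0:
        q += 1
    return q

def slow_start_latency(obj_bits, mss_bits, rate_bps, rtt):
    s_over_r = mss_bits / rate_bps
    k = windows_to_cover(obj_bits, mss_bits)
    q = idles_for_infinite_object(mss_bits, rate_bps, rtt)
    p = min(q, k - 1)
    return (2 * rtt + obj_bits / rate_bps
            + p * (rtt + s_over_r) - (2 ** p - 1) * s_over_r)

# O/S = 15 segments, S/R = 1 s, RTT = 1.5 s (assumed values)
print(windows_to_cover(15000, 1000))
print(idles_for_infinite_object(1000, 1000, 1.5))
print(slow_start_latency(15000, 1000, 1000, 1.5))
```

With these inputs K = 4 and Q = 2, so P = 2, matching the worked example with O/S = 15; the two idle periods (1.5 s and 0.5 s) add 2 s to the 2RTT + O/R baseline.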


HTTP Modeling
 Assume Web page consists of:
 1 base HTML page (of size O bits)
 M images (each of size O bits)
 Non-persistent HTTP:
 M+1 TCP connections in series
 Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
 Persistent HTTP:
 2 RTT to request and receive base HTML file
 1 RTT to request and receive M images
 Response time = (M+1)O/R + 3RTT + sum of idle times
 Non-persistent HTTP with X parallel connections
 Suppose M/X integer.
 1 TCP connection for base file
 M/X sets of parallel connections for images.
 Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times

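The three response-time formulas can be compared side by side. This sketch deliberately leaves out the "sum of idle times" terms, so it gives only the transmission and round-trip parts; the function name is hypothetical, and the example takes 5 Kbytes as 40,000 bits and a 1 Mbps link as assumptions.

```python
def http_response_times(obj_bits, rate_bps, rtt, m_images, x_parallel):
    """Response times from the three formulas, without the idle-time terms."""
    transmit = (m_images + 1) * obj_bits / rate_bps      # (M+1)O/R
    return {
        "non-persistent": transmit + (m_images + 1) * 2 * rtt,
        "persistent": transmit + 3 * rtt,
        "parallel non-persistent": transmit + (m_images / x_parallel + 1) * 2 * rtt,
    }

times = http_response_times(obj_bits=40000, rate_bps=1e6, rtt=0.1,
                            m_images=10, x_parallel=5)
for name, seconds in times.items():
    print(name, round(seconds, 2))
```

The transmission term is identical in all three cases; the variants differ only in how many round trips they pay for.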
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M=10 and X=5

[Figure: bar chart of response time (0 to 20 seconds) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps]

For low bandwidth, connection & response time dominated by transmission time.
Persistent connections only give minor improvement over parallel connections.

HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M=10 and X=5

[Figure: bar chart of response time (0 to 70 seconds) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps]

For larger RTT, response time dominated by TCP establishment & slow start delays. Persistent connections now give important improvement: particularly in high delay-bandwidth networks.

Chapter 3: Summary
 principles behind transport layer services:
 multiplexing, demultiplexing
 reliable data transfer
 flow control
 congestion control
 instantiation and implementation in the Internet
 UDP
 TCP

Next:
 leaving the network “edge” (application, transport layers)
 into the network “core”
