Survivable Network Design
David Tipper
Associate Professor
Department of Information Science and
Telecommunications
University of Pittsburgh
Telcom 2110 Slides 15
Classification of
Survivability Techniques
[Figure: classification of survivability techniques; surviving labels: "Working path", "Failure Dependent backup"]
SCA Problem
• SCA for Failure Independent Shared Backup Path
Restoration
• Notation
$r = 1, 2, \ldots, D$: set of demands (source-destination pairs)
$p = 1, 2, \ldots, P_r$: set of possible paths for demand pair $r$
$l = 1, 2, \ldots, L$: set of network links
• Input parameters (constants)
$\alpha_r$: offered traffic load of demand pair $r$
$c_l$: unit cost of capacity on link $l$
$\delta^{l}_{r,p}$ = 1 if link $l$ belongs to path $p$ realizing demand $r$
= 0, otherwise
$f$: set of link failure scenarios
• Variables
$x_{r,p}$: flow of demand $r$ on path $p$
$s_l$: spare capacity on link $l$
s.t.
$$\sum_{p \in P_r} x_{r,p} = 1, \quad \forall r \in D$$
Enough spare capacity on each link:
$$\sum_{r \in D_f} \alpha_r \sum_{p \in P_r} \delta^{l}_{r,p} \cdot x_{r,p} \le s_l, \quad \forall l \in L - \{f\}, \ \forall f \in L$$
Matrix Based Formulation of SCA
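Example

[Figure: five-node network (a, b, c, d, e) with seven links: 1 = a-b, 2 = a-e, 3 = b-c, 4 = b-e, 5 = c-d, 6 = c-e, 7 = d-e; working and backup paths marked]

From the working and backup paths, compute $G = Q^T P$. From G, compute $s = \max G$ (row-wise maximum).

P: working path link incidence matrix (flows × links)

Flow  src dst | Link: 1 2 3 4 5 6 7
  1    a   b  |       1 0 0 0 0 0 0
  2    a   c  |       1 0 1 0 0 0 0
  3    a   d  |       0 1 0 0 0 0 1
  4    a   e  |       0 1 0 0 0 0 0
  5    b   c  |       0 0 1 0 0 0 0
  6    b   d  |       0 0 1 0 1 0 0
  7    b   e  |       0 0 0 1 0 0 0
  8    c   d  |       0 0 0 0 1 0 0
  9    c   e  |       0 0 0 0 0 1 0
 10    d   e  |       0 0 0 0 0 0 1

Q^T: backup path link incidence matrix (links × flows)

Link | Flow: 1 2 3 4 5 6 7 8 9 10
  1  |       0 0 1 1 0 1 1 0 0 0
  2  |       1 1 0 0 0 1 1 0 0 0
  3  |       0 0 1 0 0 0 0 0 1 0
  4  |       1 0 0 1 1 0 0 0 1 0
  5  |       0 1 1 0 0 0 0 0 0 1
  6  |       0 0 0 0 1 0 0 1 0 1
  7  |       0 1 0 0 0 1 0 1 0 0

G = Q^T P: spare provision matrix (row i: link carrying backup traffic; column j: failed link), with s = max G

Link i | s | Failed link: 1 2 3 4 5 6 7
   1   | 2 |              0 2 1 1 1 0 1
   2   | 2 |              2 0 2 1 1 0 0
   3   | 1 |              0 1 0 0 0 1 1
   4   | 1 |              1 1 1 0 0 1 0
   5   | 2 |              1 1 1 0 0 0 2
   6   | 1 |              0 0 1 0 1 0 1
   7   | 2 |              1 0 2 0 2 0 0
Total spare: 11

An example: when link 2 fails, how much capacity is needed on link 1? Read entry G(1,2) = 2.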
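The example can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available; the matrices are the P and Q^T values from the slide, and the variable names (`QT`, `total_spare`) are illustrative only.

```python
import numpy as np

# Working path link incidence matrix P (10 flows x 7 links), from the example.
P = np.array([
    [1, 0, 0, 0, 0, 0, 0],  # flow 1: a-b
    [1, 0, 1, 0, 0, 0, 0],  # flow 2: a-c
    [0, 1, 0, 0, 0, 0, 1],  # flow 3: a-d
    [0, 1, 0, 0, 0, 0, 0],  # flow 4: a-e
    [0, 0, 1, 0, 0, 0, 0],  # flow 5: b-c
    [0, 0, 1, 0, 1, 0, 0],  # flow 6: b-d
    [0, 0, 0, 1, 0, 0, 0],  # flow 7: b-e
    [0, 0, 0, 0, 1, 0, 0],  # flow 8: c-d
    [0, 0, 0, 0, 0, 1, 0],  # flow 9: c-e
    [0, 0, 0, 0, 0, 0, 1],  # flow 10: d-e
])

# Transposed backup path link incidence matrix Q^T (7 links x 10 flows).
QT = np.array([
    [0, 0, 1, 1, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 1, 0, 0],
])

G = QT @ P                   # G[i, j]: spare needed on link i+1 when link j+1 fails
s = G.max(axis=1)            # per-link spare capacity, s = max G (row-wise)
total_spare = int(s.sum())

print(G[0, 1])               # spare needed on link 1 when link 2 fails
print(s, total_spare)
```

Taking the row-wise maximum rather than the row sum is what expresses sharing: only one link fails at a time, so a link needs only enough spare capacity for its worst single failure.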
Matrix Based SCA for Link Failures
$\min_{Q,\,s} \; S = e^T s$ (total spare capacity)
s.t.
$s \ge G$ (enough spare capacity on each link)
$G = Q^T M P$ (calculation of spare provision matrix)
$P + Q \le 1$ (link-disjoint backup paths)
$Q B^T = D \pmod 2$ (flow conservation of backup paths)
$Q$ is a binary matrix (integer programming)
Decision variables: Q, s
Given: M – traffic demand matrix
P – working path link incidence matrix
B and D – node-link and flow-node incidence matrices
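The constraints can be checked for a given candidate Q. The sketch below does so for the five-node example, assuming NumPy, unit demand per flow (M taken as the identity, which is an assumption), and link endpoints inferred from the single-link working paths; `incidence` is a hypothetical helper, not part of the original material.

```python
import numpy as np

NODES = ["a", "b", "c", "d", "e"]
LINKS = [("a", "b"), ("a", "e"), ("b", "c"), ("b", "e"),
         ("c", "d"), ("c", "e"), ("d", "e")]          # links 1..7
FLOWS = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"),
         ("b", "d"), ("b", "e"), ("c", "d"), ("c", "e"), ("d", "e")]
WORK = [[1], [1, 3], [2, 7], [2], [3], [3, 5], [4], [5], [6], [7]]
BACKUP = [[2, 4], [2, 5, 7], [1, 3, 5], [1, 4], [4, 6],
          [1, 2, 7], [1, 2], [6, 7], [3, 4], [5, 6]]

def incidence(paths, n_links):
    """0/1 matrix with one row per path and one column per link."""
    m = np.zeros((len(paths), n_links), dtype=int)
    for r, path in enumerate(paths):
        for link in path:
            m[r, link - 1] = 1
    return m

P = incidence(WORK, len(LINKS))
Q = incidence(BACKUP, len(LINKS))
M = np.eye(len(FLOWS), dtype=int)     # assumption: unit demand per flow

# B: node-link incidence (nodes x links); D: flow-node incidence (flows x nodes).
B = np.array([[int(n in link) for link in LINKS] for n in NODES])
D = np.array([[int(n in flow) for n in NODES] for flow in FLOWS])

# P + Q <= 1: no link is on both the working and the backup path of a flow.
link_disjoint = bool(((P + Q) <= 1).all())
# Q B^T = D (mod 2): odd degree only at the source and destination of each flow.
flow_conserved = bool((((Q @ B.T) % 2) == D).all())

G = Q.T @ M @ P                       # spare provision matrix
s = G.max(axis=1)                     # smallest s satisfying s >= G
S = int(s.sum())                      # objective value e^T s

print(link_disjoint, flow_conserved, S)
```

This only evaluates one feasible point of the integer program; searching over Q is the hard part that the later slides address.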
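[Figure: the spare provision matrix G (links × failures) assembled from per-flow layers G1, G2, …, GR-1, GR, each obtained from the rows of Q and P for one flow]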
Find spare capacity s
Given P and Q, get $G = Q^T P$. From G, get $s = \max(G)$.

Link i | s | G, failed link: 1 2 3 4 5 6 7
   1   | 2 |                 0 2 1 1 1 0 1
   2   | 2 |                 2 0 2 1 1 0 0
   3   | 1 |                 0 1 0 0 0 1 1
   4   | 1 |                 1 1 1 0 0 1 0
   5   | 2 |                 1 1 1 0 0 0 2
   6   | 1 |                 0 0 1 0 1 0 1
   7   | 2 |                 1 0 2 0 2 0 0
Total spare: 11

(Q^T and P are those of the Example slide: seven links, ten flows.)

Contribution of a single flow, r = 2 (a → c): $G_2 = q_2^T p_2$

p_2 = (1 0 1 0 0 0 0), working path on links 1 and 3
q_2 = (0 1 0 0 1 0 1), backup path on links 2, 5 and 7

Link | G_2, failed link: 1 2 3 4 5 6 7
  1  |                   0 0 0 0 0 0 0
  2  |                   1 0 1 0 0 0 0
  3  |                   0 0 0 0 0 0 0
  4  |                   0 0 0 0 0 0 0
  5  |                   1 0 1 0 0 0 0
  6  |                   0 0 0 0 0 0 0
  7  |                   1 0 1 0 0 0 0
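The per-flow view says that G is the sum of the outer products of each flow's backup and working rows. A small sketch, assuming NumPy and unit demands; the path lists are read off the example matrices and the helper name `row` is illustrative.

```python
import numpy as np

N_LINKS = 7
# Working and backup paths of flows 1..10 as lists of link numbers (from the example).
WORK = [[1], [1, 3], [2, 7], [2], [3], [3, 5], [4], [5], [6], [7]]
BACKUP = [[2, 4], [2, 5, 7], [1, 3, 5], [1, 4], [4, 6],
          [1, 2, 7], [1, 2], [6, 7], [3, 4], [5, 6]]

def row(path):
    """0/1 link incidence vector of one path."""
    v = np.zeros(N_LINKS, dtype=int)
    v[[link - 1 for link in path]] = 1
    return v

P = np.array([row(p) for p in WORK])
Q = np.array([row(q) for q in BACKUP])

# Contribution of flow r: G_r = q_r^T p_r (rows: backup links, columns: failed links).
G_r = [np.outer(Q[r], P[r]) for r in range(len(P))]
G2 = G_r[1]                      # flow 2 (a -> c)

G = sum(G_r)                     # same matrix as Q^T P
s = G.max(axis=1)

print(G2)
print(np.array_equal(G, Q.T @ P), int(s.sum()))
```

Keeping the layers separate is convenient when one flow changes its backup path: only its own layer has to be subtracted and recomputed.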
Approximation algorithm
• Decomposition
– multi-commodity flow → multiple single flows
• Use a shortest path algorithm for each flow to
– route link-disjoint backup paths
– use the spare provision matrix G to calculate the
link cost: the incremental spare reservation $v_r$
• Flows successively update their backup paths
→ termed successive survivable routing (SSR)
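The successive update loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, unit demands, the five-node example topology, a plain Dijkstra search, and a small hop penalty (`hop_penalty`, my addition) so that ties between equal-cost paths are broken toward shorter paths. A flow changes its backup path only when the new one is strictly cheaper.

```python
import heapq
import numpy as np

LINKS = [("a", "b"), ("a", "e"), ("b", "c"), ("b", "e"),
         ("c", "d"), ("c", "e"), ("d", "e")]          # links 1..7
FLOWS = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"),
         ("b", "d"), ("b", "e"), ("c", "d"), ("c", "e"), ("d", "e")]
WORK = [[1], [1, 3], [2, 7], [2], [3], [3, 5], [4], [5], [6], [7]]
BACKUP = [[2, 4], [2, 5, 7], [1, 3, 5], [1, 4], [4, 6],
          [1, 2, 7], [1, 2], [6, 7], [3, 4], [5, 6]]

def incidence(paths):
    m = np.zeros((len(paths), len(LINKS)), dtype=int)
    for r, path in enumerate(paths):
        m[r, [link - 1 for link in path]] = 1
    return m

def cheapest_path(src, dst, cost):
    """Dijkstra; returns the 0/1 link vector of a cheapest src-dst path."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                                  # stale heap entry
        for i, (x, y) in enumerate(LINKS):
            if u in (x, y) and np.isfinite(cost[i]):
                w = y if u == x else x
                if d + cost[i] < dist.get(w, np.inf):
                    dist[w] = d + cost[i]
                    prev[w] = (u, i)
                    heapq.heappush(heap, (dist[w], w))
    q = np.zeros(len(LINKS), dtype=int)
    node = dst
    while node != src:                                # walk back from dst to src
        node, i = prev[node]
        q[i] = 1
    return q

def ssr(P, Q, max_rounds=20, hop_penalty=1e-3):
    """Flows take turns re-routing their backup path until no flow improves."""
    Q = Q.copy()
    for _ in range(max_rounds):
        changed = False
        for r, (src, dst) in enumerate(FLOWS):
            G_others = Q.T @ P - np.outer(Q[r], P[r])     # drop flow r's own layer
            s = G_others.max(axis=1)
            s_plus = (G_others + np.outer(1 - P[r], P[r])).max(axis=1)
            cost = (s_plus - s) + hop_penalty             # incremental spare v_r
            cost[P[r] == 1] = np.inf                      # working links are tabu
            q_new = cheapest_path(src, dst, cost)
            if cost[q_new == 1].sum() < cost[Q[r] == 1].sum() - 1e-9:
                Q[r] = q_new
                changed = True
        if not changed:
            break
    return Q

P, Q0 = incidence(WORK), incidence(BACKUP)
Q1 = ssr(P, Q0)
before = int((Q0.T @ P).max(axis=1).sum())
after = int((Q1.T @ P).max(axis=1).sum())
print(before, after)
```

Because each flow keeps its current path unless a strictly cheaper one exists, the total spare capacity cannot increase from one update to the next. The result still depends on the order in which flows are visited.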
Link cost and local objective
Total working capacity, $w_i = \mathrm{sum}\, P(*, i)$:

Link i | 1 2 3 4 5 6 7 | Total
 w_i   | 2 2 3 1 2 1 2 |  13

Steps for flow r:
1. Assume the backup path uses all possible links: $q_r^+ = (e - p_r)$
2. Find the contribution $G^+ = (e - p_r)^T p_r$
3. Find $v_r = s^+ - s$, where $s^+ = \max(G + G^+)$
4. Find $q_r$, using shortest path routing with link metric $v_r$

Spare capacity s = max(G), spare provision matrix G = Q^T P, and spare path link incidence matrix Q^T (last column: $q_r^+$ for r = 11):

Link i | s | G, failed link: 1 2 3 4 5 6 7 | Q^T, flow: 1 2 3 4 5 6 7 8 9 10 | q_11^+
   1   | 2 |                 0 2 1 1 1 0 1 |            0 0 1 1 0 1 1 0 0 0  |   0
   2   | 2 |                 2 0 2 1 1 0 0 |            1 1 0 0 0 1 1 0 0 0  |   1
   3   | 1 |                 0 1 0 0 0 1 1 |            0 0 1 0 0 0 0 0 1 0  |   1
   4   | 1 |                 1 1 1 0 0 1 0 |            1 0 0 1 1 0 0 0 1 0  |   1
   5   | 2 |                 1 1 1 0 0 0 2 |            0 1 1 0 0 0 0 0 0 1  |   1
   6   | 1 |                 0 0 1 0 1 0 1 |            0 0 0 0 1 0 0 1 0 1  |   1
   7   | 2 |                 1 0 2 0 2 0 0 |            0 1 0 0 0 1 0 1 0 0  |   1
Total spare: 11

Working path link incidence matrix P, with demand m:

Flow  src dst | Link: 1 2 3 4 5 6 7 | m
  1    a   b  |       1 0 0 0 0 0 0 | 1
  2    a   c  |       1 0 1 0 0 0 0 | 1
  3    a   d  |       0 1 0 0 0 0 1 | 1
  4    a   e  |       0 1 0 0 0 0 0 | 1
  5    b   c  |       0 0 1 0 0 0 0 | 1
  6    b   d  |       0 0 1 0 1 0 0 | 1
  7    b   e  |       0 0 0 1 0 0 0 | 1
  8    c   d  |       0 0 0 0 1 0 0 | 1
  9    c   e  |       0 0 0 0 0 1 0 | 1
 10    d   e  |       0 0 0 0 0 0 1 | 1
 11    a   b  |       1 0 0 0 0 0 0 | 1
 12    b   e  |       0 0 0 1 0 0 0 | 1

r = 11:

Link | G^+, failed link: 1 2 3 4 5 6 7 | v_11 | q_11
  1  |                   0 0 0 0 0 0 0 |  ∞   |  0
  2  |                   1 0 0 0 0 0 0 |  1   |  1
  3  |                   1 0 0 0 0 0 0 |  0   |  1
  4  |                   1 0 0 0 0 0 0 |  1   |  0
  5  |                   1 0 0 0 0 0 0 |  0   |  0
  6  |                   1 0 0 0 0 0 0 |  0   |  1
  7  |                   1 0 0 0 0 0 0 |  0   |  0

[Figure: the five-node example network annotated for flow 11]
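The r = 11 calculation can be traced step by step. This is a minimal sketch assuming NumPy and the example's link endpoints; the Dijkstra routine and the `hop_penalty` tie-breaker are my assumptions, chosen so that among equal-cost routes the one with fewer hops is returned.

```python
import heapq
import numpy as np

# Link endpoints (links 1..7), inferred from the single-link working paths.
LINKS = [("a", "b"), ("a", "e"), ("b", "c"), ("b", "e"),
         ("c", "d"), ("c", "e"), ("d", "e")]

# Spare provision matrix G = Q^T P of the ten existing flows.
G = np.array([
    [0, 2, 1, 1, 1, 0, 1],
    [2, 0, 2, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 2],
    [0, 0, 1, 0, 1, 0, 1],
    [1, 0, 2, 0, 2, 0, 0],
])

def link_costs(G, p_r):
    """v_r = s+ - s, with s+ = max(G + G+) and G+ = (e - p_r)^T p_r."""
    e = np.ones_like(p_r)
    G_plus = np.outer(e - p_r, p_r)
    s = G.max(axis=1)
    s_plus = (G + G_plus).max(axis=1)
    v = (s_plus - s).astype(float)
    v[p_r == 1] = np.inf              # links of the working path are tabu
    return v

def shortest_backup(src, dst, v, hop_penalty=1e-3):
    """Dijkstra with link metric v plus a small (assumed) hop penalty."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                  # stale heap entry
        for i, (x, y) in enumerate(LINKS):
            if u not in (x, y) or np.isinf(v[i]):
                continue
            w = y if u == x else x
            nd = d + v[i] + hop_penalty
            if nd < dist.get(w, np.inf):
                dist[w] = nd
                prev[w] = (u, i)
                heapq.heappush(heap, (nd, w))
    q = np.zeros(len(LINKS), dtype=int)
    node = dst
    while node != src:                # walk back from dst to src
        node, i = prev[node]
        q[i] = 1
    return q

p11 = np.array([1, 0, 0, 0, 0, 0, 0])     # flow 11: a -> b on link 1
v11 = link_costs(G, p11)
q11 = shortest_backup("a", "b", v11)
print(v11, q11)
```

The link metric is zero wherever existing spare capacity can absorb the new flow, so the search is drawn toward links where backup capacity is already shared. Here the route a-e-c-b adds one unit of spare capacity (on link 2), whereas the shorter a-e-b would add two.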
SSR flowchart of flow r
Complexity
Numerical comparison
Experiment networks
[Figure: topologies of experiment networks 1 to 8; only node labels survive in the extracted text]
Redundancy versus Time
on Network 3
• SSR, SR have 64 random cases with different flow orders
• Range of solutions
• Time is the sum of time to compute all 64 cases
[Figure: redundancy (%) versus time (seconds, logarithmic axis from 10^-2 to 10^4) on Network 3 for RAFT, SPI, SR, SSR, SA, LP and BB. Region labels: "Fast response"; "Worse solutions, fast"; "Near optimal solutions, fast"; "Better solutions, slow, not scalable"; "Infeasible".]
[Figure: bar chart of redundancy (0 to 100) for Networks 1 to 8, with bars for LP, BB, SA, SSR, SR, SPI and RAFT]
Multilayer Networks
• Backbone networks have multiple technology layers
• Converging toward IP/MPLS/WDM
• Typically have survivability at each layer
• Multiple Layers present several survivability challenges
• Coordination of recovery actions at different layers
– Which layer is responsible for fault recovery?
• Spare Capacity Allocation (SCA)
– How to prevent over allocation, when each layer provides spare resources?
• Failure Propagation
– Lower layer failure can affect multiple higher layer links!
[Figure: MPLS connections mapped onto WDM physical paths]
SCA model for arbitrary
failures
$\min_{Q,\,s} \; S = e^T s$ (total spare capacity)
s.t.
$s \ge G$ (enough spare capacity on each link)
$G = Q^T M U$ (calculation of spare provision matrix)
$T + Q \le 1$ (failure-disjoint backup paths)
$Q B^T = D \pmod 2$ (flow conservation of backup paths)
$Q$ is a binary matrix (integer programming)
$U = P \odot F^T$ (path failure incidence matrix)
$T = U \odot F$ (path tabu-link incidence matrix)
Decision variables: Q, s
Given: M – traffic demand matrix
P – working path link incidence matrix
B and D – node-link and flow-node incidence matrices
Solve the SCA model using a Branch and Bound algorithm – NP hard
• Apparatus and Method for Spare Capacity Allocation, Y. Liu and D. Tipper, U.S. Patent 6,744,727 B2, June 1, 2004
• Presented in part (IEEE Infocom 2001; IEEE Trans. on Networking, Feb. 2005)
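To make U and T concrete, here is a hedged sketch on the five-node example. Two things are assumed rather than stated on the slide: that F is a failure-link incidence matrix (one row per failure scenario), and that $\odot$ is a binary matrix product (an entry is 1 when the ordinary product entry is positive). The eighth scenario, in which links 5 and 6 fail together, is hypothetical and added only for illustration.

```python
import numpy as np

N_LINKS = 7
WORK = [[1], [1, 3], [2, 7], [2], [3], [3, 5], [4], [5], [6], [7]]
BACKUP = [[2, 4], [2, 5, 7], [1, 3, 5], [1, 4], [4, 6],
          [1, 2, 7], [1, 2], [6, 7], [3, 4], [5, 6]]

def incidence(paths):
    m = np.zeros((len(paths), N_LINKS), dtype=int)
    for r, path in enumerate(paths):
        m[r, [link - 1 for link in path]] = 1
    return m

def bool_product(A, B):
    """Assumed meaning of the circled-dot operator: binary matrix product."""
    return ((A @ B) > 0).astype(int)

P, Q = incidence(WORK), incidence(BACKUP)

# Single link failures only: F is the identity, and the model reduces to the
# link-failure case (U = P, T = P).
F_link = np.eye(N_LINKS, dtype=int)
U_link = bool_product(P, F_link.T)
T_link = bool_product(U_link, F_link)

# Hypothetical extra scenario 8: links 5 and 6 fail together.
shared_risk = np.zeros((1, N_LINKS), dtype=int)
shared_risk[0, [4, 5]] = 1
F = np.vstack([F_link, shared_risk])

U = bool_product(P, F.T)      # U[r, k] = 1 if failure k hits flow r's working path
T = bool_product(U, F)        # T[r, l] = 1 if link l is tabu for flow r's backup path

# Flows whose current backup path violates T + Q <= 1 (1-based flow numbers).
violators = [r + 1 for r in range(len(P)) if ((T[r] + Q[r]) > 1).any()]
print(violators)
```

Under the hypothetical scenario, flow 8 (c to d) is flagged: its working path uses link 5 and its backup path uses link 6, which fails in the same scenario.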
Why Optical Layer Protection?
• Backbone networks have multiple technology layers
• Converging toward IP/MPLS/WDM
• Typically have survivability at each layer
• Optical layer provides lightpath services to its client layers
(e.g., SONET, IP, ATM)
• Protection mechanisms exist in the client layers, so why is
protection needed in the optical layer?
– IP and ATM networks do not have protection functions as extensive as
those of SONET
– Capacity efficient due to protection capacity sharing across multiple
pairs of client layer equipment
– Significant savings in equipment cost
– Handle fiber cuts more efficiently than the client layers
– Provide an additional degree of resilience (e.g., protect against
multiple failures)
– Can use mesh-based protection schemes that require significantly less
protection capacity than ring-based schemes
Where are Optical Networks Used?
• Access networks:
– Fat pipes for transporting traffic between multiple end users and
POPs
• Metro networks:
– Fat pipes for interconnecting multiple access networks, and
providing access to backbone networks
• Long haul backbone networks:
– Fat pipes for transporting aggregates of traffic in the backbone
• Grid networks (e-science):
– Fat pipes for transferring large files
• Storage networks:
– Fat pipes for transferring large files
[Figure: "Evolution of the Architecture", with stages labeled SONET/SDH ADM, WDM point-to-point, WDM rings (OADM), and interconnected rings and mesh topologies (OXC); also labeled "Traffic Aggregation"]
Optical Layer Protection Schemes
• Optical channel (OCh) layer (or path layer) protection
schemes
– Restore one lightpath at a time
– Need to demultiplex all wavelengths
• Optical multiplex section (OMS) layer (or line layer)
protection schemes
– Restore the entire group of lightpaths on a link
– Require less equipment
• OLTs and OADMs can provide both OCh and OMS layer
protection in linear or ring configurations
• OXCs can provide OCh layer protection in linear, ring, and
mesh configurations
• Backbone networks: use unprotected WDM point-to-point
systems and rely on OXCs to perform the protection functions
• Metropolitan networks: use WDM line terminals and OADMs
to perform protection functions
1:1 OMS Protection
OMS-DPRing
• Dedicated protection ring
• Two fibers operate in opposite directions
• Each node transmits on both directions of the ring
– Different nodes must transmit at different wavelengths
• Normal operation: the ring functions as a bus, with
one pair of amplifiers turned off and all the others
turned on
• When a link fails: the amplifier pair next to the
failed link is turned off and the pair that was
originally inactive is turned on
• Equivalent to a SONET USHR
OMS-SPRing
• Shared protection ring
• Four fibers, analogous to a SONET BLSR/4 (BSHR)
• The two protection fibers do not have attached WDM
equipment
• Use either span switch or ring switch
• Two-fiber version
– Dedicate half the wavelengths on each fiber for protection
purposes
– Make the protection wavelengths on one fiber correspond
to the working wavelengths on the other fiber → signals can
be rerouted without requiring wavelength conversion
OCh-SPRing
OCh-Mesh Protection
Path-Based vs. Link-Based
Protection
• Offline protection
– Protection path and wavelengths are reserved at the time
of connection setup
• In path-based scheme, a link-disjoint protection path is reserved
• In link-based scheme, protection paths are reserved around each
link of the working lightpath
– Fast and guaranteed restoration
• Online protection
– Search for protection paths using the spare capacity in
the network upon a failure
– Capacity efficient
– Slow and no guarantee of restoration
Dedicated vs. Shared Protection
Classification of OCh-Mesh
Protection Schemes
[Figure: classification of OCh-mesh protection schemes, with offline and online branches]
Internetworking between Layers
– Concern: a fiber cut or link failure can result in a loss of a large volume of data.
– Solution: mesh restoration
The Traffic Grooming Problem
Traffic Grooming Problems