BANANAS: An Evolutionary Framework for Explicit and Multipath Routing in the Internet∗
H. Tahilramani Kaur, S. Kalyanaraman, A. Weiss, S. Kanwar, A. Gandhi
ECSE Department, Rensselaer Polytechnic Institute, Troy, NY-12180.
{hema,shivkuma, kanwas}@networks.ecse.rpi.edu,
[email protected]

∗The project was supported in part by DARPA Contract F30602-00-2-0537 and grants from Intel Corp. and AT&T Corp.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ACM SIGCOMM 2003 Workshops August 25&27, 2003, Karlsruhe, Germany
Copyright 2003 ACM 1-58113-748-6/03/0008 ...$5.00.
Proceedings of the ACM SIGCOMM 2003 Workshops 277 August 2003

ABSTRACT
Today the Internet offers a single path between end-systems even though it intrinsically has a large multiplicity of paths. This paper proposes an evolutionary architectural framework “BANANAS” aimed at simplifying the introduction of multipath routing in the Internet. The framework starts with the observation that a path can be encoded as a short hash (“PathID”) of a sequence of globally known identifiers. The PathID therefore has global significance (unlike MPLS or ATM labels). This property allows multipath capable nodes to autonomously compute PathIDs in a partially upgraded network without requiring an explicit signaling protocol for path setup. We show that this framework allows the introduction of sophisticated explicit routing and multipath capabilities within the context of widely deployed connectionless routing protocols (e.g. OSPF, IS-IS, BGP) or overlay networks. We establish these characteristics through the development of PathID encoding and route-computation schemes. The BANANAS framework also allows considerable flexibility in terms of architectural function placement and complexity management. To illustrate this feature, we develop an efficient variable-length hashing scheme that moves control-plane complexity and state overheads to network edges, allowing a very simple interior node design. All the schemes have been evaluated using both sizable SSFNet simulations and a Linux/Zebra implementation evaluated on Utah’s Emulab testbed facility.

1. INTRODUCTION
Today’s Internet routing protocols like OSPF and BGP were designed to provide one primary end-to-end service: “best effort reachability.” These protocols realize the “best-effort” concept by offering a single path to destination subnets. However, the Internet topology has an intrinsic multiplicity of paths: hosts have multiple potential network interfaces and autonomous systems (both enterprises and ISPs of various sizes) are multi-homed [1, 2, 3]. It is interesting to ponder two questions:
a) Why is path multiplicity a valuable architectural feature?
b) Why have we not significantly exploited the intrinsic path multiplicity in the Internet?

The answer to the first question is that multi-path transmission can be fundamentally more efficient than the current single-path paradigm. Just as packet switching is fundamentally more efficient than circuit switching because it offers the potential to leverage both spatial and temporal multiplexing gains at a single link (see [4], chapters 1, 2), a network offers one more dimension where spatio-temporal multiplexing gains may be obtained: different paths. Packet switching does not waste unused capacity if user demand is available at a single link; similarly, with path multiplicity available to end-to-end flows, unused capacity in paths will not be wasted if user demand is available. Using our proposed BANANAS framework, such multiple paths may be leveraged at different levels in the networking stack: legacy OSPF or BGP networks, overlay networks, peer-to-peer networks (e.g. dynamically instantiated overlays using a peer-to-peer lookup infrastructure to support video-conferencing) and last-mile multi-hop fixed-wireless networks.

The answer to the second question is clearly not the lack of algorithms and protocols. There have been several proposals for multipath route-computation [5, 6, 7, 8], Internet signaling architectures [9, 10, 11, 12, 13], novel overlay routing methods [14, 15] and transport-level approaches for multi-homed hosts [16, 17]. The fact that these developments have not triggered widespread deployment suggests that the core problem is an architectural one¹. The Internet lacks an evolutionary framework that admits incremental deployment of path multiplicity, while providing sufficient flexibility in terms of architectural function-placement and management of complexity. This paper proposes to fill that void with a framework called “BANANAS”².

¹ Another key problem involves incentives; but incentives depend upon attributes of the underlying architectural framework.
² BANANAS is not an acronym! It is adapted from the title of the car racing comedy movie Herbie Goes Bananas.

At the highest level, BANANAS proposes a simple extension of Internet operation to admit and leverage end-to-end path-multiplicity (PM). In this model, source-hosts initiate one or more end-to-end “flows” and map flows to local network interfaces. The “network” provides one or more end-to-end paths through the independent upgrades
of a subset of network nodes, possibly situated in multiple administrative domains. A subset of these upgraded nodes (e.g. selected edge-nodes) may also map “flows” to available “paths”³. Source-hosts may arbitrarily map “packets” to “flows.” Observe that today’s single-path model is a special case of this PM-model. The PM model also allows a subset of source-hosts and routers to be independently upgraded within the scope of usual administrative boundaries. An upgraded node may “see” only a subset of available paths within appropriate administrative boundaries. This high-level model is a best-effort path multiplicity model, clearly different from the IPv4/IPv6 connectionless loose-source-routing model [18, 19] and from end-to-end signaled source-route models used in ATM networks (e.g. PNNI [20]) or MPLS networks [21].

³ E.g. packets from TCP connections would be mapped to a single “path” to avoid out-of-order packets.

BANANAS provides a set of concepts and building blocks to realize this high-level PM model. A core abstract idea in BANANAS is that a path can be efficiently encoded as a short hash (called the “PathID”) of a sequence of globally-known identifiers (e.g. router IDs, link interface IDs, link weights, AS numbers etc.). This concept has some very important advantages. First, a hash-based data-plane encoding is more efficient than IPv4/IPv6’s loose-source-routing encoding [18, 19], which is an uncompressed string of IP addresses. Second, since the PathID is a function of globally-known quantities, it inherits their global significance, i.e., it can be computed and interpreted within the same scope of visibility. This “global” scope may refer to a single routing domain if router/link IDs are involved, or may refer to the universe of BGP-4 routers if AS numbers are used. The global PathID semantics allow any upgraded multipath capable (MPC) node to autonomously compute the PathID without any changes in legacy single-path capable nodes. They also remove the need for an explicit out-of-band signaling protocol as a path-setup mechanism. Note that one purpose of signaling in ATM and MPLS is to map global IDs (global addresses, path specifications) to locally assigned IDs (labels). The global PathID semantics allow the mapping of BANANAS in an incremental manner to connectionless Internet routing protocols (e.g. OSPF, BGP-4).

In addition, the BANANAS framework allows considerable flexibility in terms of architectural function placement and complexity management. These intangible aspects are crucial for tailoring the proposed building blocks and establishing the appropriate incentives for adoption by vendors and ISPs. For example, the framework allows considerable flexibility in the choice of multipath route-computation algorithms. It also provides a distributed validation procedure to ensure the validity of computed PathIDs, i.e. to check if forwarding exists in all downstream routers for the PathIDs. As another example of architectural flexibility, we propose an efficient variable-length hash realization of the abstract framework: this scheme moves control-plane complexity and state overheads to network edges, allowing a very simple interior node design. The proposed scheme realizations are evaluated using integrated OSPF/BGP simulations in sizable topologies and a Linux/Zebra implementation run on Utah’s Emulab emulation testbed facility.

We are currently deploying the BANANAS framework on the worldwide PlanetLab infrastructure [22] as a public experimental wide-area network overlay service. We are also building a medium-sized multi-hop 802.11 community wireless network on which this framework will be deployed. We believe that the mere expectation of multiple end-to-end paths will trigger application innovation in new areas such as end-to-end bandwidth aggregation [17], end-to-end resilience and video transmission over multi-paths [14, 15, 23] and end-to-end multi-path based security strategies (e.g. protecting data integrity using multipaths).

The rest of the paper is organized as follows. Section 2 introduces the abstract framework and concepts. Section 3 explores the architectural flexibility in BANANAS by considering an alternate index-based PathID encoding. Section 4 summarizes the intra-domain routing extensions for link-state protocols, OSPF and IS-IS. Section 5 develops the inter-domain ideas of BANANAS in the context of BGP-4. Section 6 presents both simulation and Linux-based implementation results to illustrate the architectural features of BANANAS. Related work is surveyed in Section 7, followed by summary and concluding remarks in Section 8.

2. THE BANANAS FRAMEWORK

2.1 PathID: Abstract Concept
Consider a network modelled as a graph G = (V, E) where V is the set of vertices or nodes and E is the set of edges or links in the network. Let N denote the number of nodes in the network, i.e. the cardinality of the set V. Each link (i, j) ∈ E has an identifier associated with it, denoted by $l_{i,j}$. Each node i also has an identifier denoted by $n_i$. Consider a path $P_{i,j}$ from node i to node j, which passes through nodes i, 1, 2, ..., m − 1, j. This path can be represented as a sequence of globally-known node and link identifiers $[n_i, l_{i,1}, n_1, l_{1,2}, n_2, \ldots, l_{m-1,j}, n_j]$. This path sequence can be compactly represented by a hash of its elements. A path identifier (or, in short, “PathID”) is defined as a hash of the above sequence or any non-null subsequence derived from it. Observe that the IP destination address (j), the uncompressed IPv4/v6 loose-source-routes [18, 19], the XOR of router IDs proposed in LIRA [11], or a hash of the subsequence of link weights are all examples of valid PathIDs, obviously with differing characteristics. Therefore the particular subsequence and PathID encoding function chosen are crucial in determining the utility of the PathID. These abstract concepts are illustrated in Figure 1.

Figure 1: Path and PathID Concepts

A desirable hash is compact, easy to compute and has a low collision probability (i.e. high uniqueness probability). This demands a hash function that offers low collision probabilities. A simple hash of the path sequence may be obtained by using the sum or XOR function (suggested in LIRA [11]). While these are simple and fast, they may lead to non-unique PathIDs. Our canonical hash function choice is
a 128-bit MD5 hash followed by a 32-bit CRC of the 128-bit MD5 hash (resulting in a final 32-bit hash value). We use the notation (MD5 + CRC32) hash to represent the above two-step hashing process. Alternatively, 32 bits of the 128-bit MD5 hash could also have been used. This hash value is used in conjunction with the destination address (j), leading to a two-tuple hash: [j, PathID]. For convenience, we refer to the second tuple value as PathID. The collision probability, i.e. the probability that multiple paths lead to the same PathID, depends only on the number of paths to any given destination prefix, and the nature of the path subsequence on which the MD5+CRC32 function is applied. Assuming a random bit-string as input and all the $2^{32}$ outputs to be equally likely, the probability of collision is given by $1 - \frac{n!}{n^k (n-k)!}$, where n is the number of possible outcomes ($2^{32}$) and k is the number of paths to a destination.

A sequence of well-known link interface IDs, router IDs and link weights (in OSPF or IS-IS) on the path can be used to generate the underlying path sequence. However, link-weights are usually non-unique, chosen from a narrow range and may be dynamic (to implement traffic engineering/adaptive routing), whereas router IDs and link interface IDs are unique identifiers. Our canonical choice is the subsequence of all node IDs on the path (generalizes to a sequence of AS numbers in BGP-4). Section 3 develops an alternative hash function that is a concatenation of well-known link ID indices at nodes.

2.2 Packet Forwarding
This section describes the forwarding table structure and forwarding algorithm corresponding to our canonical choice of hash function and path subsequence made in Section 2.1. Section 3 develops an alternative forwarding algorithm (for OSPF/IS-IS) that does not require a large forwarding table at interior nodes.

IP forwarding tables essentially contain two-tuple entries of the form [destination prefix, outgoing interface]. A longest-prefix-match lookup procedure is employed. At upgraded routers we propose to use four-tuple entries of the form [destination prefix, incoming PathID, outgoing interface, outgoing PathID]. The “incoming PathID” field represents the hash of the explicit path from the current router to the destination prefix. The “outgoing PathID” field is the hash of the corresponding path suffix from the next upgraded router to the destination.

An upgraded router first matches the destination IP address using the longest prefix match, followed by an exact match of the PathID for that destination. If matched, the incoming PathID in the packet is replaced by the outgoing PathID, and the packet is sent to the outgoing interface. If an exact match is not found (i.e. errant hash value in packet), then the hash value in the packet is set to zero, and the packet is sent on the default path (i.e. shortest path in OSPF/IS-IS or default policy route in BGP-4). The hash value may also be set to zero if the next-hop is the destination itself, or there are no upgraded routers in the path specified by the incoming PathID. A non-upgraded router simply ignores the PathID field and forwards the packet on the shortest path. The global PathIDs may be computed at each router with minor modifications to OSPF LSAs (see Section 4).

Figure 2 shows a partially upgraded network. Nodes A, C and D are multipath capable (MPC). Assume that node A is the originating node for a packet destined to node F. The shortest path from intermediate node B to node F is B-D-F, and path A-B-C-F is not available for forwarding because node B is a non-upgraded node and the next-hop on the default shortest path of B is not C. However, paths such as A-B-D-C-F, A-D-E-F, A-D-C-E-F etc. are available. If the path A-B-D-E-F is chosen, then the PathID of an incoming packet will be Hash(A-B-D-E-F). A sets the PathID field to Hash(D-E-F), i.e. the hash of the path suffix from the next MPC router to destination. Node B forwards the packet on its shortest path (i.e. to D). Node D sets the PathID to zero, because there is no MPC router on the path to F.

Figure 2: Multi-Path Forwarding with Partial Upgrades

2.3 Path and PathID Computation
The BANANAS framework not only supports upgrades of a subset of nodes, but also allows heterogeneity in multipath computation algorithms used at different upgraded routers. The fundamental tradeoff in link-state protocols (given our canonical choice of PathID hashing method) is route-computation and space complexity incurred at upgraded routers to avoid signaling.

In link-state protocols each router has a complete map of the network in the form of a link-state database. We propose to first annotate this “map” at an upgraded node with the knowledge of other upgraded nodes (we defer the discussion of how this is achieved in case of OSPF/IS-IS and BGP to Sections 4 and 5). In Figure 2, upgraded node A will know that nodes C and D are upgraded and vice versa.

For now, consider a single flat, link-state routing domain. We do not consider extension of BANANAS to distance-vector routing algorithms (e.g. RIP). Using the link-state database (“map”) and knowledge of upgraded routers, every router can locally compute available network paths. The simplest model that admits the largest number of paths is where each upgraded router can forward to any neighbor. The paths can be computed by performing a depth-first-search (DFS) [24] that traverses every neighbor of upgraded nodes and the shortest-path neighbor at non-upgraded nodes. The shortest path next-hops of non-upgraded nodes can be found by running Dijkstra’s algorithm multiple times or by an all-shortest-paths algorithm, e.g. Floyd-Warshall [24]. This results in a table containing next-hops for all paths to a destination under the constraint of a known subset of MPC nodes. We refer to this strategy as DFS under partial upgrade constraints, or DFS-PU for short. This simple approach is expensive in both computational and storage terms, especially as the number of MPC nodes grows.

The BANANAS framework allows an upgraded router to compute and store only a valid subset of available paths under partial constraints. The subset of available loop-
free paths can be computed using a multipath computation algorithm available in literature, for example k-shortest-paths, all k-hop paths, k-disjoint paths (see [5] and references within), DFS with constrained depth ([7] uses a depth-constraint of 1-hop) etc. The only constraint is that the algorithm should also compute the shortest (default) path. These algorithms may be adapted for the MPC constraint, i.e. there is a known subset of upgraded nodes.

However, there is a second, more subtle problem: if different routers compute and store different sets of paths, it is possible that the path computed by one upgraded node may not be supported by another upgraded or non-upgraded node that lies downstream on this path. We term such paths “invalid”, i.e., forwarding support for the path does not exist at some downstream node.

To solve the above problem, we propose a distributed validation algorithm that ensures validity of chosen paths. The main idea behind the validation algorithm is that a path is valid (i.e. forwarding for a path exists) if all its path suffixes are valid. This suggests a mathematical induction based approach. We know that all one-hop paths are always valid because they represent a direct link. A two-hop path is valid if its one-hop path suffix is valid.

The proposed algorithm (see Algorithm 1) has two phases. In the first phase a node computes the paths using the chosen algorithm. For example, let us assume that node i uses a $k_i$-shortest-path algorithm. The $k_i$ paths computed to each destination are input into a map data structure that is ordered by hop-count. In phase 2, the validation phase, the node needs to know the path computation algorithm and parameters used by other upgraded nodes. In our example, node i needs to know the $k_j$ parameter associated with each upgraded node j. With this knowledge, it can compute the $k_j$ paths for node j and input them into the hop-count ordered map data-structure (lines 2-5 in Algorithm 1). At non-upgraded nodes, $k_j$ is 1 (lines 6-9 in Algorithm 1). Essentially we have computed all potentially available paths in phase 1.

Phase 2 operates similarly to mathematical induction. All one-hop paths in the map are declared as valid. For each 2-hop path, the algorithm simply searches for the 1-hop path suffix in the just-validated set. If a match is not found, the path is invalid and is discarded. If the path (i.e. the corresponding PathID entry) exists in the forwarding table, it is removed. In this process, validating an m-hop path entry implies looking up its (m-1)-hop path suffix in the just-validated set of (m-1)-hop paths and finding a match (the variable temp_pair and the lines 16, 17 in Algorithm 1 are used to find a suffix match in the Routing_Map structure). By mathematical induction, when the entire map has been linearly traversed, the remaining paths are valid.

The computational complexity of this approach can be estimated as follows. In an N-node network with u upgraded routers, the complexity of the first phase is given by $uC(k) + (N-u)C(1)$, where C(k) denotes the complexity of computing k-shortest paths and C(1) denotes the complexity of Dijkstra’s algorithm. The total number of paths, T, computed at the end of the first phase is equal to $(N-1)\left((N-u) + \sum_{i=1}^{u} k_i\right)$. The complexity of the validation phase is $O(T \log(T)\,\bar{h})$, where $\bar{h}$ is the average hop count for the paths. The log(T) term arises due to searching for a suffix in the map (see Algorithm 1, line 17). The validation algorithm may be optimized or be eliminated for special cases, e.g. if all nodes are upgraded and use the same value of k.

In summary, Algorithm 1 is a general 2-phase validation procedure that can be applied to validate paths computed using any deterministic path computation algorithm at MPC routers that also computes the default shortest path.

Algorithm 1 Algorithm for validating paths at a router in a partially upgraded network
 1: Let NU and U denote the set of all non-upgraded and upgraded nodes respectively
 2: for all u ∈ U do
 3:   newPaths ← Compute paths using u’s advertised algorithm
 4:   Routing_Map.append(newPaths)
 5: end for
 6: for all n ∈ NU do
 7:   newPaths ← Compute shortest path using Dijkstra’s algorithm
 8:   Routing_Map.append(newPaths)
 9: end for
10: All 1-hop paths are valid
11: Initialize suffixLength ← 2
12: while suffixLength < maxHops do
13:   for all path ∈ Routing_Map do
14:     if hop count of path ≥ suffixLength then
15:       temp_pair.hopcount ← suffixLength-1;
16:       temp_pair.PathString ← last suffixLength nodes in path;
17:       if Routing_Map.find(temp_pair) == FALSE then
18:         delete path
19:       end if
20:     end if
21:   end for
22:   suffixLength++;
23: end while

3. ARCHITECTURAL FLEXIBILITY IN BANANAS
A general concern with the canonical description so far is the increase in computational and space complexity at upgraded nodes (both edge and core nodes). An interesting question is whether we can use an alternative hashing method that leads to overall complexity reduction and a more attractive division of functions between the edge and core, and between data-plane and control-plane. To demonstrate the affirmative answer, we develop a new index-based encoding scheme that moves complexity to network edges, and simplifies core node operations by using an efficient, reversible hash. The tradeoff is to use a variable-length PathID encoding instead of the canonical 32-bit fixed length encoding. Moreover, the scheme is only applicable to link-state protocols, where the neighbor relationships do not change often. Specifically, the index-based scheme is not applicable to path-vector based protocols like BGP-4, or mobile ad-hoc networks where neighbor relationships change rapidly.

3.1 Index-based Scheme: PathID Encoding
To motivate the scheme, consider an example. An upgraded node orders its link interface IDs (or alternatively neighbor node IDs) and represents each link by its index in this ordering (see Figure 3). This link ID, i.e. index, can now be efficiently encoded. For example, a router with 15 interfaces will need 4-bit link indices. In general, the link or interface IDs of a node may be locally hashed using a globally-known hash function. Since every node knows the
global hash function and it operates on globally-known link IDs (e.g. IP addresses of interfaces), each node can independently compute the hashes of any other node.

Figure 3: Explanation of Index-Based Encoding Scheme

A path can now be specified as a concatenation of such link-indices (e.g. Figure 3 shows the PathID, in binary, of a path via nodes 9-10-6). This PathID encoding is guaranteed to be unique (unlike the earlier MD5+CRC32 encoding, which had a very small collision probability). For a reasonable maximum bit-budget in the packet header (e.g. 128 bits), and an average of 15 interfaces per router, up to 32-hop paths can be encoded with this technique. The limitation of 32 hops is not too restrictive (in [25], the authors find that the average number of hops to reach a destination in the Internet is 19); it applies only within a single area or a domain. The PathID is re-initialized by the first upgraded router after crossing any area or domain boundary.

The concatenation operation used here is an example of a reversible or perfect hash, i.e., the local hash (i.e. next-hop information) can be extracted from the overall PathID without needing a per-path table entry. The state needed at interior nodes is small; only a table mapping link indices to link-IDs is needed. For example, at a router with 15 interfaces, a 15 entry index-table is needed irrespective of network size. No other control-plane computation or state-complexity is required at interior nodes. Since the interior nodes can forward to any neighbor now, a large number of network paths may be supported. Edge-nodes can compute paths using heterogeneous algorithms, and use a simpler validation algorithm (see Section 3.3).

To summarize the impact in terms of function placement and complexity management, the index-based scheme uses per-hop PathID processing instead of a table-driven per-hop PathID swapping strategy. Only edge routers need to compute the multipaths and their PathIDs using a simplified validation procedure. The memory requirements at the core routers are also greatly reduced.

3.2 Index-Based Scheme: Packet Forwarding
Upgraded interior routers maintain an index table that maps the interface index to the link interface IP address. On receiving a packet, an upgraded interior router extracts the interface index of the outgoing interface (next-hop) from the PathID field in the packet header and uses the interface index table to forward the packet on the appropriate link (see Figure 4).

Figure 4 shows a packet being sent from node S to node 7 along the path S-6-2-4-3-7, the PathID at various points and various interface indices. Only nodes S, 6 and 4 are upgraded. Node S has a complete map of the network from the link-state database and knows that node 6 has two interfaces and the next-hop index at node 6 is 2, encoded using two bits. Note that the interface indexing starts from 1 because a PathID of zero still refers to the default (shortest) path. Likewise, the index at node 4 for this path is 3, encoded using three bits. The PathID of the packet sent from node S is $0\ldots01110_2 = 14$, indicating an index for each upgraded node ($10_2 = 2$ for node 6 and $011_2 = 3$ for node 4). Node 6 has an index table with 2 entries mapping the link indices to the interface IP addresses. On receiving a packet with a PathID in the routing header, it extracts the last two bits and then looks up its index table. The PathID is also right-shifted by two bits in this operation so that the next upgraded router can extract its index from the last bits of the PathID. Similarly, node 4 will extract three bits from the PathID and right-shift it by the same number before forwarding it. The remaining PathID will now be zero. The non-upgraded routers merely forward packets along the default shortest paths, oblivious of the PathID field.

Figure 4: Forwarding with the Index-based PathID encoding scheme (Note: “0b” indicates binary encoding)

3.3 Index-based Scheme: Path Computation
In this scheme, “source” (or edge) routers can independently use any multipath computation algorithm to find a subset of available paths, similar to the discussion in Section 2.3. The only information needed is the knowledge of which routers in the network are upgraded (available with the MPC-bit in LSAs).

Path validation is only necessary to impose the constraint that non-upgraded nodes can forward packets only on their default shortest paths. Algorithm 2 shows the pseudo-code of a generic validation algorithm for edge routers. Only those paths are valid where the next-hop of each non-upgraded router corresponds to its shortest path next-hop. Again, the validation algorithm consists of two phases. The first phase deals with the computation of shortest paths for non-upgraded nodes (lines 4-6 in Algorithm 2) and computation of multiple paths using any desired multipath computation algorithm. In the second phase, the paths are checked for passing through non-upgraded nodes. If a path passes through a non-upgraded node, the next-hop must be the same as the next-hop in the pre-computed shortest path. A path is invalid if this condition is not met (lines 14-16). In an N-node network with u upgraded routers, the complexity of the first phase is given by $C(k) + (N-u)C(1)$, where C(k) denotes the complexity of computing k paths (assuming the upgraded router keeps k paths) and C(1) denotes the complexity of Dijkstra’s single-shortest-path algorithm. The complexity of the second phase of the validation algorithm is $O(k \times (N-1) \times (N-u))$, where k is the maximum number of paths for each destination to be stored in the forwarding table. Note that the validation phase in the index-based path encoding scheme is simpler than the validation phase in Algorithm 1. This is because the upgraded routers can forward packets to any of their interfaces. Recall that in Algorithm 1, the validation phase also needed to ensure that the downstream upgraded nodes of a path would indeed provide forwarding for that path (i.e. have a forwarding table entry for that path).
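The index-based scheme of Sections 3.1 to 3.3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`encode_path_id`, `forward_step`, `is_valid`), the representation of a path as a list of node names, and the `sp_next_hop` table are all hypothetical choices made for illustration. The sketch assumes the first upgraded router's index sits in the lowest-order bits and that each router's index width is known from the link-state database.

```python
from typing import Dict, Sequence, Set, Tuple


def encode_path_id(hops: Sequence[Tuple[int, int]]) -> int:
    """Concatenate (index, bit_width) pairs into one integer PathID.

    `hops` lists the upgraded routers in path order; the first router's
    index occupies the lowest-order bits so that it is read first.
    """
    path_id, shift = 0, 0
    for index, width in hops:
        # Index 0 is reserved: a PathID of zero means "default path".
        if not 1 <= index < (1 << width):
            raise ValueError(f"index {index} does not fit in {width} bits")
        path_id |= index << shift
        shift += width
    return path_id


def forward_step(path_id: int, width: int) -> Tuple[int, int]:
    """Per-hop processing at an upgraded interior router.

    Returns (local interface index, PathID to write back into the packet).
    """
    index = path_id & ((1 << width) - 1)  # read the low-order bits
    return index, path_id >> width        # right-shift for the next router


def is_valid(path: Sequence[str], upgraded: Set[str],
             sp_next_hop: Dict[Tuple[str, str], str]) -> bool:
    """Edge-router check: every non-upgraded node on the path must forward
    to its own shortest-path next hop toward the destination."""
    dst = path[-1]
    for here, nxt in zip(path, path[1:]):
        if here not in upgraded and sp_next_hop[(here, dst)] != nxt:
            return False
    return True


# Values from the Figure 4 discussion: index 2 in two bits at node 6,
# index 3 in three bits at node 4.
pid = encode_path_id([(2, 2), (3, 3)])
print(pid, bin(pid))  # 14 0b1110
```

Calling `forward_step(14, 2)` returns `(2, 3)` at node 6, and `forward_step(3, 3)` returns `(3, 0)` at node 4, matching the text's observation that the remaining PathID is zero after the last upgraded router. The validity check needs shortest-path next hops only for non-upgraded nodes, which is why this validation is cheaper than the suffix search of Algorithm 1.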
Algorithm 2: Algorithm for validating paths in the new scheme
  Let N denote the set of nodes in a network and N_U denote the set of
      non-upgraded nodes
  Compute multiple paths using the desired multipath computation algorithm
  Let P(dst) denote the set of paths to destination dst
  for n ∈ N_U do
    Compute Dijkstra
  end for
  for dst ∈ N do
    Compute the desired paths to destination dst using any of k-shortest
        paths, k-disjoint paths, all paths up to k hops, etc.
    for path ∈ P(dst) do
      for n ∈ N_U do
        if path.find(n) == TRUE then
          // nextHopSP is the next-hop in the shortest path from n to dst
          // nextHop(path) denotes the next-hop of n in the path
          if nextHop(path) != nextHopSP then
            delete path
          end if
        end if
      end for
    end for
  end for

4. BANANAS EXTENSIONS FOR INTRA-DOMAIN PROTOCOLS

In this section, we summarize the extensions to OSPF/IS-IS to support the
BANANAS framework. A 32-bit PathID field is required in the packet header;
it can be implemented as a new routing option, called i-PathID (in the
context of intra-domain routing, PathID actually refers to i-PathID). The
route computation algorithm (Dijkstra's algorithm) at upgraded routers must
be extended to compute multiple paths (e.g. DFS under partial upgrade
constraints (DFS-PU), k-shortest paths [5] etc), and complemented by a
validation algorithm (Algorithm 1). The upgraded nodes must compute the
shortest path as the default path. Incoming packets with erroneous PathIDs
are forwarded on the shortest paths and the PathID field is set to zero.
The intra-domain forwarding tables at upgraded routers would have tuples
(destination prefix, incoming PathID, outgoing interface (next-hop),
outgoing PathID). As indicated in Figure 5, one bit in the OSPF Link State
Advertisements (LSAs) [26] must be used to indicate that the router is
multipath capable (MPC). In the Linux/Zebra based implementation as well as
in the SSFNet simulations, we have used the eighth bit in the LSA options
field of the router-LSA as the MPC bit.

Figure 5: Proposed Modifications to OSPF Link State Advertisements (LSAs)

Also, if we allow different upgraded routers to compute paths using
different algorithms, we need some bits to indicate the choice of route
computation algorithm along with its parameters (e.g. the value of k in the
k-shortest paths algorithm). In our Zebra-based implementation, we have
assumed that upgraded nodes implement the k-shortest-path algorithm with
different values of k. Therefore, we leverage the currently unused 8 bits
after the router type field in the LSA to indicate the value of k.

For the alternative index-based path encoding scheme, the concatenation of
indices is done from the lower-order bits to the high-order bits. Each
router simply shifts the PathID to the right by the number of bits needed
to encode its interface index. This allows upgraded interior routers to
extract the next-hop index from the lowest-order bits without knowing their
position within the path, i.e. without the knowledge of how many upgraded
nodes are on the path. The upgraded interior routers only need to set the
MPC bit in their LSA and need not advertise the route computation
algorithm. Each upgraded router must maintain an ordered list of its own
interfaces and the corresponding index. The upgraded edge routers can use
any multipath algorithm to compute multiple paths. However, they need to
validate the paths using the validation algorithm (Algorithm 2). All
upgraded routers must always compute the default shortest paths to all
destinations. This is necessary in order to forward packets with no PathID
option, or with a zero or erroneous PathID.

4.1 Forwarding Across Multiple Areas

Large OSPF and IS-IS networks support hierarchical routing with up to two
levels of hierarchy. Our approach is to view each area as a flat routing
domain for the purpose of multipath computation. Multiple paths are found
locally within areas, and crossing areas is viewed as crossing to a new
multipath routing domain, i.e. we re-use the i-PathID field. For example,
if a source needs to send a packet outside an area, it chooses one of the
multipaths to the area border router (ABR). Then, the ABR may choose among
the several multipaths within area 0 to other ABRs. The i-PathID field is
re-initialized by the first ABR at the area boundary.

5. BANANAS EXTENSIONS TO BGP

5.1 Motivation and Goals

BGP-4 [27] is the inter-domain routing protocol in the Internet. BGP uses a
path vector and policy routing approach to announce a subset of actively
used paths to its neighbors. Load-balancing and traffic engineering in BGP
are becoming important as operators attempt to deploy services like virtual
private networks (VPNs), and optimize on complex peering agreements
[1, 28, 29, 30]. Enterprises are
also increasingly multi-homed and are increasingly active in managing their
inbound and outbound traffic [1, 31].

While BANANAS is not designed to address the multitude of configuration,
stability and load-balancing problems [32, 29, 33] of BGP, it does provide
a set of building blocks to enable fine-grained BGP traffic engineering
both within and across domains. In particular, BANANAS introduces two new
capabilities: explicit exit forwarding and explicit AS-PATH forwarding. We
examine these aspects further in the following sections.

5.2 Explicit-Exit Forwarding

The idea of explicit-exit routing is quite simple. The overall objective is
to define a traffic aggregate and then map it to a chosen exit router
(ASBR). Traffic aggregates may be chosen at per-packet, per-flow or
per-prefix granularities by the upgraded EBGP or IBGP routers, i.e., ISPs
can define fine-grained bundles of outbound traffic. Unlike LOCAL_PREF, the
explicit exit capability can map traffic for the same destination prefix to
multiple exits (based upon the autonomous decisions at upgraded IBGP
nodes).

The explicit exit mechanism works as follows. An upgraded IBGP router
chooses an arbitrary exit AS border router (ASBR) for a given traffic
aggregate (e.g. a flow or all traffic to a destination prefix). It then
“pushes” the destination address into an “address stack” field, and
replaces the destination address with the exit ASBR address (adjusting the
checksum appropriately). Now, intermediate routers forward the packet to
the exit ASBR to which it is addressed. The exit ASBR then simply “pops”
the address from the address-stack field back into the destination address
field (and adjusts the checksum) before forwarding it along to the next AS.

The upgraded IBGP node would hence have table entries of the form:
[Dest-Prefix Exit-ASBR Next-Hop-to-Exit-ASBR] and
[Dest-Prefix Default-Next-Hop]. The second tuple is the regular
IBGP-defined default policy route for the destination prefix: this
forwarding entry is used for all traffic for which this IBGP router does
not decide the exit router. The first 3-tuple is applied only to the
traffic aggregates for which this IBGP router chooses an explicit exit.
This kind of operation is important to avoid conflicting exit routing
decisions by upgraded IBGP routers. Observe that only a subset of IBGP
routers and exit ASBRs (EBGP routers) need to be upgraded. All BGP routers
synchronize on their default policy routes as usual [27]. In addition, the
upgraded exit ASBRs should also synchronize with the upgraded IBGP routers
so that they know which exits are available for any given prefix.

The explicit-exit mechanisms proposed are similar in spirit to the
label-stacking (multi-level tunnelling) ideas in MPLS [21]. A key
difference is that BANANAS proposes only a single-level address stack,
whereas MPLS can have multiple levels in its label-stack. Note that the
explicit exit routing is a special case of explicit path routing introduced
in earlier sections. The PathID “hash” in this case is simply the exit ASBR
IP address. This address stacking procedure operates in the fast processing
path at all routers (both upgraded and non-upgraded), unlike IP
loose-source-routing that defaults to the slow-processing path because it
is an IP option.

5.3 Explicit AS-PATH Forwarding

The goal of explicit AS-PATH forwarding is to provide a distributed
mechanism to send packets along an arbitrary, but validated AS-PATH. The
idea is similar to the explicit path routing introduced for OSPF/IS-IS,
except that we now refer to explicit AS-PATHs rather than a sequence of
contiguous routers and links. In particular, we propose a separate hash
field called external-PathID or e-PathID in packets for this function. The
e-PathID is the hash of the desired AS-PATH, i.e., hash of the sequence of
AS numbers.

The e-PathID hash is processed as follows. First, in an upgraded AS, assume
that at least the entry and exit AS border routers (ASBRs) are upgraded to
support the explicit AS-PATH function. Assume that a border router (called
the entry ASBR) receives a packet with a non-zero, valid e-PathID. The
incoming e-PathID is used by the entry ASBR to determine an appropriate
exit ASBR. The packet is then explicitly sent to this exit ASBR using the
mechanisms described in the earlier section, i.e. address-stacking. Indeed,
once the address is stacked, the i-PathID may also be explicitly chosen to
indicate a specific route to that exit ASBR. Note that the e-PathID is not
swapped at the entry ASBR. The outgoing e-PathID (for the AS-PATH suffix)
replaces the incoming e-PathID only at the exit ASBR. This convention is
required because the autonomous system is an atomic entity (similar to a
node) as far as the e-PathID is concerned. However, the AS physically
breaks up into an entry and an exit ASBR (similar to input and output
interfaces of a node). If we imagine that the abstract PathID swapping
happens at the output interface, that corresponds to our convention of
swapping the e-PathID at the exit ASBR. Observe that we have required only
EBGP routers to be aware of the multi-AS-PATH feature, and do not require
upgrades in selected IBGP routers (unlike the explicit exit case discussed
earlier).

Figure 6: Topology for illustrating explicit AS-PATH forwarding

To illustrate the explicit AS-PATH feature, we consider the AS-graph
topology in Figure 6, and assume that we would like to send traffic from
AS1 to AS5, i.e. to the IP prefix 0.0.0.48 along AS-PATH AS1-AS2-AS3-AS5,
represented as (1 2 3 5). The AS-PATHs available are AS1-AS2-AS5,
AS1-AS2-AS4-AS3-AS5 and AS1-AS2-AS3-AS5. The explicit path (1 2 3 5) is
chosen at router 1; the suffix AS-PATH is (2 3 5), whose hash is placed in
the e-PathID field in the outgoing IP packet. The next-hop is an entry
router in AS2. An exact match of prefix and e-PathID results in the packet
being forwarded to AS3. The e-PathID will be swapped only at the exit ASBR
(i.e. Router 2 in AS2). A simi-
lar sequence of events occurs in AS3 involving entry ASBR (router 1) and
exit ASBR (router 3) before the packet is forwarded to AS5. The outgoing
e-PathID from AS3 will be set to 0 because AS5 is the destination AS.

In spite of these apparent reductions in upgrade complexity, BGP's
path-vector nature poses a more important problem. Specifically, a new
AS-PATH is unknown to an upstream AS unless the intervening AS explicitly
advertises it (after internal synchronization). In other words, even if
ISPs were interested in AS-PATH multiplicity, increased control traffic is
necessary to advertise the existence of multiple AS-PATHs to neighbor
AS'es. Recall that such excess control traffic was not required in
link-state algorithms (we merely piggybacked LSAs with minimal
information). On the other hand, the path-vector nature of BGP-4 also
implies that no path computation is necessary once the multiple AS-PATHs
have been received and filtered for acceptance.

We recognize that this increased control traffic requirement poses a
significant disincentive for ISPs against adopting multi-AS-PATH
capabilities en masse. Given the scalability and instability issues with
adding control traffic, we expect that ISPs may choose to advertise only a
small set of multiple AS-PATHs to their neighbor AS'es. For example, some
AS'es may collaborate to allow forwarding along multiple paths to certain
destination prefixes and advertise this as a non-transitive attribute to
certain AS'es only.

5.4 BANANAS Extensions to BGP-4

In summary, we propose two capabilities in the context of inter-domain
routing: explicit exit routing and explicit AS-PATH routing. For the
former, we propose a 32-bit “address stack” field in the routing header
into which the destination IP address will be “pushed”. The destination
field in the IP header is overwritten with the exit ASBR's IP address. The
exit ASBR will simply “pop” the destination address back from the “address
stack” to the destination IP address. This address stacking procedure
(similar to MPLS) operates in the fast processing path, unlike the IP loose
source routing option. Moreover, it allows flexibility for only a subset of
BGP routers to be upgraded to support such explicit exit choice.

For explicit AS-PATH forwarding we propose a new 32-bit field in the packet
routing header called the external PathID or e-PathID. This field stores a
hash of the sequence of ASNs along the desired explicit AS-PATH. ISPs may
choose to only advertise a small set of multiple AS-PATHs to their selected
neighbor AS'es. In a multi-AS-PATH capable AS, only the entry ASBRs and
exit ASBRs (i.e. only the EBGP routers) need to be upgraded and
synchronized on the available multiple AS paths. The incoming e-PathID hash
is swapped with the outgoing AS-PATH suffix hash only at the exit AS border
router. The forwarding from the entry ASBR to the exit ASBR uses the
explicit exit mechanisms described above. Multiple paths between the entry
and exit ASBRs are possible using the i-PathID mechanism described earlier
for intra-domain routing.

6. IMPLEMENTATION AND SIMULATION RESULTS

In this section, we illustrate the working of the proposed framework. We
have implemented the BANANAS framework schemes in the Linux kernel: we use
MIT's Click Modular Router package [34] (data-plane) and GNU Zebra routing
software version 0.92a [35] (control-plane). These implementations are
tested on Utah's Emulab testbed [36] to emulate sizable topologies running
real implementation code. In particular, we test three cases: a) when an
upgraded router keeps all available paths (as computed by the DFS-PU
strategy), b) when upgraded nodes compute k-shortest paths, with
heterogeneous values of k at different nodes, and c) the index-based scheme
to illustrate architectural flexibility.

We use SSFNet [37] for larger integrated BGP/OSPF simulations. These SSFNet
simulations illustrate the framework in larger network topologies that
integrate both OSPF and BGP BANANAS functionalities. Note that in this
section, we have intentionally preferred simplicity in terms of
topology/test-case choices. We have performed a larger set of SSFNet
simulations and Emulab runs in more complex scenarios, all of which support
our assertions. These results will be reported in a detailed technical
report.

6.1 Linux Implementation Results

Figure 7 shows the topology of a simple validation experiment conducted on
Utah's Emulab [36] testbed with the Linux Zebra version 0.92a of OSPF (i.e.
control-plane) upgraded with our BANANAS building blocks. The forwarding
plane was implemented in Linux using MIT's Click Modular Router package
[34]. Note that this is a partially upgraded network: only nodes 1 and 2
(the dark colored nodes) are upgraded in this configuration. Figure 7 also
indicates the IP addresses of various router interfaces and the link
weights. The router ID is statically defined to be the smallest interface
IP address.

Figure 7: Experimental Topology on Utah Emulab using Linux Zebra/Click
Platforms (Note: only dark colored nodes are multi-path capable). All IP
addresses denoted by a.b are actually 192.168.a.b.

6.1.1 All Paths with Partial Upgrades (DFS-PU Algorithm)

Table 1 illustrates a partial forwarding table computed at node 1 (IP
address 192.168.1.1) for destination 3 (192.168.3.3). Note that the path
string shown in Table 1 is only for the sake of illustration and is not
stored in the actual routing table. The PathIDs are the (MD5 + CRC-32)
hashes of the router IDs (i.e. IP addresses of nodes) on the path. For
example, the PathID 2084819824 corresponds to a hash of the set of router
IDs {192.168.1.1, 192.168.1.2, 192.168.6.6, 192.168.39.9, 192.168.3.3}.
The outgoing path ID is the hash of the suffix path formed after omitting
192.168.1.1. If the path goes through other nodes which are not upgraded
(e.g. 1-4-3), the outgoing path ID is the hash of the suffix path starting
from the next upgraded router on the path.
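The incoming and outgoing PathIDs of a forwarding-table entry can be sketched as below. This is an illustration only: this section does not spell out how MD5 and CRC-32 are combined, so the composition used here (CRC-32 over the MD5 digest of the concatenated 4-byte router IDs) is an assumption and is not expected to reproduce the PathID values listed in Table 1. The function names path_id and table_entry, and the router ID 192.168.4.4 for node 4, are likewise hypothetical.

```python
import hashlib
import ipaddress
import zlib


def path_id(router_ids):
    """32-bit hash of a sequence of router IDs (assumed MD5-then-CRC-32)."""
    if not router_ids:
        return 0
    packed = b"".join(ipaddress.IPv4Address(r).packed for r in router_ids)
    digest = hashlib.md5(packed).digest()
    return zlib.crc32(digest) & 0xFFFFFFFF


def table_entry(path, upgraded):
    """Return (incoming PathID, outgoing PathID) for the first node of path.

    The outgoing PathID hashes the suffix that starts at the next upgraded
    router; it is zero when no downstream router is upgraded.
    """
    incoming = path_id(path)
    for i in range(1, len(path)):
        if path[i] in upgraded:
            return incoming, path_id(path[i:])
    return incoming, 0


upgraded = {"192.168.1.1", "192.168.1.2"}
via_2 = ["192.168.1.1", "192.168.1.2", "192.168.6.6",
         "192.168.39.9", "192.168.3.3"]
via_4 = ["192.168.1.1", "192.168.4.4", "192.168.3.3"]  # node 4 ID assumed

in_id, out_id = table_entry(via_2, upgraded)   # suffix starts at 192.168.1.2
_, out_legacy = table_entry(via_4, upgraded)   # no upgraded router downstream
```

Because every upgraded router derives the same hash from globally known router IDs, the next upgraded router on the path can match the outgoing PathID against its own table without any signaling.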
In the case of the path 1-4-3, both nodes 4 and 3 are not upgraded, so the
suffix path ID is zero.

Outgoing I/f   Path       Incoming PathID   Outgoing PathID
192.168.1.1    1-2-6-9-3  2084819824        664104731
192.168.3.1    1-3        599270449         0
192.168.4.1    1-4-3      4183108560        0
192.168.5.1    1-5-4-3    1365378675        0

Table 1: Partial routing table at 192.168.1.1 for destination 192.168.3.3

6.1.2 k-Shortest Paths with Partial Upgrades

In this section we illustrate, using the Linux implementation, the case
when the upgraded routers compute up to k shortest paths, with different
upgraded routers using different values of k.

Consider the 10-node topology shown in Figure 7. This topology was set up
in the Emulab network. We assume that the routers 192.168.1.1 and
192.168.1.2 are upgraded with k equal to 3 and 2 respectively. The results
are presented to verify the correctness of the “validation phase”
(Algorithm 2). Tables 2 and 3 show part of the routing tables at
192.168.1.1 for destinations 192.168.6.6 and 192.168.8.8 respectively.
Tables 4 and 5 show the corresponding entries at router 192.168.2.2. For
destination 192.168.6.6 the router 192.168.1.1 finds 3 paths, all of which
are valid as two paths have next-hop 192.168.2.2 and router 192.168.2.2
keeps 2 shortest paths. For destination 192.168.8.8, the router 192.168.1.1
computes 3 paths: 1-2-8, 1-2-6-7-8 and 1-2-7-8. The path 1-2-7-8 is
invalidated in the “validation phase” as router 192.168.2.2 only keeps 2
paths (2-8, 2-6-7-8). Note that the Path string is shown in Tables 2-5 for
the purpose of explanation.

Path      Incoming PathID   Next-hop      Outgoing PathID
1-2-6     1989316858        192.168.1.2   3491782861
1-2-7-6   656924081         192.168.1.2   3645081405
1-3-9-6   534784006         192.168.3.3   0

Table 2: Part of routing table at 192.168.1.1 for destination 192.168.6.6

Path        Incoming PathID   Next-hop      Outgoing PathID
1-2-8       3654096761        192.168.1.2   1973392862
1-2-7-6-8   1777786090        192.168.1.2   2123671348

Table 3: Part of routing table at 192.168.1.1 for destination 192.168.8.8

Path    Incoming PathID   Next-hop      Outgoing PathID
2-6     1973392862        0.0.0.0       1973392862
2-7-6   2123671348        192.168.7.7   2123671348

Table 4: Part of routing table at 192.168.2.2 for destination 192.168.6.6

Path      Incoming PathID   Next-hop      Outgoing PathID
2-8       3491782861        0.0.0.0       0
2-6-7-8   3645081405        192.168.6.6   0

Table 5: Part of routing table at 192.168.2.2 for destination 192.168.8.8

6.2 Evaluation of Index-based Path Encoding Scheme

The alternative index-based PathID encoding scheme was implemented in the
Linux kernel (MIT's Click Router platform) and simulated in SSFNet. We
present our simulation results in this section on a sizeable topology that
corresponds to the old MCI topology of 1995 [38].

In this configuration, only nodes 4, 6, 7, 9, 10 are upgraded. The source
node in this simulation is node 6. Observe that node 6 is the only node
that computes the k-shortest-paths (k = 5) for all destinations and runs
the validation algorithm (Algorithm 2). All other upgraded nodes merely
keep an index table (as described in Section 3.1). Table 6 shows a part of
the forwarding table at node 6 (only those paths for destination node 7),
and the i-PathIDs using index-based encodings. Node 6 may choose any one of
these paths for a packet to node 7. We have verified that the progression
of i-PathIDs through the network follows the description given in
Section 3.2.

Figure 8: Old MCI Topology: Used for Testing the Index-Based Scheme (Only
Nodes 4, 6, 7, 9, 10 are upgraded)

6.3 Integrated OSPF/BGP SSFNet Simulation

In this section we use SSFNet simulation results to illustrate the
integrated operation of the proposed framework in the Internet. This
example demonstrates both the intra-domain (OSPF) and inter-domain (BGP-4)
operation of the framework with explicit AS-PATH as well as explicit exit
forwarding.

Figure 9 shows the topology used for the results presented in this section.
The topology has eight (8) autonomous systems (AS'es). Four of these AS'es,
namely AS1, AS2, AS5 and AS6, have been upgraded to support explicit
AS-PATH forwarding. Even within these upgraded autonomous systems, only a
subset of routers are upgraded to support the explicit AS-PATH and explicit
exit routing as described in Sections 5.3 and 5.2. The upgraded routers
have been marked with a “U” in Figure 9. A blow-up of the internal topology
of AS2 is shown in Figure 10; the upgraded routers are again indicated
with “U”.

Consider forwarding of a packet from AS1 to AS8 (see Figure 9). Given the
constraint that only a partial set of AS'es are upgraded, the following
AS-PATHs may be used from AS1 to reach AS8: AS2-AS4-AS8,
AS2-AS5-AS6-AS7-AS8 and AS2-AS5-AS6-AS4-AS8. These AS-PATHs and their
corresponding e-PathIDs are indicated in Table 7, which is a part of the
routing table at the AS border router in
AS1. Note that the AS-PATH AS2-AS4-AS6-AS7-AS8 is not available because AS4
is not upgraded, and uses a default AS-PATH of AS4-AS8. Also in this
simulation, we assumed that the upgraded routers do not do any further
filtering, i.e., they re-advertise all their available AS-PATHs to their
neighboring AS'es.

Figure 9: Topology used for integrated SSFNet simulation

Path                Next-Hop   i-PathID
6-2-4-3-7           2          0b01110
6-10-9-17-16-11-7   10         0b00110001
6-10-14-11-7        10         0b00101
6-10-9-4-3-7        10         0b01110110001

Table 6: Paths at node 6 for destination node 7 (Note: 0b indicates binary
encoding)

Forwarding Table of AS1 at Router 1
Dest      NextHop   In e-PathID   AS-PATH     Out e-PathID   Exit ASBR
0.57/28   2.93/32   2025862315    2-4-8       3535826417     0.91/32
0.57/28   2.93/32   4160716901    2-5-6-7-8   1248156781     0.91/32
0.57/28   2.93/32   669121903     2-5-6-4-8   2630971039     0.91/32

Table 7: Integrated OSPF/BGP Simulation: Forwarding Table of the Border
Router in AS1 (Note: 0.57/28 refers to IP address 0.0.0.57/28 etc)

In our example simulation, the border router of AS1 chooses the AS-PATH
AS2-AS4-AS8, which corresponds to the e-PathID of 3535826417 (see the first
row of Table 7). When the packet arrives at router 5 of AS2 (the entry
ASBR), its header looks like Figure 11(A). This entry ASBR (i.e. router 5)
of AS2 examines the incoming e-PathID to find the exit ASBR to be node 2
with IP address 0.0.2.107 (see first row of Table 8). Note that it does not
swap the e-PathID field, because this will be done at the exit ASBR. To
emphasize this point, observe that the outgoing e-PathID column in Table 8
is the same as the incoming e-PathID for the destination prefix
0.0.0.57/28.

Forwarding Table of AS2 at Router 5
Dest      NextHop    In e-PathID   AS-PATH     Out e-PathID   Exit ASBR
0.57/28   2.97/32    3535826417    2-4-8       3535826417     2.107/32
0.57/28   2.113/32   3535826417    2-4-8       3535826417     2.107/32
0.57/28   2.97/32    1248156781    2-5-6-7-8   1248156781     2.24/32
0.57/28   2.113/32   1248156781    2-5-6-7-8   1248156781     2.24/32

Table 8: Integrated OSPF/BGP Simulation: Forwarding Table of Router 5 in
AS2 (See Figure 10)

Destination    Path                   i-PathID
0.0.2.107/32   5-4-3-2                17
0.0.2.107/32   5-1-4-3-2              18
0.0.2.107/32   5-4-11-7-2             1669
0.0.2.107/32   5-4-8-7-2              201
0.0.2.24/32    5-4-11-10-15-14        69
0.0.2.24/32    5-4-8-7-6-14           169
0.0.2.24/32    5-4-8-16-15-14         105
0.0.2.24/32    5-1-4-8-16-15-14       106
0.0.2.24/32    5-4-11-9-10-15-14      101
0.0.2.24/32    5-1-4-11-9-10-15-14    102

Table 9: Forwarding table at Router 5 in AS2 (Figure 10): k Shortest Paths
(k = 7)

The entry ASBR (router 5) now “pushes” the destination IP address (i.e.
0.0.0.57) into the address stack field and replaces it with the exit ASBR
IP address. The entry ASBR also chooses a path within the AS to the exit
ASBR. Table 9 shows the intra-domain paths available to reach exit ASBR
(router 2). In this simulation, we have integrated the index-based PathID
encoding scheme as well as the k-shortest path route computation scheme
(k = 7) with the OSPF protocol running in AS2. In particular, the path
5-4-11-7-2 within the AS is chosen, which corresponds to an i-PathID of
1669 (see the third row of Table 9). The header fields of the packet at
this stage are shown in Figure 11(B).
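How an index-based i-PathID is built at the source and consumed hop by hop can be sketched as follows. This is a hedged illustration, not the simulated routers' code: the function names, the per-router bit widths and the interface index values are hypothetical, since the paper does not list the routers' interface tables. The example values were picked so that the encoded result equals the 0b01110110001 entry listed for path 6-10-9-4-3-7 in Table 6. Reserving index 0 is also an assumption, made here so that an all-zero PathID can keep meaning "default shortest path".

```python
def encode_path_id(hops):
    """Concatenate interface indices from the low-order bits upwards.

    hops: list of (interface_index, bit_width), one entry per router that
    consumes an index, in path order.
    """
    path_id, shift = 0, 0
    for index, width in hops:
        if not 0 < index < (1 << width):
            raise ValueError("index must be non-zero and fit in its width")
        path_id |= index << shift
        shift += width
    return path_id


def forward_step(path_id, width):
    """Per-router processing: read the own interface index from the
    lowest-order bits, then shift the PathID right by the own width."""
    index = path_id & ((1 << width) - 1)
    return index, path_id >> width


# Hypothetical (index, width) pairs for four routers along one path.
hops = [(1, 2), (4, 3), (5, 3), (3, 3)]
pid = encode_path_id(hops)

# Each router only needs its own width, not its position on the path.
remaining, seen = pid, []
for _, width in hops:
    index, remaining = forward_step(remaining, width)
    seen.append(index)
```

After the last index has been consumed the PathID is zero, so any remaining routers forward the packet on their default shortest paths.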
The packet proceeds on the explicit intra-domain path (as described in
earlier sections) to reach the exit router 2 with an i-PathID value of 0.
At this router, the destination address (0.0.0.57) is “popped” back from
the address stack. The e-PathID is also replaced with the outgoing e-PathID
of 1895667324 (see Figure 11(C)). Now the packet is sent to AS4, which is
not upgraded, but sends the packet on its default policy AS-PATH, i.e.,
directly to AS8. In summary, we have shown how a distributed set of
upgraded and non-upgraded nodes, with explicit paths independently selected
within upgraded AS'es can honor an explicit AS-PATH request of the source
AS.
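The header changes in this walk-through (address push at the entry ASBR, address pop and e-PathID swap at the exit ASBR) can be sketched as below. It is a simplified model under stated assumptions: the Header class, the function names and the dictionary-based tables are hypothetical, prefix matching and checksum adjustment are omitted, and the i-PathID is simply cleared at the exit instead of being consumed hop by hop. The numeric values are the ones quoted in the text and in Table 8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Header:
    dst: str
    e_path_id: int = 0
    i_path_id: int = 0
    addr_stack: Optional[str] = None  # single-level address stack


def entry_asbr(hdr, exit_for, i_path_id):
    """Entry ASBR: pick the exit ASBR from the incoming e-PathID, push the
    destination on the address stack, leave the e-PathID unchanged."""
    chosen_exit = exit_for[hdr.e_path_id]
    hdr.addr_stack, hdr.dst = hdr.dst, chosen_exit
    hdr.i_path_id = i_path_id  # explicit intra-domain path to the exit
    return hdr


def exit_asbr(hdr, out_e_path_id):
    """Exit ASBR: pop the original destination and swap the e-PathID for
    the hash of the remaining AS-PATH suffix (0 if none is known)."""
    hdr.dst, hdr.addr_stack = hdr.addr_stack, None
    hdr.e_path_id = out_e_path_id.get(hdr.e_path_id, 0)
    hdr.i_path_id = 0  # modeled as fully consumed inside the AS
    return hdr


hdr = Header(dst="0.0.0.57", e_path_id=3535826417)
entry_asbr(hdr, exit_for={3535826417: "0.0.2.107"}, i_path_id=1669)
after_entry = (hdr.dst, hdr.e_path_id, hdr.i_path_id, hdr.addr_stack)
exit_asbr(hdr, out_e_path_id={3535826417: 1895667324})
after_exit = (hdr.dst, hdr.e_path_id, hdr.i_path_id, hdr.addr_stack)
```

Keeping the e-PathID untouched between the two border routers mirrors the convention that an AS behaves like a single node whose PathID swap happens at its output interface.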
Figure 10: Blow-up of AS2's Internal Topology in the Integrated OSPF/BGP
Simulation (Figure 9)

7. RELATED WORK

Most related work on multipath routing has been done in the context of
intra-domain protocols. OSPF, the most common intra-domain routing protocol
used in the Internet today, is based on single shortest path with equal
splitting
between next-hops of equal cost paths. Lorenz et al [39] show that OSPF
routing performance could be improved by O(N) if traffic-matrix aware
explicit source-based multipath routing is used (e.g. MPLS-based [40, 41]).

Protocol extensions to support multipath routing (both in RIP and OSPF)
have been studied by Narvaez et al [7], Chen et al [6] and Vutukury et al
[8]. In [7], the authors propose to find loop-free multipaths only by
concatenating the shortest paths of their neighbors with their link to the
neighbors. This approach essentially uses a depth first search with a depth
of 1, whereas we allow arbitrary depth in our DFS-PU algorithm. Chen et al
and Vutukury et al [6, 8] propose more general multipath computations, but
their schemes require the co-operation and upgrade of all the routers in
the network. Chen et al present a general concept of suffix-matched path
identifier to allow multipath computation using distributed computation,
but they use local labels to realize the path like in ATM networks [20] or
MPLS [21]. Therefore, they require a signaling protocol to map a global
path specification to locally assigned labels at each node.

The proposed BANANAS framework allows source-based multipath routing using
a “PathID”. The use of a globally significant path hash allows multipath
capabilities without signaling (i.e. in a connectionless manner) even in a
partially upgraded network. The signaling requirement for source-routing is
seen in protocols like ATM networks, MPLS networks [21] and NIMROD [12]
routing (a link-state approach to inter-domain routing). IPv4 [18] and IPv6
[19, 13] provide a variable-length loose-source-routing option that may be
considered “data-plane” signaling. But IPv4/v6 uses an uncompressed string
of IP addresses in contrast to our efficient PathID encoding schemes.

Even though MPLS has gained popularity in some large ISPs, many ISPs may
prefer using OSPF/IS-IS to enable multipath and traffic engineering
capabilities. This is due to the widespread deployment and operational
experience available with OSPF/IS-IS. Our approach extends OSPF/IS-IS to
allow such capabilities even in partially upgraded networks. Our
index-based scheme offers significant reduction of state complexity in
comparison to MPLS label tables. Our computations can also be further
optimized using incremental k-shortest path algorithms similar to those
suggested for OSPF's Dijkstra algorithm [42, 43].

In LIRA [11], Stoica et al briefly propose a forwarding scheme which they
suggest could replace MPLS. A path is encoded as the XOR of router IDs
along the path, and is processed along the path using a series of XOR
operations. The work in LIRA is a special case of the BANANAS framework. In
particular, the authors do not consider the larger architectural issues of
partial upgrades, route-computation, state-computation tradeoffs,
inter-domain operation etc. The focus in their paper was also different: a
framework for service differentiation.

Stage   Dest IP Add   EPathID      IPathID   Add on Stack   Location
A       0.0.0.57      1248156781   -         -              At the entry router for AS2, i.e. router #5
B       0.0.2.107     1248156781   1669      0.0.0.57       At router 5@AS2 after address is pushed on stack
C       0.0.0.57      1895667324   0         -              At router 2@AS2, which is the exit router, after address is popped from the stack

Figure 11: Diagram Showing How e-PathID, i-PathID and Destination Address
Change in the Integrated OSPF/BGP Simulation

8. SUMMARY AND CONCLUDING REMARKS

The key contributions in this paper can be summarized as follows.
a. Identification of abstract multipath architectural concepts (global
PathID semantics, efficient path hashing) that are crucial to avoiding the
need for signaling and allowing incremental network upgrades in
connectionless routing protocols.
b. Canonical multipath and explicit path realizations in the context of
legacy routing protocols: OSPF, BGP-4.
c. Demonstration of significant architectural flexibility: alternative
PathID encodings, alternative route-computation algorithms (DFS-PU,
k_i-shortest paths), movement of complexity to edges, division of functions
between data-plane and control-plane, development of distributed validation
algorithms etc.
d. Linux implementation results and integrated OSPF/BGP simulation results
to validate various options.

These building blocks can be used in two broad ways. First, in the context
of traffic engineering within a partially upgraded legacy network. An
operator may want to emulate signaled capabilities in a connectionless
network (e.g. see [41, 39]) or might desire fine-grained traffic management
control that is hard to extract from parameter tweaking (e.g. see
[30, 29, 31, 32]). The building blocks may be mixed and matched in a
limited number of ways. For example, one could select an MD5+CRC32 encoding
for BGP-4 (i.e. e-PathIDs) and an index-based encoding for OSPF (i-PathID).
Obviously, a common encoding must be chosen across ISPs for the explicit
AS-PATH case.

Second, and perhaps more important, the BANANAS framework building blocks
could form the long-term basis for a best-effort end-to-end path
multiplicity model. Through the independent partial upgrades of nodes in
different autonomous systems, end-systems can have a growing expectation of
multiple end-to-end paths. We strongly believe that such a mere expectation
of end-to-end path multiplicity will trigger substantial application
innovation. To test this hypothesis, we plan to deploy the BANANAS
framework on the PlanetLab infrastructure [22] as a public experimental
wide-area network overlay service by Fall 2003.

9. REFERENCES

[1] G. Huston, “Commentary on inter-domain routing in the internet,”
RFC 3221, December 2001.
[2] N. Spring, R. Mahajan, D. Wetherall, “Measuring ISP Topologies with
Rocketfuel,” SIGCOMM 2002, Pittsburgh PA, August 2002.
[3] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, W. Willinger,
“Network Topology Generators – Structural vs. Degree-Based,” Proceedings
of the ACM SIGCOMM, August 2002.
[4] Keshav, S., An Engineering Approach to Computer Unequal Error Protected MPEG Video Streams for
Networking, Addison-Wesley, 1997. Multiple Channel Transmission,” in IEEE
[5] D. Eppstein, “Finding the k shortest Paths,” International Conf. on Image Processing, Rochester,
Proceedings of 35th IEEE Symposium on NY, Sept 2002.
Foundations on Computer Science (FOCS), pp. [24] T. H. Cormen et. al. “Introduction to Algorithms,”
154-165, 1994. The MIT Press, McGraw Hill Book Company,
[6] J. Chen, P.Druschel, D.Subramanian, “An Efficient Second Edition, 2001.
Multipath Forwarding Method,” in INFOCOM’98, [25] Réka Albert, Hawoong Jeong, Albert-László Barabási,
March, 1998. “Diameter of the World-Wide Web,” in NATURE,
[7] P. Narvaez, K. Y. Siu, “Efficient Algorithms for VOL 401,9, SEPTEMBER 1999.
Multi-Path Link State Routing,” ISCOM’99, [26] J. Moy, “OSPF Version 2,” IETF RFC 2328, April
Kaohsiung, Taiwan, 1999. 1998.
[8] S. Vutukury and J.J. Garcia-Luna-Aceves, “ A [27] J. W. Stewart, “BGP-4 Inter-Domain Routing in
Simple Approximation to Minimum-Delay the Internet,” Addison Wesley, 1999.
Routing,” SIGCOMM ’99, September, 1999. [28] W. Norton, “Internet Service Providers and
[9] D. O. Awduche, L. Berger, D. Gan, T. Li, G. Peering,” White Paper, 2002.
Swallow, V. Srinivasan, “RSVP-TE: Extensions to [29] T. Griffin, G. Wilfong, ”Analysis of the MED
RSVP for LSP Tunnels,” IETF RFC 3209, oscillation problem in BGP,” Proceedings of ICNP
December 2001. 2002, Paris, France, November 2002.
[10] L. Andersson, P. Doolan, N. Feldman, A. Fredette, [30] N. Feamster, J. Borkenhagen, and J. Rexford
B. Thomas, “Label Distribution Protocol “Controlling the impact of BGP policy changes on
Specification,” IETF RFC 3036, January 2001. IP traffic,” AT&T Research Technical Report
[11] I. Stoica, H. Zhang, “LIRA: An Approach for 011106-02, November 2001.
Service Differentiation in the Internet,” in [31] S. Hares et al, “Smart Routing Technologies,”
Proceedings of NOSSDAV’98, Cambridge, England, NANOG Panel, Toronto, June 2002.
July 1998, pp. 115-128. http://www.nanog.org/mtg-0206/smart.html
[12] I. Castineyra, N. Chiappa, M. Steenstrup, “The [32] R. Mahajan, D. Wetherall, T. Anderson,
Nimrod Routing Architecture,” IETF RFC 1992, “Understanding BGP Misconfiguration,” In
August 1996. Proceedings of ACM SIGCOMM, 2002.
[13] M. O’Dell, “GSE – an alternate addressing [33] T. Griffin, G. Wilfong, “On the Correctness of
architecture for IPv6,” Expired Internet Draft, 1997. IBGP Configuration,” Proceedings of ACM
[14] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, SIGCOMM 2002, Pittsburg PA, 2002.
R. Morris, “Resilient Overlay Networks,” in [34] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M.
Proceedings of 18th ACM Symposium on Operating F. Kaashoek, “The Click modular router,” ACM
Systems Principles, Banff, Canada, October 2001. Transactions on Computer Systems, Vol. 18, No. 3,
[15] S. Savage et al, “Detour: A Case for Informed August 2000, pages 263-297.
Internet Routing and Transport,” IEEE Micro, [35] GNU Zebra Open-Source Routing Software,
volume 19, no. 1, January 1999. http://www.zebra.org/
[16] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. [36] J. Lepreau, “The Utah Emulab Network Testbed,”
Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. http://www.emulab.net/
Zhang, V. Paxson, “Stream Control Transmission [37] Scalable Simulation Framework (SSF) Network
Protocol,” IETF RFC 2960, October 2000. Models, available from http://www.ssfnet.org.
[17] H-Y. Hsieh, R. Sivakumar, “A Transport Layer [38] Q. Ma and P. Steenkiste, “On Path Selection for
Approach for Achieving Aggregate Bandwidths on Traffic with Bandwidth Guarantees,” Proceedings of
Multi-homed Mobile Hosts,” Proceedings of ACM IEEE International Conference on Network
Mobicom 2002, Atlanta, GA, September 2002. Protocols (ICNP), Atlanta, GA, October 1997.
[18] DARPA INTERNET PROGRAM, “Internet [39] D. H. Lorenz, A. Orda, D. Raz, Y. Shavitt, “How
Protocol,” IETF RFC 791, September 1981. good can IP routing be?”, DIMACS Technical
[19] S. Deering, R. Hinden, “Internet Protocol, Version 6 Report 2001-17, May 2001.
(IPv6) Specification,” IETF RFC 1883, 1995. [40] D. Awduche, “MPLS and traffic engineering in IP
[20] U. Black, “ATM, Volume I: Foundation for networks,” IEEE Communications Magazine, Vol.
Broadband Networks,”, Prentice Hall, 2nd Edition, 37, No. 12, pp. 42-47, 1999.
1999. [41] A. Elwalid, C. Jin and I. Widjaja, “MATE: MPLS
[21] E. Rosen et al, “Multiprotocol Label Switching Adaptive Traffic Engineering,” In Proceedings of
Architecture,” IETF RFC 3031, January 2001. INFOCOM’01, April 2001.
[22] L. Peterson, T. Anderson, D. Culler, T. Roscoe, “A [42] G. Ramalingam, and T. Reps, “An incremental
Blueprint for Introducing Disruptive Technology algorithm for a generalization of the shortest-path
into the Internet,” in Proceedings of the First ACM problem,” Journal of Algorithms, Vol.21, 1996.
Workshop on Hot Topics in Networks (HotNets-I), [43] P. Narvaez, K.Y. Siu and H.Y. Tzeng, “New Dynamic Algo-
Princeton, NJ, October 2002. rithms for Shortest Path Tree Computation,” IEEE Trans-
[23] W. Xu, S. S. Hemami, “Efficient Partitioning of actions on Networking, Vol. 8, No. 6, Dec. 2000.