BANANAS: An Evolutionary Framework for Explicit and
             Multipath Routing in the Internet∗
                   H. Tahilramani Kaur, S. Kalyanaraman, A. Weiss, S. Kanwar, A. Gandhi
                     ECSE Department, Rensselaer Polytechnic Institute, Troy, NY-12180.
                 {hema,shivkuma, kanwas}@networks.ecse.rpi.edu, [email protected]
ABSTRACT                                                                                   faces and autonomous systems (both enterprises and ISPs
Today the Internet offers a single path between end-systems                                of various sizes) are multi-homed [1, 2, 3]. It is interesting
even though it intrinsically has a large multiplicity of paths.                            to ponder on two questions:
This paper proposes an evolutionary architectural frame-                                   a) Why is path multiplicity a valuable architectural feature?
work “BANANAS” aimed at simplifying the introduction                                       b) Why have we not significantly exploited the intrinsic path
of multipath routing in the Internet. The framework starts                                 multiplicity in the Internet?
with the observation that a path can be encoded as a short                                    The answer to the first question is that multi-path trans-
hash (“PathID”) of a sequence of globally known identi-                                    mission can be fundamentally more efficient than the cur-
fiers. The PathID therefore has global significance (unlike                                rent single-path paradigm. Just like packet switching is fun-
MPLS or ATM labels). This property allows multipath ca-                                    damentally more efficient than circuit switching because it
pable nodes to autonomously compute PathIDs in a par-                                      offers the potential to leverage both spatial and temporal
tially upgraded network without requiring an explicit sig-                                 multiplexing gains at a single link (see [4], chapter 1,2), a
naling protocol for path setup. We show that this frame-                                   network offers one more dimension where spatio-temporal
work allows the introduction of sophisticated explicit rout-                               multiplexing gains may be obtained: different paths. Packet
ing and multipath capabilities within the context of widely                                switching does not waste unused capacity if user demand is
deployed connectionless routing protocols (e.g. OSPF, IS-IS,                               available at a single link; similarly, with path multiplicity
BGP) or overlay networks. We establish these characteris-                                  available to end-to-end flows, unused capacity in paths will
tics through the development of PathID encoding and route-                                 not be wasted if user demand is available. Using our pro-
computation schemes. The BANANAS framework also al-                                        posed BANANAS framework, such multiple paths may be
lows considerable flexibility in terms of architectural func-                              leveraged at different levels in the networking stack: legacy
tion placement and complexity management. To illustrate                                    OSPF or BGP networks, overlay networks, peer-to-peer net-
this feature, we develop an efficient variable-length hashing                              works (e.g. dynamically instantiated overlays using a peer-
scheme that moves control-plane complexity and state over-                                 to-peer lookup infrastructure to support video-conferencing)
heads to network edges, allowing a very simple interior node                               and last-mile multi-hop fixed-wireless networks.
design. All the schemes have been evaluated using both sizable                                The answer to the second question is clearly not the lack of
SSFNet simulations and a Linux/Zebra implementation                                        algorithms and protocols. There have been several proposals
run on Utah’s Emulab testbed facility.                                                     for multipath route-computation [5, 6, 7, 8], Internet signal-
                                                                                           ing architectures [9, 10, 11, 12, 13], novel overlay routing
                                                                                           methods [14, 15] and transport-level approaches for multi-
1.     INTRODUCTION                                                                        homed hosts [16, 17]. The fact that these developments have
   Today’s Internet routing protocols like OSPF and BGP                                    not triggered widespread deployment suggests that the core
were designed to provide one primary end-to-end service:                                   problem is an architectural one 1 . The Internet lacks an evo-
“best effort reachability.” These protocols realize the “best-                             lutionary framework that admits incremental deployment
effort” concept by offering a single-path to destination sub-                              of path multiplicity, while providing sufficient flexibility in
nets. However, the internet topology has an intrinsic multi-                               terms of architectural function-placement and management
plicity of paths: hosts have multiple potential network inter-                             of complexity. This paper proposes to fill that void with a
∗The project was supported in part by DARPA Contract                                       framework called “BANANAS” 2 .
F30602-00-2-0537 and grants from Intel Corp. and AT&T                                         At the highest level, BANANAS proposes a simple ex-
Corp.                                                                                      tension of Internet operation to admit and leverage end-
                                                                                           to-end path-multiplicity (PM). In this model, source-hosts
                                                                                           initiate one or more end-to-end “flows” and map flows to
Permission to make digital or hard copies of all or part of this work for                  local network interfaces. The “network” provides one or
personal or classroom use is granted without fee provided that copies are                  more end-to-end paths through the independent upgrades
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to            1
republish, to post on servers or to redistribute to lists, requires prior specific           Another key problem involves incentives; but incentives de-
permission and/or a fee.                                                                   pend upon attributes of the underlying architectural frame-
ACM SIGCOMM 2003 Workshops August 25&27, 2003, Karlsruhe, Germany                          work.
                                                                                           2
                                                                                             BANANAS is not an acronym! It is adapted from the car
Copyright 2003 ACM 1-58113-748-6/03/0008 ...$5.00.                                         racing comedy movie title Herbie Goes Bananas.
Proceedings of the ACM SIGCOMM 2003 Workshops                                        277                                                     August 2003
of a subset of network nodes, possibly situated in multiple               also building a medium-sized multi-hop 802.11 community
administrative domains. A subset of these upgraded nodes                  wireless network on which this framework will be deployed.
(e.g. selected edge-nodes) may also map “flows” to avail-                 We believe that the mere expectation of multiple end-to-
able “paths” 3 . Source-hosts may arbitrarily map “pack-                  end paths will trigger application innovation in new areas
ets” to “flows.” Observe that today’s single-path model                   such as end-to-end bandwidth aggregation [17], end-to-end
is a special case of this PM-model. The PM model also                     resilience and video transmission over multi-paths [14, 15,
allows a subset of source-hosts and routers to be indepen-                23] and end-to-end multi-path based security strategies (e.g.
dently upgraded within the scope of usual administrative                  protecting data integrity using multipaths).
boundaries. An upgraded node may “see” only a subset of                      introduces the abstract framework and concepts. Section 3
available paths within appropriate administrative bound-                  introduces the abstract framework and concepts. Section 3
aries. This high-level model is a best-effort path multiplic-             explores the architectural flexibility in BANANAS by con-
ity model, clearly different from IPv4/IPv6 connectionless                sidering an alternate index-based PathID encoding. Sec-
loose-source-routing model [18, 19] and from end-to-end sig-              tion 4 summarizes the intra-domain routing extensions for
naled source-route models used in ATM networks (e.g. PNNI                 link-state protocols, OSPF and IS-IS. Section 5 develops the
[20]) or MPLS networks [21].                                              inter-domain ideas of BANANAS in the context of BGP-4.
   BANANAS provides a set of concepts and building blocks                 Section 6 presents both simulation and Linux-based imple-
to realize this high-level PM model. A core abstract idea in              mentation results to illustrate the architectural features of
BANANAS is that a path can be efficiently encoded as a                    BANANAS. Related work is surveyed in Section 7, followed
short hash (called the “PathID”) of a sequence of globally-               by summary and concluding remarks in Section 8.
known identifiers (e.g. router IDs, link interface IDs, link
weights, AS numbers etc.). This concept has some very im-                 2.    THE BANANAS FRAMEWORK
portant advantages. First, a hash-based data-plane encod-
ing is more efficient than IPv4/IPv6’s loose-source-routing               2.1     PathID: Abstract Concept
encoding [18, 19] that is an uncompressed string of IP ad-
dresses. Second, since the PathID is a function of globally-                 Consider a network modelled as a graph G = (V, E) where
known quantities, it inherits their global significance, i.e., it         V is the set of vertices or nodes and E is the set of edges or
can be computed and interpreted within the same scope of                  links in the network. Let N denote the number of nodes
visibility. This “global” scope may refer to a single rout-               in the network, i.e. the cardinality of the set V . Each
ing domain if router/link IDs are involved; or may refer to               link (i, j) ∈ E has an identifier associated with it, denoted
the universe of BGP-4 routers if AS numbers are used. The                 by $l_{i,j}$. Each node i also has an identifier denoted by $n_i$.
global PathID semantics allows any upgraded multipath ca-                 Consider a path $P_{i,j}$ from node i to node j, which passes
pable (MPC) node to autonomously compute the PathID                       through nodes i, 1, 2, ..., m − 1, j. This path can be repre-
without any changes in legacy single-path capable nodes. It               sented as a sequence of globally-known node and link iden-
also removes the need for an explicit out-of-band signaling               tifiers $[n_i, l_{i,1}, n_1, l_{1,2}, n_2, \ldots, l_{m-1,j}, n_j]$. This path sequence
protocol as a path-setup mechanism. Note that one purpose                 can be compactly represented by a hash of its elements. A
of signaling in ATM and MPLS is to map global IDs (global                 path identifier (or, in short “PathID”) is defined as a hash
addresses, path specifications) to locally assigned IDs (la-              of the above sequence or any non-null subsequence derived
bels). The global PathID semantics allow the mapping of                   from it. Observe that the IP destination address (j), the un-
BANANAS in an incremental manner to connectionless In-                    compressed IPv4/v6 loose-source-routes [18, 19], the XOR
ternet routing protocols (e.g. OSPF, BGP-4).                              of router IDs proposed in LIRA [11], or a hash of the sub-
   In addition, the BANANAS framework allows consider-                    sequence of link weights are all examples of valid PathIDs,
able flexibility in terms of architectural function placement             obviously with differing characteristics. Therefore the par-
and complexity management. These intangible aspects are                   ticular subsequence and PathID encoding function chosen
crucial for tailoring the proposed building blocks and estab-             is crucial in determining the utility of the PathID. These
lishing the appropriate incentives for adoption by vendors                abstract concepts are illustrated in Figure 1.
and ISPs. For example, the framework allows considerable
flexibility in the choice of multipath route-computation al-
gorithms. It also provides a distributed validation proce-
dure to ensure the validity of computed PathIDs, i.e. to
check if forwarding exists in all downstream routers for the
PathIDs. As another example of architectural flexibility, we
propose an efficient variable-length hash realization of the
abstract framework: this scheme moves control-plane com-
plexity and state overheads to network edges, allowing a very
simple interior node design. The proposed scheme realizations                       Figure 1: Path and PathID Concepts
are evaluated using integrated OSPF/BGP simulations
in sizable topologies and a Linux/Zebra implementation run
on Utah’s Emulab emulation testbed facility.
   We are currently deploying the BANANAS framework on
the worldwide PlanetLab infrastructure [22] as a public
experimental wide-area network overlay service. We are
                                                                             A desirable hash is compact, easy to compute and has
                                                                          a low collision probability (i.e. high uniqueness probability).
                                                                          A simple hash of the path sequence may be
                                                                          obtained by using the sum or XOR function (suggested in
3
  E.g., packets from a TCP connection would be mapped to a                LIRA [11]). While these are simple and fast, they may lead to
single “path” to avoid out-of-order packets.                              non-unique PathIDs. Our canonical hash function choice is
a 128-bit MD5 hash followed by a 32-bit CRC of the 128 bit
MD5 hash (resulting in a final 32-bit hash value). We use
the notation (MD5 + CRC32) hash to represent the above
two-step hashing process. Alternatively, 32-bits of the 128-
bit MD5 hash could also have been used. This hash value is
used in conjunction with the destination address (j); leading
to a two-tuple hash: [j, PathID]. For convenience, we refer
to the second tuple value as PathID. The collision probability,
i.e. the probability that multiple paths lead to the same PathID,
depends only on the number of paths to any given destination
prefix, and the nature of the path subsequence on which                   Figure 2: Multi-Path Forwarding with Partial Upgrades
the MD5+CRC32 function is applied. Assuming a random
bit-string as input and all the $2^{32}$ outputs to be equally
likely, the probability of collision is given by                          A is the originating node for a packet destined to node F.
$1 - \frac{n!}{n^k (n-k)!}$, where n is the number of possible            The shortest path from intermediate node B to node F is
outcomes ($2^{32}$) and k is the number of paths to a destination.        B-D-F and path A-B-C-F is not available for forwarding
   A sequence of well-known link interface IDs, router IDs                because node B is a non-upgraded node and the next-hop of
and link weights (in OSPF or IS-IS) on the path can be                    default shortest path of B is not C. However, paths such as
used to generate the underlying path sequence. However,                   A-B-D-C-F, A-D-E-F, A-D-C-E-F etc. are available. If the
link-weights are usually non-unique, chosen from a narrow                 path A-B-D-E-F is chosen, then the PathID of an incoming
range and may be dynamic (to implement traffic engineer-                  packet will be Hash(A-B-D-E-F). A sets the PathID field to
ing/ adaptive routing), whereas router IDs and link interface             Hash(D-E-F), i.e. the hash of the path suffix from the next
IDs are unique identifiers. Our canonical choice is the subse-            MPC router to destination. Node B forwards the packet on
quence of all node IDs on the path (generalizes to a sequence             its shortest-path (i.e. to D). Node D sets the PathID to
of AS numbers in BGP-4). Section 3 develops an alternative                zero, because there is no MPC router on the path to F.
hash function that is a concatenation of well-known link ID
indices at nodes.                                                         2.3    Path and PathID Computation
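As an illustration of the canonical (MD5 + CRC32) encoding and the collision estimate of Section 2.1, a minimal Python sketch follows. The function names `path_id` and `collision_probability`, and the choice of joining node IDs with "-" before hashing, are assumptions made for this example; the text does not fix a serialization of the node-ID sequence.

```python
import hashlib
import zlib


def path_id(node_ids):
    """Two-step hash: 128-bit MD5 of the path, then 32-bit CRC of the digest."""
    # Assumption: node IDs are strings, serialized by joining them with '-'.
    encoded = "-".join(node_ids).encode("utf-8")
    digest = hashlib.md5(encoded).digest()      # 16-byte (128-bit) MD5 digest
    return zlib.crc32(digest) & 0xFFFFFFFF      # final 32-bit PathID


def collision_probability(k, n=2**32):
    """Evaluate 1 - n! / (n^k (n-k)!) as a running product to avoid huge factorials."""
    p_unique = 1.0
    for i in range(k):
        p_unique *= (n - i) / n
    return 1.0 - p_unique


if __name__ == "__main__":
    # PathID carried by a packet whose remaining explicit path is D-E-F.
    print(hex(path_id(["D", "E", "F"])))
    # Collision estimate for 1000 paths to one destination prefix.
    print(collision_probability(1000))
```

Under these assumptions the estimate for 1000 paths to a single destination is on the order of 10^-4, which is roughly k^2 / (2n).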
                                                                             The BANANAS framework not only supports upgrades
2.2    Packet Forwarding                                                  of a subset of nodes, but also allows heterogeneity in mul-
   This section describes the forwarding table structure and              tipath computation algorithms used at different upgraded
forwarding algorithm corresponding to our canonical choice                routers. The fundamental tradeoff in link-state protocols
of hash function and path subsequence made in Section 2.1.                (given our canonical choice of PathID hashing method) is
Section 3 develops an alternative forwarding algorithm (for               route-computation and space complexity incurred at up-
OSPF/IS-IS) that does not require a large forwarding table                graded routers to avoid signaling.
at interior nodes.                                                           In link-state protocols each router has a complete map of
   IP forwarding tables essentially contain two-tuple entries             the network in the form of link-state database. We propose
of the form [destination prefix, outgoing interface]. A                   to first annotate this “map” at an upgraded node with the
longest-prefix-match lookup procedure is employed. At up-                 knowledge of other upgraded nodes (we defer the discussion
graded routers we propose to use four-tuple entries of the                of how this is achieved in case of OSPF/IS-IS and BGP to
form [destination prefix, incoming PathID, outgoing                       sections 4 and 5). In Figure 2, upgraded node A will know
interface, outgoing PathID]. The “incoming PathID”                        that nodes C and D are upgraded and vice versa.
field represents the hash of the explicit path from the current              For now, consider a single flat, link-state routing domain.
router to the destination prefix. The “outgoing PathID”                   We do not consider extension of BANANAS to distance-
field is the hash of the corresponding path suffix from the               vector routing algorithms (e.g. RIP). Using the link-state
next upgraded router to the destination.                                  database (“map”) and knowledge of upgraded routers, ev-
   An upgraded router first matches the destination IP ad-                ery router can locally compute available network paths. The
dress using the longest prefix match, followed by an exact                simplest model that admits the largest number of paths is
match of the PathID for that destination. If matched, the                 where each upgraded router can forward to any neighbor.
incoming PathID in the packet is replaced by the outgoing                 The paths can be computed by performing a depth-first-
PathID, and the packet is sent to the outgoing interface.                 search (DFS) [24] that traverses every neighbor of upgraded
If an exact match is not found (i.e. errant hash value in                 nodes and the shortest-path neighbor at non-upgraded nodes.
packet), then the hash value in the packet is set to zero, and            The shortest path next-hops of non-upgraded nodes can be
the packet is sent on the default path (i.e. shortest path in             found by running Dijkstra’s algorithm multiple times or an all-pairs
OSPF/IS-IS or default policy route in BGP-4). The hash                    shortest-paths algorithm, e.g. Floyd-Warshall [24]. This results in a
value may also be set to zero if the next-hop is the desti-               table containing next-hops for all paths to a destination un-
nation itself, or there are no upgraded routers in the path               der the constraint of a known subset of MPC nodes. We refer
specified by the incoming PathID. A non-upgraded router                   to this strategy as DFS under partial upgrade constraints or
simply ignores the PathID field and forwards the packet on                DFS-PU for shorthand. This simple approach is expensive
the shortest path. The global PathIDs may be computed at                  in both computational and storage terms, especially as the
each router with minor modifications to OSPF LSAs (See                    number of MPC nodes grows.
Section 4).                                                                  The BANANAS framework allows an upgraded router to
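The lookup procedure of Section 2.2 at an upgraded router can be sketched as follows. This is only an illustration under stated assumptions: the class name `MultipathFib`, its methods, the interface names and the PathID values are hypothetical, and the default (shortest-path) entry of each prefix is assumed to be stored under incoming PathID 0.

```python
import ipaddress


class MultipathFib:
    """Four-tuple table: [prefix, incoming PathID, outgoing interface, outgoing PathID]."""

    def __init__(self):
        # prefix -> {incoming PathID: (outgoing interface, outgoing PathID)}
        self.entries = {}

    def add(self, prefix, in_pathid, out_iface, out_pathid):
        net = ipaddress.ip_network(prefix)
        self.entries.setdefault(net, {})[in_pathid] = (out_iface, out_pathid)

    def forward(self, dst, pathid):
        """Return (outgoing interface, PathID to write into the packet), or None."""
        addr = ipaddress.ip_address(dst)
        # Step 1: longest-prefix match on the destination address.
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        by_pathid = self.entries[best]
        # Step 2: exact match on the incoming PathID for that prefix.
        if pathid in by_pathid:
            return by_pathid[pathid]
        # Errant PathID: zero it and fall back to the default path.
        default = by_pathid.get(0)
        if default is None:
            return None
        return (default[0], 0)


if __name__ == "__main__":
    fib = MultipathFib()
    fib.add("10.1.0.0/16", 0, "eth0", 0)             # default (shortest) path
    fib.add("10.1.0.0/16", 0xABCD, "eth1", 0x1234)   # one explicit path
    print(fib.forward("10.1.2.3", 0xABCD))
    print(fib.forward("10.1.2.3", 0x9999))
```

A linear scan over prefixes is used here for brevity; a production forwarding table would use a trie or similar structure for the longest-prefix match.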
   Figure 2 shows a partially upgraded network. Nodes A,                  compute and store only a valid subset of available paths
C and D are multipath capable (MPC). Assume that node                     under partial constraints. The subset of available loop-
free paths can be computed using a multipath computa-                     and use the same value of k.
tion algorithm available in literature, for example k-shortest-              In summary, Algorithm 1 is a general 2-phase valida-
paths, all k-hop paths, k-disjoint paths (see [5] and refer-              tion procedure that can be applied to validate paths com-
ences within), DFS with constrained depth ([7] uses a depth-              puted using any deterministic path computation algorithm
constraint of 1-hop) etc. The only constraint is that the                 at MPC routers that also computes the default shortest
algorithm should also compute the shortest (default) path.                path.
These algorithms may be adapted for the MPC constraint,
i.e. there is a known subset of upgraded nodes.                           Algorithm 1 Algorithm for validating paths at a router in
   However, there is a second, more subtle problem: if dif-               a partially upgraded network
ferent routers compute and store different sets of paths, it               1: Let NU and U denote the sets of all non-upgraded and upgraded nodes respectively
is possible that the path computed by one upgraded node                    2: for all u ∈ U do
may not be supported by another upgraded or non-upgraded                   3:   newPaths ← Compute paths using u’s advertised algorithm
node that lies downstream on this path. We term such paths                 4:   Routing_Map.append(newPaths)
as “invalid”, i.e., forwarding support for the path does not               5: end for
exist at some downstream node.                                             6: for all n ∈ NU do
   To solve the above problem, we propose a distributed val-               7:   newPaths ← Compute shortest path using Dijkstra’s algorithm
idation algorithm that ensures validity of chosen paths. The               8:   Routing_Map.append(newPaths)
main idea behind the validation algorithm is that a path                   9: end for
is valid (i.e. forwarding for a path exists) if all its path              10: All 1-hop paths are valid
suffixes are valid. This suggests a mathematical induction                11: Initialize suffixLength ← 2
based approach. We know that all one-hop paths are always                 12: while suffixLength < maxHops do
valid because they represent a direct link. A two-hop path                13:   for all path ∈ Routing_Map do
is valid if its one-hop path suffix is valid.                             14:     if hop count of path ≥ suffixLength then
   The proposed algorithm (see Algorithm 1) has two phases.               15:       temp_pair.hopcount ← suffixLength-1;
In the first phase a node computes the paths using the cho-               16:       temp_pair.PathString ← last suffixLength nodes in path;
sen algorithm. For example, let us assume that node i uses a              17:       if Routing_Map.find(temp_pair) == FALSE then
ki-shortest-path algorithm. The ki paths computed to each                 18:         delete path
destination are input into a map data structure that is or-               19:       end if
dered by hop-count. In phase 2, the validation phase, the                 20:     end if
node needs to know the path computation algorithm and                     21:   end for
parameters used by other upgraded nodes. In our exam-                     22:   suffixLength++;
ple, node i needs to know the kj parameter associated with                23: end while
each upgraded node j. With this knowledge, it can com-
pute the kj paths for node j and input it into the hop-count
ordered map data-structure (lines 2-5 in Algorithm 1). At                 3. ARCHITECTURAL FLEXIBILITY IN
non-upgraded nodes, kj is 1 (lines 6-9 in Algorithm 1). Es-                  BANANAS
sentially we have computed all potentially available paths in
                                                                             A general concern with the canonical description so far
phase 1.
                                                                          is the increase in computational and space complexity at
   Phase 2 operates similarly to mathematical induction. All
                                                                          upgraded nodes (both edge and core nodes). An interest-
one-hop paths in the map are declared as valid. For each 2-
                                                                          ing question is whether we can use an alternative hashing
hop path, the algorithm simply searches for the 1-hop path
                                                                          method that leads to overall complexity reduction and a
suffix in the just-validated set. If a match is not found,
                                                                          more attractive division of functions between the edge and
the path is invalid and is discarded. If the path (i.e. the
                                                                          core, and between data-plane and control-plane. To demon-
corresponding PathID entry) exists in the forwarding table,
                                                                          strate the affirmative answer, we develop a new index-based
it is removed. In this process, validating an m-hop path
                                                                          encoding scheme that moves complexity to network edges,
entry implies looking up its (m-1)-hop path suffix in the just-
                                                                          and simplifies core node operations by using an efficient, re-
validated set of (m-1)-hop paths and finding a match (the
                                                                          versible hash. The tradeoff is to use a variable-length PathID
variable temp_pair and lines 16 and 17 in Algorithm 1 are
                                                                          encoding instead of the canonical 32-bit fixed length encod-
used to find a suffix match in the Routing_Map structure).
                                                                          ing. Moreover, the scheme is only applicable to link-state
By mathematical induction, when the entire map has been
                                                                          protocols, where the neighbor relationships do not change
linearly traversed, the remaining paths are valid.
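The induction in phase 2 can be sketched in a few lines of Python. This is an illustration rather than the implementation evaluated in this paper: the function name `validate_paths` and the representation of a path as a tuple of node IDs are assumptions, and a hash set of already-validated paths (after sorting by hop count) stands in for the hop-count-ordered map, so each suffix lookup is an expected constant-time operation instead of a logarithmic search.

```python
def validate_paths(all_paths):
    """Return the subset of paths whose forwarding exists at every downstream node.

    all_paths holds the output of phase 1: every path computed by every node
    (k_j paths per destination at upgraded nodes, the shortest path at
    non-upgraded nodes), each given as a tuple of node IDs.
    """
    valid = set()
    # Shorter paths first, so every (m-1)-hop suffix is decided before
    # any m-hop path that depends on it.
    for path in sorted(set(all_paths), key=len):
        hops = len(path) - 1
        if hops < 1:
            continue                     # not a path
        if hops == 1:
            valid.add(path)              # base case: one hop is a direct link
        elif path[1:] in valid:
            valid.add(path)              # inductive step: the suffix is valid
    return valid


if __name__ == "__main__":
    # Hypothetical example: A is upgraded, B is not and only forwards
    # to D on its shortest path B-D.
    phase1 = [
        ("A", "B", "D"), ("A", "B", "C", "D"),   # computed at A
        ("B", "D"), ("B", "C"),                  # computed at B
        ("C", "D"),                              # computed at C
    ]
    print(sorted(validate_paths(phase1)))
```

In this hypothetical example the path A-B-C-D is discarded because its suffix B-C-D is not among the paths supported at the non-upgraded node B, while A-B-D survives.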
   The computational complexity of this approach can be
estimated as follows. In an N-node network with u upgraded
routers, the complexity of the first phase is given by
$uC(k) + (N - u)C(1)$, where $C(k)$ denotes the complexity of computing
k-shortest paths and $C(1)$ denotes the complexity of Dijkstra’s
algorithm. The total number of paths, $T$, computed at the
end of the first phase is equal to $(N-1)\left((N-u) + \sum_{i=1}^{u} k_i\right)$. The
complexity of the validation phase is $O(T \log(T)\,\bar{h})$, where
$\bar{h}$ is the average hop count for the paths. The $\log(T)$ term
arises due to searching for a suffix in the Map (see Algorithm
1, line 17). The validation algorithm may be optimized or
be eliminated for special cases, e.g. if all nodes are upgraded
                                                                          often. Specifically, the index-based scheme is not applicable
                                                                          to path-vector based protocols like BGP-4, or mobile ad-hoc
                                                                          networks where neighbor relationships change rapidly.
                                                                          3.1   Index-based Scheme: PathID Encoding
                                                                             To motivate the scheme, consider an example. An up-
                                                                          graded node orders its link interface IDs (or alternatively
                                                                          neighbor node IDs) and represents each link by its index in
                                                                          this ordering (see Figure 3). This link ID, i.e. index, can
                                                                          now be efficiently encoded. For example, a router with 15
                                                                          interfaces will need 4-bit link indices. In general, the link
                                                                          or interface IDs of a node may be locally hashed using a
                                                                          globally-known hash function. Since every node knows the
Proceedings of the ACM SIGCOMM 2003 Workshops                       280                                                    August 2003
global hash function and it operates on globally-known link IDs (e.g. IP
addresses of interfaces), each node can independently compute the hashes of
any other node.

Figure 3: Explanation of Index-Based Encoding Scheme

   A path can now be specified as a concatenation of such link-indices (e.g.
Figure 3 shows the PathID, in binary, of a path via nodes 9-10-6). This
PathID encoding is guaranteed to be unique (unlike the earlier MD5+CRC32
encoding which had a very small collision probability). For a reasonable
maximum bit-budget in the packet header (e.g. 128 bits), and an average of
15 interfaces per router, up to 32-hop paths can be encoded with this
technique. The limitation of 32 hops is not too restrictive (in [25], the
authors find that the average number of hops to reach a destination in the
Internet is 19); it applies only within a single area or a domain. The
PathID is re-initialized by the first upgraded router after crossing any
area or domain boundary.
   The concatenation operation used here is an example of a reversible or
perfect hash, i.e., the local hash (i.e. next-hop information) can be
extracted from the overall PathID without needing a per-path table entry.
The state needed at interior nodes is small; only a table mapping link
indices to link-IDs is needed. For example, at a router with 15 interfaces,
a 15-entry index table is needed irrespective of network size. No other
control-plane computation or state-complexity is required at interior
nodes. Since the interior nodes can now forward to any neighbor, a large
number of network paths may be supported. Edge-nodes can compute paths using
heterogeneous algorithms, and use a simpler validation algorithm (see
Section 3.3).
   To summarize the impact in terms of function placement and complexity
management, the index-based scheme uses per-hop PathID processing instead of
a table-driven per-hop PathID swapping strategy. Only edge routers need to
compute the multipaths and their PathIDs using a simplified validation
procedure. The memory requirements at the core routers are also greatly
reduced.

3.2    Index-Based Scheme: Packet Forwarding
   Upgraded interior routers maintain an index table that maps the interface
index to the link interface IP address. On receiving a packet, an upgraded
interior router extracts the interface index of the outgoing interface
(next-hop) from the PathID field in the packet header and uses the interface
index table to forward the packet on the appropriate link (see Figure 4).
   Figure 4 shows a packet being sent from node S to node 7 along the path
S-6-2-4-3-7, the PathID at various points and various interface indices.
Only nodes S, 6 and 4 are upgraded. Node S has a complete map of the network
from the link-state database and knows that node 6 has two interfaces and
that the next-hop index at node 6 is 2, encoded using two bits. Note that
the interface indexing starts from 1 because a PathID of zero still refers
to the default (shortest) path. Likewise, the index at node 4 for this path
is 3, encoded using three bits. The PathID of the packet sent from node S is
$0\ldots01110_2 = 14$, indicating an index for each upgraded node
($10_2 = 2$ for node 6 and $011_2 = 3$ for node 4). Node 6 has an index
table with 2 entries mapping the link indices to the interface IP addresses.
On receiving a packet with a PathID in the routing header, it extracts the
last two bits and then looks up its index table. The PathID is also
right-shifted by two bits in this operation so that the next upgraded router
can extract its index from the last bits of the PathID. Similarly, node 4
will extract three bits from the PathID and right-shift it by the same
number before forwarding it. The remaining PathID will now be zero. The
non-upgraded routers merely forward packets along the default shortest
paths, oblivious of the PathID field.

Figure 4: Forwarding with the Index-based PathID encoding scheme (Note:
“0b” indicates binary encoding)

3.3   Index-based Scheme: Path Computation
   In this scheme, “source” (or edge) routers can independently use any
multipath computation algorithm to find a subset of available paths, similar
to the discussion in Section 2.3. The only information needed is the
knowledge of which routers in the network are upgraded (available with the
MPC-bit in LSAs).
   Path validation is only necessary to impose the constraint that
non-upgraded nodes can forward packets only on their default shortest paths.
Algorithm 2 shows the pseudo-code of a generic validation algorithm for edge
routers. Only those paths are valid where the next-hop of the non-upgraded
routers corresponds to their shortest path next-hop. Again, the validation
algorithm consists of two phases. The first phase deals with the computation
of shortest paths for non-upgraded nodes (lines 4-6 in Algorithm 2) and the
computation of multiple paths using any desired multipath computation
algorithm. In the second phase, the paths are checked for passing through
non-upgraded nodes. If a path passes through a non-upgraded node, the
next-hop must be the same as the next-hop in the pre-computed shortest path.
A path is invalid if this condition is not met (lines 14-16). In an N-node
network with u upgraded routers, the complexity of the first phase is given
by C(k) + (N − u)C(1), where C(k) denotes the complexity of computing k
paths (assuming the upgraded router keeps k paths) and C(1) denotes the
complexity of Dijkstra's single-shortest-path algorithm. The complexity of
the second phase of the validation algorithm is O(k × (N − 1) × (N − u)),
where k is the maximum number of paths for each destination to be stored in
the forwarding table. Note that the validation phase in the index-based path
encoding scheme is simpler compared to the validation phase in Algorithm 1.
This is because the upgraded routers can forward packets to any of their
interfaces. Recall that in Algorithm 1, the validation phase also needed to
ensure that the downstream upgraded nodes of a path would indeed provide
forwarding for that path (i.e. have a forwarding table entry for that path).
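The index-based scheme described in Sections 3.1 to 3.3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`index_bits`, `encode_path_id`, `forward`, `is_valid`) are hypothetical, the shortest-path next hops are assumed to be precomputed by the first (Dijkstra) phase and passed in as a dictionary, and node 4 is assumed to have four interfaces since the text only states that its index takes three bits.

```python
from typing import Dict, Sequence, Set, Tuple


def index_bits(num_interfaces: int) -> int:
    """Bits needed for indices 1..num_interfaces (0 is reserved for the default path)."""
    return num_interfaces.bit_length()


def encode_path_id(hops: Sequence[Tuple[int, int]]) -> int:
    """Concatenate (next_hop_index, num_interfaces) pairs, one per upgraded
    router, placing the first upgraded hop in the lowest-order bits."""
    path_id, offset = 0, 0
    for index, num_interfaces in hops:
        if not 1 <= index <= num_interfaces:
            raise ValueError("interface index out of range")
        path_id |= index << offset
        offset += index_bits(num_interfaces)
    return path_id


def forward(path_id: int, num_interfaces: int) -> Tuple[int, int]:
    """Per-hop step at an upgraded router: extract the local index from the
    low-order bits and right-shift the PathID for the next upgraded router."""
    bits = index_bits(num_interfaces)
    return path_id & ((1 << bits) - 1), path_id >> bits


def is_valid(path: Sequence[str],
             non_upgraded: Set[str],
             sp_next_hop: Dict[Tuple[str, str], str]) -> bool:
    """Second phase of the validation: every non-upgraded node on the path
    must forward to its shortest-path next hop towards the destination.
    sp_next_hop maps (node, destination) to that next hop (assumed input)."""
    dst = path[-1]
    for node, next_hop in zip(path, path[1:]):
        if node in non_upgraded and sp_next_hop[(node, dst)] != next_hop:
            return False
    return True
```

With the values from the Figure 4 walk-through, `encode_path_id([(2, 2), (3, 4)])` gives 14, `forward(14, 2)` yields index 2 with remaining PathID 3 at node 6, and `forward(3, 4)` yields index 3 with remaining PathID 0 at node 4. The path S-6-2-4-3-7 passes `is_valid` only if non-upgraded nodes 2 and 3 have next hops 4 and 7 respectively on their shortest paths to node 7.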
Algorithm 2 Algorithm for validating paths in new Scheme
 1: Let N denote the set of nodes in a network and NU denote the set of
    non-upgraded nodes
 2: Compute multiple paths using desired multipath computation algorithm
 3: Let P(dst) denote the set of paths to destination dst
 4: for n ∈ NU do
 5:    Compute Dijkstra
 6: end for
 7: for dst ∈ N do
 8:    Compute the desired paths to destination dst using any of k-shortest
       paths, k-disjoint paths, all paths up to k hops etc.
 9:    for path ∈ P(dst) do
10:       for n ∈ NU do
11:          if path.find(n) == TRUE then
12:             // nextHopSP is the next-hop in the shortest path from n to dst
13:             // nextHop(path) denotes the next-hop of n in the path
14:             if nextHop(path) != nextHopSP then
15:                delete path
16:             end if
17:          end if
18:       end for
19:    end for
20: end for

4.     BANANAS EXTENSIONS FOR INTRA-DOMAIN PROTOCOLS
   In this section, we summarize the extensions to OSPF/IS-IS to support the
BANANAS framework. A 32-bit PathID field is required in the packet header,
which can be implemented as a new routing option, called i-PathID (in the
context of intra-domain routing, PathID actually refers to i-PathID). The
route computation algorithm (Dijkstra's algorithm) at upgraded routers must
be extended to compute multiple paths (e.g. DFS under partial upgrade
constraints (DFS-PU), k-shortest paths [5] etc.), and to run a validation
algorithm (Algorithm 1). The upgraded nodes must compute the shortest path
as the default path. Incoming packets with erroneous PathIDs are forwarded
on the shortest paths and the PathID field is set to zero. The intra-domain
forwarding tables at upgraded routers would have tuples (destination prefix,
incoming PathID, outgoing interface (next-hop), outgoing PathID). As
indicated in Figure 5, one bit in the OSPF Link State Advertisements (LSAs)
[26] must be used to indicate that the router is multipath capable (MPC). In
the Linux/Zebra based implementation as well as in the SSFNet simulations,
we have used the eighth bit in the LSA options field of the router-LSA as
the MPC bit.

Figure 5: Proposed Modifications to OSPF Link State Advertisements (LSAs)

   Also, if we allow different upgraded routers to compute paths using
different algorithms, we need some bits to indicate the choice of route
computation algorithm along with its parameters (e.g. the value of k in the
k-shortest paths algorithm). In our Zebra-based implementation, we have
assumed that upgraded nodes implement the k-shortest-path algorithm with
different values of k. Therefore, we leverage the currently unused 8 bits
after the router type field in the LSA to indicate the value of k.
   For the alternative index-based path encoding scheme, the concatenation
of indices is done from the lower-order bits to the high-order bits. Each
router simply shifts the PathID to the right by the number of bits needed to
encode its interface index. This allows upgraded interior routers to extract
the next-hop index from the lowest-order bits without knowing its position
within the path, i.e. without the knowledge of how many upgraded nodes are
on the path. The upgraded interior routers only need to set the MPC bit in
their LSA and need not advertise the route computation algorithm. Each
upgraded router must maintain an ordered list of its own interfaces and the
corresponding index. The upgraded edge routers can use any multipath
algorithm to compute multiple paths. However, they need to validate the
paths using the validation algorithm (Algorithm 2). All upgraded routers
must always compute the default shortest paths to all destinations. This is
necessary in order to forward packets with no PathID option, or with a zero
or erroneous PathID.

4.1   Forwarding Across Multiple Areas
   Large OSPF and IS-IS networks support hierarchical routing with up to two
levels of hierarchy. Our approach is to view each area as a flat routing
domain for the purpose of multipath computation. Multiple paths are found
locally within areas, and crossing areas is viewed as crossing into a new
multipath routing domain, i.e. we re-use the i-PathID field. For example, if
a source needs to send a packet outside an area, it chooses one of the
multipaths to the area border router (ABR). Then, the ABR may choose among
the several multipaths within area 0 to other ABRs. The i-PathID field is
re-initialized by the first ABR at the area boundary.

5.    BANANAS EXTENSIONS TO BGP

5.1   Motivation and Goals
   BGP-4 [27] is the inter-domain routing protocol in the Internet. BGP uses
a path vector and policy routing approach to announce a subset of actively
used paths to its neighbors. Load-balancing and traffic engineering in BGP
are becoming important as operators attempt to deploy services like virtual
private networks (VPNs), and optimize on complex peering agreements [1, 28,
29, 30]. Enterprises are
also increasingly multi-homed and are increasingly active in         distributed mechanism to send packets along an arbitrary,
managing their inbound and outbound traffic [1, 31].                 but validated AS-PATH. The idea is similar to the explicit
   While BANANAS is not designed to address multitude                path routing introduced for OSPF/IS-IS, except that we
of configuration, stability and load-balancing problems [32,         now refer to explicit AS-PATHs rather than a sequence of
29, 33] of BGP, it does provide a set of building blocks to          contiguous routers and links. In particular, we propose a
enable fine-grained BGP traffic engineering both within and          separate hash field called external-PathID or e-PathID in
across domains. In particular, BANANAS introduces two                packets for this function. The e-PathID is the hash of the
new capabilities: explicit exit forwarding and explicit AS-          desired AS-PATH, i.e., hash of the sequence of AS numbers.
PATH forwarding. We examine these aspects further in the                The e-PathID hash is processed as follows. First, in an up-
following sections.                                                  graded AS, assume that at least the entry and exit AS border
                                                                     routers (ASBRs) are upgraded to support the explicit AS-
5.2   Explicit-Exit Forwarding                                       PATH function. Assume that a border router (called the en-
   The idea of explicit-exit routing is quite simple. The over-      try ASBR) receives a packet with a non-zero, valid e-PathID.
all objective is to define a traffic aggregate and then map          The incoming e-PathID is used by the entry ASBR to deter-
it to a chosen exit router (ASBR). Traffic aggregates may            mine an appropriate exit ASBR. The packet is then explic-
be chosen at per-packet, per-flow or per-prefix granulari-           itly sent to this exit ASBR using the mechanisms described
ties by the upgraded EBGP or IBGP routers, i.e., ISPs can            in the earlier section, i.e. address-stacking. Indeed, once
define fine-grained bundles of outbound traffic. Unlike LO-          the address is stacked, the i-PathID may also be explicitly
CAL PREF, the explicit exit capability can map traffic for           chosen to indicate a specific route to that exit ASBR. Note
the same destination prefix to multiple exits (based upon            that the e-PathID is not swapped at the entry ASBR. The
the autonomous decisions at upgraded IBGP nodes).                    outgoing e-PathID (for the AS-PATH suffix) replaces the in-
   The explicit exit mechanism works as follows. An up-              coming e-PathID only at the exit ASBR. This convention is
graded IBGP router chooses an arbitrary exit AS border               required because the autonomous system is an atomic entity
router (ASBR) for a given traffic aggregate (e.g. a flow or          (similar to a node) as far as the e-PathID is concerned. How-
all traffic to a destination prefix). It then “pushes” the desti-    ever, the AS physically breaks up into an entry- and exit-
nation address into a “address stack” field, and replaces the        ASBR (similar to input and output interfaces of a node). If
destination address with the exit ASBR address (adjusting            we imagine that the abstract PathID swapping happens at
the checksum appropriately). Now, intermediate routers for-          the output interface, that corresponds to our convention of
ward the packet to the exit-ASBR to which it is addressed.           swapping the e-PathID at the exit ASBR. Observe, that we
The exit-ASBR then simply “pops” the address from the                have required only EBGP routers to be aware of the multi-
address-stack field back into the destination address field          AS-PATH feature, and do not require upgrades in selected
(and adjusts the checksum) before forwarding it along to             IBGP routers (unlike the explicit exit case discussed earlier).
the next AS.
   The upgraded IBGP node would hence have table en-
tries of the form: [Dest-Prefix Exit-ASBR Next-Hop-
to-Exit-ASBR] and [Dest-Prefix Default-Next-Hop].
The second tuple is the regular IBGP-defined default pol-
icy route for the destination prefix: this forwarding entry
is used for all traffic for which this IBGP router does not
decide the exit router. The first 3-tuple is applied only to
the traffic aggregates for which this IBGP router chooses an
explicit exit. This kind of operation is important to avoid
conflicting exit routing decisions by upgraded IBGP routers.
   Observe that only a subset of IBGP routers and exit AS-
BRs (eBGP) routers need to be upgraded. All BGP routers
synchronize on their default policy routes as usual [27]. In
addition, the upgraded exit ASBRs should also synchronize
with the upgraded IBGP routers so that they know which
exits are available for any given prefix.
   The explicit-exit mechanisms proposed are similar in spirit       Figure 6: Topology for illustrating explicit AS-
to the label-stacking (multi-level tunnelling) ideas in MPLS[21].    PATH forwarding
A key difference is that BANANAS proposes only a single-                To illustrate the explicit AS-PATH feature, we consider
level address stack, whereas MPLS can have multiple levels           the AS-graph topology in Figure 6, and assume that we
in its label-stack. Note that the explicit exit routing is a         would like to send traffic from AS1 to AS5, i.e. to the IP pre-
special case of explicit path routing introduced in earlier          fix 0.0.0.48 along AS-PATH AS1-AS2-AS3-AS5, represented
sections. The PathID “hash” in this case is simply the exit          as (1 2 3 5). The AS-PATHs available are AS1-AS2-AS5,
ASBR IP address. This address stacking procedure operates            AS1-AS2-AS4-AS3-AS5, AS1-AS2-AS3-AS5. The explicit
in the fast processing path at all routers (both upgraded and        path (1 2 3 5) is chosen at router 1; the suffix AS-PATH
non-upgraded), unlike IP loose-source-routing that defaults          is (2 3 5) whose hash is placed in the e-PathID field in the
to the slow-processing path because it is an IP option.              outgoing IP packet. The next-hop is an entry router in AS2.
                                                                     An exact match of prefix and e-PathID results in the packet
5.3   Explicit AS-PATH Forwarding                                    being forwarded to the AS3. The e-PathID will be swapped
  The goal of explicit AS-PATH forwarding is to provide a            only at the exit ASBR (i.e. Router 2 in AS2). A simi-
Proceedings of the ACM SIGCOMM 2003 Workshops                  283                                                     August 2003
lar sequence of events occurs in AS3 involving entry ASBR               ular Router package [34] (data-plane) and GNU Zebra rout-
(router 1) and exit ASBR (router 3) before the packet is                ing sofware version 0.92a [35] (control-plane). These imple-
forwarded to AS5. The outgoing e-PathID from AS3 will be                mentations are tested on Utah’s Emulab testbed [36] to em-
set to 0 because AS5 is the destination AS.                             ulate sizable topologies running real implementation code.
   In spite of these apparent reductions in upgrade complex-            In particular, we test three cases: a) when an upgraded
ity, BGP’s path-vector nature poses a more important prob-              router keeps all available paths (as computed by the DFS-
lem. Specifically, a new AS-PATH is unknown to an up-                   PU strategy), b) when upgraded nodes compute k-shortest
stream AS unless the intervening AS explicitly advertises it            paths, with heterogeneous values of k at different nodes, and
(after internal synchronization). In other words, even if ISPs          c) the index-based scheme to illustrate architectural flexibil-
were interested in AS-PATH multiplicity, increased control              ity.
traffic is necessary to advertise the existence of multiple AS-            We use SSFNet [37] for larger integrated BGP/OSPF sim-
PATHs to neighbor AS’es. Recall that such excess control                ulations. These SSFNet simulations illustrate the frame-
traffic was not required in link-state algorithms (we merely            work in larger network topologies that integrate both OSPF
piggybacked LSAs with minimal information). On the other                and BGP BANANAS functionalities. Note that in this sec-
hand, the path-vector nature of BGP-4 also implies that no              tion, we have intentionally preferred simplicity in terms of
path computation is necessary once the multiple AS-PATHs                topology/test-case choices. We have performed a larger set
have been received and filtered for acceptance.                         of SSFNet simulations and Emulab runs in more complex
   We recognize that this increased control traffic require-            scenarios, all of which support our assertions. These results
ment poses a significant disincentive for ISPs against adopt-           will be reported in a detailed technical report.
ing multi-AS-PATH capabilities en masse. Given the scal-
ability and instability issues with adding control traffic, we          6.1     Linux Implementation Results
expect that ISPs may choose to advertise only a small set                  Figure 7 shows the topology of a simple validation ex-
of multiple AS-PATHs to their neighbor AS’es. For exam-                 periment conducted on Utah’s Emulab [36] testbed with
ple, some AS’es may collaborate to allow forwarding along               the Linux Zebra version 0.92a of OSPF (i.e. control-plane)
multiple paths to certain destination prefixes and advertise            upgraded with our BANANAS building blocks. The for-
this as a non-transitive attribute to certain AS’es only.               warding plane was implemented in Linux using MIT’s Click
                                                                        Modular Router package [34]. Note that this is a partially
5.4    BANANAS Extensions to BGP-4                                      upgraded network: only nodes 1 and 2 (the dark colored
   In summary, we propose two capabilities in the context of            nodes) are upgraded in this configuration. Figure 7 also in-
inter-domain routing: explicit exit routing and explicit AS-            dicates the IP addresses of various router interfaces and the
PATH routing. For the former, we propose a 32-bit “address              link weights. The router ID is statically defined to be the
stack” field in the routing header into which the destination           smallest interface IP address.
IP address will be “pushed”. The destination field in the IP                                          39.3
                                                                                                             43
                                                                                                                   39.9         69.9 75 69.6
header is overwritten with the exit ASBR’s IP address. The                           43.3       3
                                                                                                      3.3
                                                                                                                           9                        6    67.6
                                                                                                                                             6.6
Exit ASBR will simply “pop” the destination address back                                 53             45
                                                                                                                                                   21          45
from the ”address stack” to the destination IP address. This                       43.4         51
                                                                                                             3.1                       6.2                      67.7
address stacking procedure (similar to MPLS) operates in                             4
                                                                                          4.4          4.1
                                                                                                             1
                                                                                                                   1.1 83        1.2
                                                                                                                                       2
                                                                                                                                             7.2
                                                                                                                                                    73
                                                                                                                                                         7.7
                                                                                                                                                               7
the fast processing path unlike the IP loose source routing                       45.4                       5.1                       8.2                      78.7
                                                                                                                                               38
option. Moreover, it allows flexibility for only a subset of                         67         5.5
                                                                                                           93
                                                                                                                                   67 8.8
                                                                                                                                                               51
                                                                                                             55
BGP routers to be upgraded to support such explicit exit                                 45.5 5
                                                                                                    51.5            51.1
                                                                                                                           10
                                                                                                                                81.1  81.8
                                                                                                                                           8 78.8
choice.
   For explicit AS-PATH forwarding we propose a new 32-bit
field in the packet routing header called the external PathID                             All IP−addresses denoted by a.b are actually 192.168.a.b
or e-PathID. This field stores a hash of the sequence of ASNs
along the desired explicit AS-PATH. ISPs may choose to                  Figure 7: Experimental Topology on Utah Emulab
only advertise a small set of multiple AS-PATHs to their                using Linux Zebra/Click Platforms (Note: only dark
selected neighbor AS’es. In a multi AS-PATH capable AS,                 colored nodes are multi-path capable)
only the entry ASBRs and exit ASBRs (i.e. only the EBGP
routers) need to be upgraded and synchronized on the avail-             6.1.1    All Paths with Partial Upgrades (DFS-PU Al-
able multiple AS paths. The incoming ePathID hash is                             gorithm)
swapped with the outgoing AS-PATH suffix hash only at                      Table 1 illustrates a partial forwarding table computed at
the exit AS border router. The forwarding from the entry                node 1 (IP address 192.168.1.1) for destination 3 (192.186.3.3).
ASBR to the exit ASBR uses the explicit exit mechanisms                 Note that the path string shown in Table 1 is only for the
described above. Multiple paths between the entry and exit              sake of illustration and is not stored in the actual routing
ASBRs are possible using the i-PathID mechanism described               table. The PathIDs are the (MD5 + CRC-32) hashes of the
earlier for intra-domain routing.                                       router IDs (i.e. IP addresses of nodes) on the path. For
                                                                        example, the PathID 2084819824 corresponds to a hash of
                                                                        the set of router IDs {192.168.1.1, 192.168.1.2, 192.168.6.6,
6.    IMPLEMENTATION AND SIMULATION
                                                                        192.168.39.9, 192.168.3.3 }. The outgoing path ID is the
      RESULTS                                                           hash of the suffix path formed after omitting 192.168.1.1. If
   In this section, we illustrate the working of the proposed           the path goes through other nodes which are not upgraded
framework. We have implemented the BANANAS frame-                       (e.g. 1-4-3), the outgoing path ID is the hash of the suffix
work schemes in the Linux kernel: we use MIT’s Click Mod-               path starting from the next upgraded router on the path.
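As a rough illustration of the PathID rule just described, the following Python sketch hashes a list of router IDs and derives the outgoing PathID as the hash of the suffix path starting at the next upgraded router. The exact way MD5 and CRC-32 are combined is not spelled out here, so the composition used below (CRC-32 of the MD5 digest of the comma-joined IDs), the function names and the set of upgraded routers are all assumptions made for illustration; the values it produces are not expected to match those in Table 1.

```python
import hashlib
import zlib


def path_id(router_ids):
    """Return a 32-bit PathID for a sequence of router IDs.

    Assumed encoding: CRC-32 of the MD5 digest of the comma-joined IDs.
    An empty path maps to 0.
    """
    if not router_ids:
        return 0
    joined = ",".join(router_ids).encode("ascii")
    digest = hashlib.md5(joined).digest()
    return zlib.crc32(digest) & 0xFFFFFFFF


def outgoing_path_id(path, upgraded):
    """Hash of the suffix starting at the next upgraded router after path[0].

    Returns 0 when no router further along the path is upgraded.
    """
    for i in range(1, len(path)):
        if path[i] in upgraded:
            return path_id(path[i:])
    return 0


# Hypothetical set of upgraded routers, used only for this example.
upgraded = {"192.168.1.1", "192.168.1.2"}

# Router IDs quoted in the text for the path 1-2-6-9-3.
long_path = ["192.168.1.1", "192.168.1.2", "192.168.6.6",
             "192.168.39.9", "192.168.3.3"]
# A path whose downstream routers are not upgraded (IDs are illustrative).
legacy_path = ["192.168.1.1", "192.168.4.4", "192.168.3.3"]

incoming = path_id(long_path)
outgoing = outgoing_path_id(long_path, upgraded)
legacy_outgoing = outgoing_path_id(legacy_path, upgraded)
```

In this sketch the outgoing PathID of the first path is the hash of the suffix beginning at 192.168.1.2, while the second path yields 0 because none of its downstream routers is in the upgraded set.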
Proceedings of the ACM SIGCOMM 2003 Workshops                     284                                                                                                  August 2003
In the case of the path 1-4-3, both nodes 4 and 3 are not upgraded, so the suffix path ID is zero.

 Outgoing I/f     Path         Incoming PathID     Outgoing PathID
 192.168.1.1      1-2-6-9-3    2084819824          664104731
 192.168.3.1      1-3          599270449           0
 192.168.4.1      1-4-3        4183108560          0
 192.168.5.1      1-5-4-3      1365378675          0

Table 1: Partial routing table at 192.168.1.1 for destination 192.168.3.3

6.1.2 k-Shortest Paths with Partial Upgrades
   In this section we illustrate, using the Linux implementation, the case when the upgraded
routers compute up to k shortest paths, and different upgraded routers use different values of k.
   Consider the 10-node topology shown in Figure 7. This topology was set up in the Emulab
network. We assume that the routers 192.168.1.1 and 192.168.1.2 are upgraded with k equal to 3
and 2 respectively. The results are presented to verify the correctness of the “validation phase”
(Algorithm 2). Tables 2 and 3 show part of the routing tables at 192.168.1.1 for destinations
192.168.6.6 and 192.168.8.8 respectively. Tables 4 and 5 show the corresponding entries at router
192.168.2.2. For destination 192.168.6.6, the router 192.168.1.1 finds 3 paths, all of which are
valid, as two paths have next-hop 192.168.2.2 and router 192.168.2.2 keeps 2 shortest paths. For
destination 192.168.8.8, the router 192.168.1.1 computes 3 paths: 1-2-8, 1-2-6-7-8 and 1-2-7-8.
The path 1-2-7-8 is invalidated in the “validation phase” as router 192.168.2.2 only keeps 2
paths (2-8, 2-6-7-8). Note that the Path string is shown in Tables 2-5 only for the purpose of
explanation.

 Path         Incoming PathID     Next-hop        Outgoing PathID
 1-2-6        1989316858          192.168.1.2     3491782861
 1-2-7-6      656924081           192.168.1.2     3645081405
 1-3-9-6      534784006           192.168.3.3     0

Table 2: Part of routing table at 192.168.1.1 for destination 192.168.6.6

 Path         Incoming PathID     Next-hop        Outgoing PathID
 1-2-8        3654096761          192.168.1.2     1973392862
 1-2-7-6-8    1777786090          192.168.1.2     2123671348

Table 3: Part of routing table at 192.168.1.1 for destination 192.168.8.8

 Path         Incoming PathID     Next-hop        Outgoing PathID
 2-6          1973392862          0.0.0.0         1973392862
 2-7-6        2123671348          192.168.7.7     2123671348

Table 4: Part of routing table at 192.168.2.2 for destination 192.168.6.6

 Path         Incoming PathID     Next-hop        Outgoing PathID
 2-8          3491782861          0.0.0.0         0
 2-6-7-8      3645081405          192.168.6.6     0

Table 5: Part of routing table at 192.168.2.2 for destination 192.168.8.8

6.2    Evaluation of Index-based Path Encoding Scheme
   The alternative index-based PathID encoding scheme was implemented in the Linux kernel (MIT’s
Click Router platform) and simulated in SSFNet. We present our simulation results in this section
on a sizeable topology that corresponds to the old MCI topology of 1995 [38].
   In this configuration, only nodes 4, 6, 7, 9, 10 are upgraded. The source node in this
simulation is node 6. Observe that node 6 is the only node that computes the k-shortest-paths
(k = 5) for all destinations and runs the validation algorithm (Algorithm 2). All other upgraded
nodes merely keep an index table as described in Section 3.1. Table 6 shows a part of the
forwarding table at node 6 (only those paths for destination node 7), and the i-PathIDs using
index-based encodings. The node 6 may choose any one of these paths for a packet to node 7. We
have verified that the progression of i-PathIDs through the network follows the description given
in Section 3.2.

Figure 8: Old MCI Topology: Used for Testing the Index-Based Scheme (Only Nodes 4, 6, 7, 9, 10
are upgraded)

6.3    Integrated OSPF/BGP SSFNet Simulation
   In this section we use SSFNet simulation results to illustrate the integrated operation of the
proposed framework in the Internet. This example demonstrates both the intra-domain (OSPF) and
inter-domain (BGP-4) operation of the framework with explicit AS-PATH as well as explicit exit
forwarding.
   Figure 9 shows the topology used for the results presented in this section. The topology has
eight (8) autonomous systems (AS’es). Four of these AS’es, namely AS1, AS2, AS5 and AS6, have been
upgraded to support explicit AS-PATH forwarding. Even within these upgraded autonomous systems,
only a subset of routers are upgraded to support the explicit AS-PATH and explicit exit routing
as described in Sections 5.3 and 5.2. The upgraded routers have been marked with a “U” in
Figure 9. A blow-up of the internal topology of AS2 is shown in Figure 10; the upgraded routers
are again indicated with “U”.
   Consider forwarding of a packet from AS1 to AS8 (see Figure 9). Given the constraint that only
a partial set of AS’es are upgraded, the following AS-PATHs may be used from AS1 to reach AS8:
AS2-AS4-AS8, AS2-AS5-AS6-AS7-AS8 and AS2-AS5-AS6-AS4-AS8. These AS-PATHs and their corresponding
e-PathIDs are indicated in Table 7, which is a part of the routing table at the AS border router in
AS1. Note that the AS-PATH AS2-AS4-AS6-AS7-AS8 is not available because AS4 is not upgraded, and
uses a default AS-PATH of AS4-AS8. Also in this simulation, we assumed that the upgraded routers
do not do any further filtering, i.e., they re-advertise all their available AS-PATHs to their
neighboring AS’es.

Figure 9: Topology used for integrated SSFNet simulation

 Path                 Next-Hop    i-PathID
 6-2-4-3-7            2           0b01110
 6-10-9-17-16-11-7    10          0b00110001
 6-10-14-11-7         10          0b00101
 6-10-9-4-3-7         10          0b01110110001

Table 6: Paths at node 6 for destination node 7 (Note: 0b indicates binary encoding)

 Forwarding Table of AS1 at Router 1
 Dest       NextHop     In e-PathID    AS-PATH      Out e-PathID    Exit ASBR
 0.57/28    2.93/32     2025862315     2-4-8        3535826417      0.91/32
 0.57/28    2.93/32     4160716901     2-5-6-7-8    1248156781      0.91/32
 0.57/28    2.93/32     669121903      2-5-6-4-8    2630971039      0.91/32

Table 7: Integrated OSPF/BGP Simulation: Forwarding Table of the Border Router in AS1 (Note:
0.57/28 refers to IP address 0.0.0.57/28 etc)

   In our example simulation, the border router of AS1 chooses the AS-PATH AS2-AS4-AS8, which
corresponds to the e-PathID of 3535826417 (see the first row of Table 7). When the packet arrives
at router 5 of AS2 (the entry ASBR), its header looks like Figure 11(A). This entry ASBR (i.e.
router 5) of AS2 examines the incoming e-PathID to find the exit ASBR to be node 2 with IP address
0.0.2.107 (see first row of Table 8). Note that it does not swap the e-PathID field, because this
will be done at the exit ASBR. To emphasize this point, observe that the outgoing e-PathID column
in Table 8 is the same as the incoming e-PathID for the destination prefix 0.0.0.57/28.

 Forwarding Table of AS2 at Router 5
 Dest       NextHop     In e-PathID    AS-PATH      Out e-PathID    Exit ASBR
 0.57/28    2.97/32     3535826417     2-4-8        3535826417      2.107/32
 0.57/28    2.113/32    3535826417     2-4-8        3535826417      2.107/32
 0.57/28    2.97/32     1248156781     2-5-6-7-8    1248156781      2.24/32
 0.57/28    2.113/32    1248156781     2-5-6-7-8    1248156781      2.24/32

Table 8: Integrated OSPF/BGP Simulation: Forwarding Table at Router 5 in AS2 (See Figure 10)

   The entry ASBR (router 5) now “pushes” the destination IP address (i.e. 0.0.0.57) into the
address stack field and replaces it with the exit ASBR IP address. The entry ASBR also chooses a
path within the AS to the exit ASBR. Table 9 shows the intra-domain paths available to reach the
exit ASBR (router 2). In this simulation, we have integrated the index-based PathID encoding
scheme as well as the k-shortest path route computation scheme (k=7) with the OSPF protocol
running in AS2. In particular, the path 5-4-11-7-2 within the AS is chosen, which corresponds to
an i-PathID of 1669 (see the third row of Table 9). The header fields of the packet at this stage
are shown in Figure 11(B).

 Destination      Path                   i-PathID
 0.0.2.107/32     5-4-3-2                17
 0.0.2.107/32     5-1-4-3-2              18
 0.0.2.107/32     5-4-11-7-2             1669
 0.0.2.107/32     5-4-8-7-2              201
 0.0.2.24/32      5-4-11-10-15-14        69
 0.0.2.24/32      5-4-8-7-6-14           169
 0.0.2.24/32      5-4-8-16-15-14         105
 0.0.2.24/32      5-1-4-8-16-15-14       106
 0.0.2.24/32      5-4-11-9-10-15-14      101
 0.0.2.24/32      5-1-4-11-9-10-15-14    102

Table 9: Forwarding table at Router 5 in AS2 (Figure 10): k Shortest Paths (k = 7)

   The packet proceeds on the explicit intra-domain path (as described in earlier sections) to
reach the exit router 2 with an i-PathID value of 0. At this router, the destination address
(0.0.0.57) is “popped” back from the address stack. The e-PathID is also replaced with the
outgoing e-PathID of 1895667324 (see Figure 11(C)). Now the packet is sent to AS4, which is not
upgraded, but sends the packet on its default policy AS-PATH, i.e., directly to AS8. In summary,
we have shown how a distributed set of upgraded and non-upgraded nodes, with explicit paths
independently selected within upgraded AS’es, can honor an explicit AS-PATH request of the
source AS.
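The header manipulation in this walk-through can be summarized in a few lines of Python. This is only a sketch under stated assumptions: the Header class, its field names and the two helper functions are hypothetical, intra-domain forwarding between the two ASBRs is not modeled (the i-PathID is simply assumed to have reached 0 on arrival at the exit ASBR), and the numeric values are the ones shown in Figure 11.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Header:
    """Hypothetical view of the routing header fields used in Section 6.3."""
    dest: str
    e_path_id: int
    i_path_id: Optional[int] = None
    addr_stack: List[str] = field(default_factory=list)


def at_entry_asbr(hdr: Header, exit_asbr_addr: str, i_path_id: int) -> Header:
    """Entry ASBR: push the destination, redirect to the exit ASBR, set i-PathID.

    The e-PathID is deliberately left untouched; it is swapped at the exit ASBR.
    """
    hdr.addr_stack.append(hdr.dest)
    hdr.dest = exit_asbr_addr
    hdr.i_path_id = i_path_id
    return hdr


def at_exit_asbr(hdr: Header, out_e_path_id: int) -> Header:
    """Exit ASBR: pop the original destination and swap in the outgoing e-PathID."""
    hdr.i_path_id = 0  # assumed already reduced to 0 along the intra-domain path
    hdr.dest = hdr.addr_stack.pop()
    hdr.e_path_id = out_e_path_id
    return hdr


# Stage A of Figure 11: packet as it arrives at router 5 of AS2.
pkt = Header(dest="0.0.0.57", e_path_id=1248156781)

# Stage B: router 5 pushes the destination and picks the intra-domain path.
at_entry_asbr(pkt, exit_asbr_addr="0.0.2.107", i_path_id=1669)
stage_b = (pkt.dest, pkt.e_path_id, pkt.i_path_id, list(pkt.addr_stack))

# Stage C: router 2 pops the destination and swaps the e-PathID.
at_exit_asbr(pkt, out_e_path_id=1895667324)
stage_c = (pkt.dest, pkt.e_path_id, pkt.i_path_id, list(pkt.addr_stack))
```

Keeping the e-PathID swap in the exit-side helper mirrors the point made in the text that only the exit ASBR rewrites this field.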
Figure 10: Blow-up of AS2’s Internal Topology in the Integrated OSPF/BGP Simulation (Figure 9)
(Diagram not reproduced. Routers are numbered 1 to 19; address labels shown in the figure:
0.0.2.93/32, 0.0.2.97/32, 0.0.2.107/32 and 0.0.2.24/32; U = Updated Router.)

7. RELATED WORK
   Most related work for multipath routing has been done in the context of intra-domain
protocols. OSPF, the most common intra-domain routing protocol used in the Internet today, is
based on single shortest path with equal splitting
between next-hops of equal cost paths. Lorenz et al [39] show that OSPF routing performance could
be improved by O(N) if traffic-matrix aware explicit source-based multipath routing is used (e.g.
MPLS-based [40, 41]).

 Stage   Dest IP Add   EPathID       IPathID   Add on Stack   Location
 A       0.0.0.57      1248156781    −         −              At the entry router for AS2, i.e. router #5
 B       0.0.2.107     1248156781    1669      0.0.0.57       At router 5@AS2 after address is pushed on stack
 C       0.0.0.57      1895667324    0         −              At router 2@AS2, which is the exit router,
                                                              after address is popped from the stack

Figure 11: Diagram Showing How e-PathID, i-PathID and Destination Address Change in the
Integrated OSPF/BGP Simulation

   Protocol extensions to support multipath routing (both in RIP and OSPF) have been studied by
Narvaez et al [7], Chen et al [6] and Vutukury et al [8]. In [7], the authors propose to find
loop-free multipaths only by concatenating the shortest paths of their neighbors with their link
to the neighbors. This approach essentially uses a depth first search with a depth of 1, whereas
we allow arbitrary depth in our DFS-PU algorithm. Chen et al and Vutukury et al [6, 8] propose
more general multipath computations, but their schemes require the co-operation and upgrade of
all the routers in the network. Chen et al present a general concept of suffix-matched path
identifier to allow multipath computation using distributed computation, but they use local
labels to realize the path like in ATM networks [20] or MPLS [21]. Therefore, they require a
signaling protocol to map a global path specification to locally assigned labels at each node.
   The proposed BANANAS framework allows source-based multipath routing using a “PathID”. The use
of a globally significant path hash allows multipath capabilities without signaling (i.e. in a
connectionless manner) even in a partially upgraded network. The signaling requirement for
source-routing is seen in protocols like ATM networks, MPLS networks [21] and NIMROD [12] routing
(a link-state approach to inter-domain routing). IPv4 [18] and IPv6 [19, 13] provide a
variable-length loose-source-routing option that may be considered “data-plane” signaling. But
IPv4/v6 uses an uncompressed string of IP addresses in contrast to our efficient PathID encoding
schemes.
   Even though MPLS has gained popularity in some large ISPs, many ISPs may prefer using
OSPF/IS-IS to enable multipath and traffic engineering capabilities. This is due to the widespread
deployment and operational experience available with OSPF/IS-IS. Our approach extends OSPF/IS-IS
to allow such capabilities even in partially upgraded networks. Our index-based scheme offers
significant reduction of state complexity in comparison to MPLS label tables. Our computations can
also be further optimized using incremental k-shortest path algorithms similar to those suggested
for OSPF’s Dijkstra algorithm [42, 43].
   In LIRA [11], Stoica et al briefly propose a forwarding scheme which they suggest could replace
MPLS. A path is encoded as the XOR of router IDs along the path, and is processed along the path
using a series of XOR operations. The work in LIRA is a special case of the BANANAS framework. In
particular, the authors do not consider the larger architectural issues of partial upgrades,
route-computation, state-computation tradeoffs, inter-domain operation etc. The focus in their
paper was also different: a framework for service differentiation.

8. SUMMARY AND CONCLUDING REMARKS
   The key contributions in this paper can be summarized as follows.
   a. Identification of abstract multipath architectural concepts (global PathID semantics,
efficient path hashing) that are crucial to avoiding the need for signaling and allowing
incremental network upgrades in connectionless routing protocols.
   b. Canonical multipath and explicit path realizations in the context of legacy routing
protocols: OSPF, BGP-4.
   c. Demonstration of significant architectural flexibility: alternative PathID encodings,
alternative route-computation algorithms (DFS-PU, k_i-shortest paths), movement of complexity to
edges, division of functions between data-plane and control-plane, development of distributed
validation algorithms etc.
   d. Linux implementation results and integrated OSPF/BGP simulation results to validate various
options.
   These building blocks can be used in two broad ways. First, they can be used for traffic
engineering within a partially upgraded legacy network. An operator may want to emulate signaled
capabilities in a connectionless network (e.g. see [41, 39]) or might desire fine-grained traffic
management control that is hard to extract from parameter tweaking (e.g. see [30, 29, 31, 32]).
The building blocks may be mixed and matched in a limited number of ways. For example, one could
select an MD5+CRC32 encoding for BGP-4 (i.e. e-PathIDs) and an index-based encoding for OSPF
(i-PathID). Obviously, a common encoding must be chosen across ISPs for the explicit AS-PATH case.
   Second, and perhaps more important, the BANANAS framework building blocks could form the
long-term basis for a best-effort end-to-end path multiplicity model. Through the independent
partial upgrades of nodes in different autonomous systems, end-systems can have a growing
expectation of multiple end-to-end paths. We strongly believe that such a mere expectation of
end-to-end path multiplicity will trigger substantial application innovation. To test this
hypothesis, we plan to deploy the BANANAS framework on the PlanetLab infrastructure [22] as a
public experimental wide-area network overlay service by Fall 2003.

9. REFERENCES
   [1] G. Huston, “Commentary on inter-domain routing in the internet,” RFC 3221, December 2001.
   [2] N. Spring, R. Mahajan, D. Wetherall, “Measuring ISP Topologies with Rocketfuel,” SIGCOMM
       2002, Pittsburgh PA, August 2002.
   [3] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, W. Willinger, “Network Topology
       Generators – Structural vs. Degree-Based,” Proceedings of the ACM SIGCOMM, August 2002.
    Proceedings of the ACM SIGCOMM 2003 Workshops                                            287                                                    August 2003
   [4] Keshav, S., An Engineering Approach to Computer
       Networking, Addison-Wesley, 1997.
   [5] D. Eppstein, “Finding the k shortest Paths,”
       Proceedings of 35th IEEE Symposium on
       Foundations of Computer Science (FOCS), pp.
       154-165, 1994.
   [6] J. Chen, P. Druschel, D. Subramanian, “An Efficient
       Multipath Forwarding Method,” in INFOCOM’98,
       March, 1998.
   [7] P. Narvaez, K. Y. Siu, “Efficient Algorithms for
       Multi-Path Link State Routing,” ISCOM’99,
       Kaohsiung, Taiwan, 1999.
   [8] S. Vutukury and J.J. Garcia-Luna-Aceves, “A
       Simple Approximation to Minimum-Delay
       Routing,” SIGCOMM ’99, September, 1999.
   [9] D. O. Awduche, L. Berger, D. Gan, T. Li, G.
       Swallow, V. Srinivasan, “RSVP-TE: Extensions to
       RSVP for LSP Tunnels,” IETF RFC 3209,
       December 2001.
  [10] L. Andersson, P. Doolan, N. Feldman, A. Fredette,
       B. Thomas, “Label Distribution Protocol
       Specification,” IETF RFC 3036, January 2001.
  [11] I. Stoica, H. Zhang, “LIRA: An Approach for
       Service Differentiation in the Internet,” in
       Proceedings of NOSSDAV’98, Cambridge, England,
       July 1998, pp. 115-128.
  [12] I. Castineyra, N. Chiappa, M. Steenstrup, “The
       Nimrod Routing Architecture,” IETF RFC 1992,
       August 1996.
  [13] M. O’Dell, “GSE – an alternate addressing
       architecture for IPv6,” Expired Internet Draft, 1997.
  [14] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek,
       R. Morris, “Resilient Overlay Networks,” in
       Proceedings of 18th ACM Symposium on Operating
       Systems Principles, Banff, Canada, October 2001.
  [15] S. Savage et al., “Detour: A Case for Informed
       Internet Routing and Transport,” IEEE Micro,
       volume 19, no. 1, January 1999.
  [16] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.
       Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L.
       Zhang, V. Paxson, “Stream Control Transmission
       Protocol,” IETF RFC 2960, October 2000.
  [17] H-Y. Hsieh, R. Sivakumar, “A Transport Layer
       Approach for Achieving Aggregate Bandwidths on
       Multi-homed Mobile Hosts,” Proceedings of ACM
       Mobicom 2002, Atlanta, GA, September 2002.
  [18] DARPA INTERNET PROGRAM, “Internet
       Protocol,” IETF RFC 791, September 1981.
  [19] S. Deering, R. Hinden, “Internet Protocol, Version 6
       (IPv6) Specification,” IETF RFC 1883, 1995.
  [20] U. Black, “ATM, Volume I: Foundation for
       Broadband Networks,” Prentice Hall, 2nd Edition,
       1999.
  [21] E. Rosen et al., “Multiprotocol Label Switching
       Architecture,” IETF RFC 3031, January 2001.
  [22] L. Peterson, T. Anderson, D. Culler, T. Roscoe, “A
       Blueprint for Introducing Disruptive Technology
       into the Internet,” in Proceedings of the First ACM
       Workshop on Hot Topics in Networks (HotNets-I),
       Princeton, NJ, October 2002.
  [23] W. Xu, S. S. Hemami, “Efficient Partitioning of
       Unequal Error Protected MPEG Video Streams for
       Multiple Channel Transmission,” in IEEE
       International Conf. on Image Processing, Rochester,
       NY, Sept 2002.
  [24] T. H. Cormen et al., “Introduction to Algorithms,”
       The MIT Press, McGraw Hill Book Company,
       Second Edition, 2001.
  [25] Réka Albert, Hawoong Jeong, Albert-László Barabási,
       “Diameter of the World-Wide Web,” in Nature,
       Vol. 401, 9 September 1999.
  [26] J. Moy, “OSPF Version 2,” IETF RFC 2328, April
       1998.
  [27] J. W. Stewart, “BGP-4 Inter-Domain Routing in
       the Internet,” Addison Wesley, 1999.
  [28] W. Norton, “Internet Service Providers and
       Peering,” White Paper, 2002.
  [29] T. Griffin, G. Wilfong, “Analysis of the MED
       oscillation problem in BGP,” Proceedings of ICNP
       2002, Paris, France, November 2002.
  [30] N. Feamster, J. Borkenhagen, and J. Rexford,
       “Controlling the impact of BGP policy changes on
       IP traffic,” AT&T Research Technical Report
       011106-02, November 2001.
  [31] S. Hares et al., “Smart Routing Technologies,”
       NANOG Panel, Toronto, June 2002.
       http://www.nanog.org/mtg-0206/smart.html
  [32] R. Mahajan, D. Wetherall, T. Anderson,
       “Understanding BGP Misconfiguration,” in
       Proceedings of ACM SIGCOMM, 2002.
  [33] T. Griffin, G. Wilfong, “On the Correctness of
       IBGP Configuration,” Proceedings of ACM
       SIGCOMM 2002, Pittsburgh PA, 2002.
  [34] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M.
       F. Kaashoek, “The Click modular router,” ACM
       Transactions on Computer Systems, Vol. 18, No. 3,
       August 2000, pages 263-297.
  [35] GNU Zebra Open-Source Routing Software,
       http://www.zebra.org/
  [36] J. Lepreau, “The Utah Emulab Network Testbed,”
       http://www.emulab.net/
  [37] Scalable Simulation Framework (SSF) Network
       Models, available from http://www.ssfnet.org.
  [38] Q. Ma and P. Steenkiste, “On Path Selection for
       Traffic with Bandwidth Guarantees,” Proceedings of
       IEEE International Conference on Network
       Protocols (ICNP), Atlanta, GA, October 1997.
  [39] D. H. Lorenz, A. Orda, D. Raz, Y. Shavitt, “How
       good can IP routing be?”, DIMACS Technical
       Report 2001-17, May 2001.
  [40] D. Awduche, “MPLS and traffic engineering in IP
       networks,” IEEE Communications Magazine, Vol.
       37, No. 12, pp. 42-47, 1999.
  [41] A. Elwalid, C. Jin and I. Widjaja, “MATE: MPLS
       Adaptive Traffic Engineering,” in Proceedings of
       INFOCOM’01, April 2001.
  [42] G. Ramalingam and T. Reps, “An incremental
       algorithm for a generalization of the shortest-path
       problem,” Journal of Algorithms, Vol. 21, 1996.
  [43] P. Narvaez, K.Y. Siu and H.Y. Tzeng, “New Dynamic
       Algorithms for Shortest Path Tree Computation,”
       IEEE Transactions on Networking, Vol. 8, No. 6,
       Dec. 2000.