Experiences with BGP in Large Scale Data Centers:
Teaching an old protocol new tricks
Global Networking Services Team, Global Foundation Services, Microsoft Corporation
Agenda
Network design requirements
Protocol selection: BGP vs IGP
Details of Routing Design
Motivation for BGP SDN
Design of BGP SDN Controller
The roadmap for BGP SDN
Design Requirements
Scale of the data-center network:
100K+ bare metal servers
Over 3K network switches per DC
Applications:
Map/Reduce: Social Media, Web Index and Targeted Advertising
Public and Private Cloud Computing: Elastic Compute and Storage
Real-Time Analytics: Low latency computing leveraging distributed
memory across discrete nodes.
Key outcome:
East-West traffic profile drives the need for large bisection bandwidth.
Translating Requirements to Design
Network Topology Criteria:
Support East <-> West traffic profile with no over-subscription
Minimize Capex and Opex: cheap commodity switches, low power consumption
Use homogeneous components (switches, optics, fiber, etc.)
Minimize operational complexity
Minimize unit costs
Network Protocol Criteria:
Standards based
Control plane scaling and stability: minimize resource consumption (e.g. CPU, TCAM usage predictable and low)
Minimize the size of the L2 failure domain
Layer 3 equal-cost multipathing (ECMP)
Programmable, extensible, and easy to automate
Network Design: Topology
3-Stage Folded CLOS.
Full bisection bandwidth (m ≥ n).
Horizontal scaling (scale-out vs. scale-up).
Natural ECMP link load-balancing.
Viable with dense commodity hardware.
Build large virtual boxes out of small components.
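As a toy illustration of why the folded CLOS gives natural ECMP (a sketch with made-up spine and leaf counts, not the production fabric), every pair of leaves shares one two-hop path per spine:

    # Sketch: count equal-cost leaf-to-leaf paths in a small folded Clos.
    S, L = 4, 8   # hypothetical spine and leaf counts
    links = [(f"leaf{l}", f"spine{s}") for l in range(L) for s in range(S)]

    def ecmp_paths(src, dst):
        # one two-hop path per spine that both leaves attach to
        spines_src = {sp for lf, sp in links if lf == src}
        spines_dst = {sp for lf, sp in links if lf == dst}
        return len(spines_src & spines_dst)

    print(ecmp_paths("leaf0", "leaf5"))   # -> 4, i.e. S-way ECMP between leaves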
Network Design: Protocol
Network Protocol Requirements
Resilience and fault containment
CLOS has high link count, link failure is common, so limit fault propagation on link
failure.
Control Plane Stability
Consider number of network devices, total number of links etc.
Minimize amount of control plane state.
Minimize churn at startup and upon link failure.
Traffic Engineering
Heavy use of ECMP makes TE in DC not as important as in the WAN.
However, we still want to drain devices and respond to imbalances.
Why BGP and not IGP?
Simpler protocol design compared to IGPs
Mostly in terms of state replication process
Fewer state-machines, data-structures, etc
Better vendor interoperability
Troubleshooting BGP is simpler
Paths propagated over a link are easy to inspect.
AS_PATH is easy to understand.
Easy to correlate sent & received state
ECMP is natural with BGP
Unique as compared to link-state protocols
Very helpful to implement granular policies.
Used for an unequal-cost Anycast load-balancing solution.
Why BGP and not IGP? (cont.)
Event propagation is more constrained in BGP
More stability due to reduced event flooding domains
E.g. can control BGP UPDATE using BGP ASNs to stop info from
looping back
Generally a result of the protocol's distance-vector nature.
Configuration complexity for BGP?
Not a problem with automated configuration generation, especially in static environments such as the data center.
What about convergence properties?
Simple BGP policy and route selection helps.
Best path is simply the shortest path (respecting AS_PATH).
Worst-case convergence is a few seconds; most cases are less than a second.
Validating Protocol Assumptions
Lessons from Route Surge PoC Tests:
We ran PoC tests using OSPF and BGP; details are at the end of the deck.
Note: some issues were vendor specific. Link-state protocols could be implemented properly, but this requires tuning.
The idea is that the LSDB holds many inefficient non-best paths.
On startup or link failure, these inefficient non-best paths become
best paths and are installed in the FIB.
This results in a surge in FIB utilization---Game Over.
With BGP, AS_PATH loop prevention keeps only useful paths---no surge.
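A minimal sketch of that loop-prevention behavior (a hypothetical check; the ASNs are taken from the backup test-bed slides): a speaker rejects any UPDATE whose AS_PATH already contains its own ASN, so a spine never installs detour paths that loop back through other podsets.

    # Sketch: standard eBGP AS_PATH loop check keeps the detour paths out.
    SPINE_ASN = 65535                       # spine ASN from the backup slides

    def accept_update(local_asn, as_path):
        # reject any path that already contains our own ASN
        return local_asn not in as_path

    print(accept_update(SPINE_ASN, [65002, 65535, 65001]))  # False -> dropped
    print(accept_update(SPINE_ASN, [65001]))                # True  -> installed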
Routing Design
Single logical link between devices, eBGP all the way down to the ToR.
Separate BGP ASN per ToR, ToR ASNs reused between containers.
Parallel spines (Green vs Red) for horizontal scaling.
[Diagram: parallel spine planes with roughly 100 spines, 200 leafs, 2K ToRs, and ~100K servers.]
BGP Routing Design Specifics
BGP AS_PATH Multipath Relax
For ECMP even if the AS_PATH doesn't match.
It is sufficient to have the same AS_PATH length.
We use 2-octet private BGP ASNs
Simplifies path hiding at WAN edge (remove private AS)
Simplifies route-filtering at WAN edge (single regex).
But we only have 1022 Private ASNs
4-octet ASNs would work, but not widely supported
BGP Specifics: Allow AS In
This is a numbering problem: the number of 16-bit private BGP ASNs is limited.
Solution: reuse Private ASNs on the
ToRs.
Allow AS in on ToR eBGP sessions.
ToR numbering is local per
container/cluster.
Requires vendor support, but the feature is easy to implement.
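One way to picture the numbering scheme (a sketch with assumed ranges, not the actual allocation): ToR ASNs are drawn from the same small private pool in every container, while the illustrative leaf numbering stays container-unique, and "allow AS in" on the ToR eBGP sessions keeps the reused ASN from being treated as a loop.

    # Sketch of per-container ToR ASN reuse (ranges are assumed, not Microsoft's).
    TOR_ASN_BASE = 65001          # ToR n gets the same ASN in every container

    def tor_asn(tor_index):
        return TOR_ASN_BASE + tor_index

    LEAF_ASN_BASE = 64600         # illustrative container-unique leaf numbering
    def leaf_asn(container, leaf_index, leafs_per_container=8):
        return LEAF_ASN_BASE + container * leafs_per_container + leaf_index

    # ToR 5 in container 1 and container 7 share ASN 65006, so each ToR's eBGP
    # session needs "allow AS in" to accept routes already carrying that ASN.
    print(tor_asn(5), leaf_asn(1, 3), leaf_asn(7, 3))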
Default Routing and Summarization
Default route for external destinations only.
Don't hide server subnets.
Otherwise: route black-holing on link failure!
If D advertises a prefix P, then some of the traffic
from C to P will follow default to A. If the link AD
fails, this traffic is black-holed.
If A and B send P to C, then A withdraws P when
link AD fails, so C receives P only from B, so all
traffic will take the link CB.
Similarly for summarization of server subnets.
Operational Issues with BGP
Lack of Consistent feature support:
Not all vendors support everything you need.
BGP Add-Path
32-bit ASNs
AS_PATH multipath relax
Interoperability issues:
Especially when coupled with CoPP and CPU queuing (smaller L2 domains help---less DHCP).
Small mismatches may result in large outages!
Operational Issues with BGP
Unexpected default behavior
E.g. selecting best-path using oldest path
Combined with lack of AS_PATH multipath relax on neighbors.
Traffic polarization due to hash function
reuse
This is not a BGP problem but you see it all the
time
Overly aggressive timers cause session flaps under heavy CPU load.
RIB/FIB inconsistencies
This is not a BGP problem but it is
consistently seen in all implementations
SDN Use Cases for Data-Center
Injecting ECMP Anycast prefixes
Already implemented (see references).
Used for software load-balancing in the network.
Uses a minimal BGP speaker to inject routes.
Moving Traffic On/Off of Links/Devices
Graceful reload and automated maintenance.
Isolating network equipment experiencing grey failures.
Changing ECMP traffic proportions
Unequal-cost load distribution in the network
E.g. to compensate for various link failures and re-balance traffic
(network is symmetric but traffic may not be).
BGP SDN Controller
Focus is the DC: controllers scale within the DC, partition by cluster and region, and then sync globally.
Controller Design Considerations
Logical vs Literal
Scale - Clustering
High Availability
Latency between controller and network
element
Components of a Controller
Topology discovery
Path Computation
Monitoring and Network State Discovery
[Diagram: controller components (topology module, device manager, collector, BGP RIB manager, PCE, flow and state monitoring) with REST APIs northbound toward analysis/correlation and big-data systems, and vendor agents (BGP, OpenFlow, vendor SDK) southbound toward the network elements; the topology module tracks physical, logical, and control-path views.]
The controller is a component of a typical software orchestration stack.
BGP SDN Controller Foundations
Why BGP vs OpenFlow
No new protocol.
No new silicon.
No new OS or SDK bits.
Still need a controller.
We have "literal" SDN: software generates graphs that define the physical, logical, and control planes.
Graphs define the ideal ground state, used for config generation.
Need the current state in real time.
Need to compute new desired state.
Need to inject desired forwarding state.
Programming forwarding via the RIB
Topology discovery via BGP listener (link state discovery).
RIB manipulation via BGP speaker (injection of more preferred prefixes).
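For the RIB-manipulation piece, the PoC used ExaBGP (see references); a minimal helper in the style of its text API might look like the sketch below. The prefix, next-hop, and timing are made up, and the exact command syntax should be checked against the ExaBGP version in use.

    # Sketch: ExaBGP-style helper process; ExaBGP launches it and reads
    # announce/withdraw commands from its stdout (values are illustrative).
    import sys, time

    def announce(prefix, next_hop):
        sys.stdout.write(f"announce route {prefix} next-hop {next_hop}\n")
        sys.stdout.flush()

    def withdraw(prefix, next_hop):
        sys.stdout.write(f"withdraw route {prefix} next-hop {next_hop}\n")
        sys.stdout.flush()

    announce("192.0.2.0/24", "10.0.0.1")   # inject a more-preferred route
    time.sleep(60)
    withdraw("192.0.2.0/24", "10.0.0.1")   # roll back to default BGP routing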
Network Setup
Templates to peer with the central controller (passive listening).
Policy to prefer routes injected from the controller.
Policy to announce only certain routes to the controller.
Multi-hop peering with all devices.
Key requirement: path resiliency. CLOS has a very rich path set, so network partition is very unlikely.
[Diagram: controller (AS 65501) with multi-hop eBGP peerings into the fabric; spines in AS 64901 and AS 64902, other devices in AS 64XXX. Only a partial peering set displayed.]
SDN Controller Design
Implemented a C# version; the P.O.C. used ExaBGP.
BGP Speaker [stateful]
API to announce/withdraw a route.
Keeps state of announced prefixes.
Inject Route Command: Prefix + Next-Hop + Router-ID.
BGP Listener [stateless]
Tells the controller of prefixes received.
Tells the controller of BGP session up/down.
Receive Route Message: Prefix + Router-ID.
[Diagram: REST API and command center on top; speaker, decision, listener, and state-sync threads share a state database seeded from the network graph (bootstrap information); eBGP sessions run to the managed devices.]
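A rough sketch of the stateful speaker's bookkeeping (our own names and structures, not the production C# code): every injected (router, prefix) is recorded so the speaker can re-announce after a restart and withdraw whatever is no longer in the desired state.

    # Sketch: reconcile desired injected routes with what is already announced.
    injected = {}   # (router_id, prefix) -> next_hops currently announced

    def apply_desired_state(desired, announce, withdraw):
        # desired: {(router_id, prefix): [next_hops]}; announce/withdraw are
        # callbacks into the BGP speaker.
        for key, next_hops in desired.items():
            if injected.get(key) != next_hops:
                announce(key, next_hops)
                injected[key] = next_hops
        for key in list(injected):
            if key not in desired:
                withdraw(key)
                del injected[key]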
Building Network Link State
Use a special form of control-plane ping.
Rely on the fact that a BGP session reflects link health.
Assumes a single BGP session b/w two devices.
Create a /32 prefix for every device, e.g. R1.
Inject the prefix into device R1.
Expect to hear this prefix via all devices R2…Rn directly connected to R1.
If heard, declare the link R1 --- R2 as up.
Community tagging + policy ensures the prefix only leaks one hop from the point of injection, but is reflected to the controller.
[Diagram: the controller injects the prefix for R1 with the one-hop community and expects to hear the prefix for R1 from R2; R1's direct neighbor relays the prefix, but it is NOT relayed further (e.g. via R3).]
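The resulting link-health inference can be sketched in a few lines (device names, prefixes, and data structures are ours): the link R1-R2 is declared up only if the listener heard R1's /32 advertised by R2.

    # Sketch: infer link state from which neighbors re-advertise a device's /32.
    adjacency = {"R1": ["R2", "R3"]}            # expected neighbors from the graph
    device_prefix = {"R1": "10.255.0.1/32"}     # assumed per-device /32s

    def link_states(heard):
        # heard: set of (prefix, advertising_router_id) seen by the BGP listener
        states = {}
        for dev, neighbors in adjacency.items():
            for nbr in neighbors:
                up = (device_prefix[dev], nbr) in heard
                states[(dev, nbr)] = "up" if up else "down"
        return states

    print(link_states({("10.255.0.1/32", "R2")}))   # R1-R2 up, R1-R3 down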
Overriding Routing Decisions
The controller knows of all server subnets and devices.
The controller runs SPF and
Computes next hops for every server subnet at every device
Checks if this is different from static network graph decisions
Only pushes the deltas (sketched below)
These prefixes are pushed with third party next-hops (next slide)
and a better metric.
Controller has full view of the topology
Zero delta if no difference from default routing behavior
Controller may declare a link down to re-route traffic
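A compressed sketch of that decision step (toy graph and names; the real controller works per server subnet): run SPF over the live topology, compare the resulting next hops with what default routing on the static graph would pick, and push an override only where they differ.

    # Sketch: SPF next-hop computation and delta detection (toy data).
    from collections import deque

    def next_hops(graph, src, dst):
        # all first hops from src that lie on a shortest path to dst
        dist = {dst: 0}
        q = deque([dst])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return sorted(n for n in graph[src] if dist.get(n) == dist[src] - 1)

    static_graph = {"R1": ["S1", "S2"], "S1": ["R1", "R2"],
                    "S2": ["R1", "R2"], "R2": ["S1", "S2"]}
    live_graph   = {"R1": ["S1", "S2"], "S1": ["R1", "R2"],
                    "S2": ["R1"], "R2": ["S1"]}      # S2-R2 link is down

    want = next_hops(live_graph, "R1", "R2")
    have = next_hops(static_graph, "R1", "R2")
    if want != have:
        print("inject override at R1:", want)        # ['S1']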
Overriding Routing Decisions cont.
Injected routes have third-party next-hop
Those need to be resolved via BGP
Next-hops have to be injected as well!
A next-hop /32 is created for every device
The same one-hop BGP community is used.
[Diagram: the controller injects prefix X/24 with next-hops N1 and N2, and also injects the next-hop prefixes N1/32 and N2/32, which resolve via R2 and R3 toward R1.]
By default only one path allowed per
BGP session
Need either Add-Path or multiple
peering sessions
Worst case: # sessions = ECMP fan-out
Add-Path Receive-Only would help!
Overriding Routing Decisions cont.
A simple REST API manipulates network state overrides.
Supported calls:
Logically shutdown/un-shutdown a link
Logically shutdown/un-shutdown a device
Announce a prefix with next-hop set via a device
Read current state of the down links/devices
PUT http://<controller>/state/link/up=R1,R2&down=R3,R4
State is persistent across controller reboots
State is shared across multiple controllers
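A drain script might drive this API as in the sketch below (the controller hostname is a placeholder and the Python requests library is assumed; only the PUT shape shown above is taken from the deck):

    # Sketch: logically drain the R3-R4 link and undrain R1-R2 via the REST API.
    import requests

    resp = requests.put("http://controller.example/state/link/up=R1,R2&down=R3,R4",
                        timeout=5)
    resp.raise_for_status()   # the override persists across controller reboots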
Ordered FIB Programming
If updating BGP RIBs on devices in random order, RIB/FIB tables could go out of sync: the micro-loops problem!
[Diagram: prefix X behind R1 with an overloaded link; one set of devices is marked "(1) update these devices first" and another "(2) update these devices second" to show the ordered push.]
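One simple way to realize such an ordering (our heuristic sketch, not necessarily the production rule): a device is updated only after every device on its new path toward the prefix has been updated, which for shortest-path forwarding means pushing in waves of increasing hop count from the device that owns the prefix.

    # Sketch: group devices into update waves by hop count from the prefix owner.
    from collections import deque

    def update_waves(graph, origin):
        dist = {origin: 0}
        q = deque([origin])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        waves = {}
        for dev, d in dist.items():
            if d > 0:
                waves.setdefault(d, []).append(dev)
        return [sorted(waves[d]) for d in sorted(waves)]

    fabric = {"R1": ["S1", "S2"], "R2": ["S1", "S2"], "R3": ["S1", "S2"],
              "S1": ["R1", "R2", "R3"], "S2": ["R1", "R2", "R3"]}
    print(update_waves(fabric, "R1"))   # [['S1', 'S2'], ['R2', 'R3']]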
Traffic Engineering
Failures may cause traffic imbalances. This includes:
Physical failures
Logical link/device overloading
Example: the link b/w R2 and R4 goes down, but R1 does not know that.
[Diagram: with equal 50/50 ECMP splits, the remaining link toward R4 becomes 100% utilized and congested; after the controller installs paths with unequal ECMP weights (75%/25% splits), the congestion is alleviated.]
The controller installs paths with different ECMP weights.
Traffic Engineering (cont.)
Requires knowing
traffic matrix (TM)
Network topology and capacities
Solves Linear Programming problem
Computes ECMP weights
For every prefix
At every hop
Optimal for a given TM
Link state change causes reprogramming
More state pushed down to the network
[Diagram: an unequal ECMP split of 66% / 33% across two next-hops.]
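A toy version of that LP (assumed capacities and demand; SciPy's linprog is used for brevity): minimize the worst link utilization t subject to carrying the full demand, then turn the per-path flows into ECMP weights. With one path at half capacity, the optimum reproduces the 66%/33% split shown in the figure.

    # Sketch: compute unequal ECMP weights by minimizing max utilization.
    from scipy.optimize import linprog

    cap = [10.0, 5.0]     # Gbps per parallel path (assumed); path 2 lost capacity
    demand = 12.0         # Gbps of offered traffic (assumed)

    # Variables [x1, x2, t]: minimize t with x_i <= cap_i * t and x1 + x2 = demand.
    res = linprog(c=[0, 0, 1],
                  A_ub=[[1, 0, -cap[0]], [0, 1, -cap[1]]], b_ub=[0, 0],
                  A_eq=[[1, 1, 0]], b_eq=[demand])
    x1, x2, t = res.x
    print(round(x1 / demand, 2), round(x2 / demand, 2), round(t, 2))  # 0.67 0.33 0.8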
Asks to the vendors!
Most common HW platforms can do it (e.g. Broadcom)
Signaling via BGP does not look complicated either
Note: Has implications on hardware resource usage
Goes well with weighted ECMP
Well defined in RFC 2992
Not a standard (sigh)
We really like receive-only functionality
What we learned
Does not require new firmware, silicon, or APIs.
Some BGP extensions are nice to have.
BGP code tends to be mature.
Easy to roll back to default BGP routing.
Solves our current problems and allows solving more.
Questions?
Contacts:
Edet Nkposong - [email protected]
Tim LaBerge - [email protected]
Naoki Kitajima - [email protected]
References
http://datatracker.ietf.org/doc/draft-lapukhov-bgp-routing-large-dc/
http://code.google.com/p/exabgp/
http://datatracker.ietf.org/doc/draft-ietf-idr-link-bandwidth/
http://datatracker.ietf.org/doc/draft-lapukhov-bgp-sdn/
http://www.nanog.org/meetings/nanog55/presentations/Monday/Lapukhov.pdf
http://www.nanog.org/sites/default/files/wed.general.brainslug.lapukhov.20.pdf
http://research.microsoft.com/pubs/64604/osr2007.pdf
http://research.microsoft.com/en-us/people/chakim/slbsigcomm2013.pdf
Backup Slides
OSPF - Route Surge Test
Test bed that emulates 72 PODSETs
Each PODSET comprises 2 switches
Objective: study system and route-table behavior when the control plane is operating in a state that mimics production.
[Diagram: test-bed topology with a SPINE layer above 72 PODSETs, each containing two podset switches (R1/R2 in PODSET 1, R3/R4 in PODSET 2, …, R7/R8 in PODSET 72).]
Test Bed
4 Spine switches
144 VRFs created on a router
each VRF = 1x podset switch
Each VRF has 8 logical interfaces
(2 to each spine)
This emulates the 8-way required
by the podset switch
3 physical podset switches
Each podset carries 6 server-side IP subnets.
Test Bed
Route table calculations
Expected OSPF state
144 x 2 x 4 = 1152 links for infrastructure
144 x 6 = 864 server routes (although these will be 4-way, since we have brought everything into 4 spines instead of 8).
Some loopback addresses and routes from the real podset switches
We expect ~ (144 x 2 x 4) + (144 x 6) - 144 = 1872 routes (see the sketch below).
Initial testing proved that the platform can sustain this scale (control and forwarding
plane)
What happens when we shake things up?
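As a quick sanity check, the expected-state arithmetic above in code form (numbers straight from this slide):

    # Sketch: expected route count for the OSPF test bed.
    podset_switches = 144          # 72 PODSETs x 2 switches (emulated as VRFs)
    spines = 4
    links_per_spine = 2
    server_subnets = 6

    infra = podset_switches * links_per_spine * spines     # 1152 infrastructure links
    server = podset_switches * server_subnets              # 864 server routes
    expected = infra + server - podset_switches            # ~1872 routes
    print(infra, server, expected)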
OSPF Surge Test
Effect of bringing up 72 podsets (144 OSPF neighbors) all at once.
[Chart: Route Table Growth 7508a: route count vs. time (02:30:32 to 02:34:31), y-axis up to 14,000 routes.]
OSPF Surge Test
Sample route (16-way ECMP):
O 192.0.5.188/30 [110/21] via 192.0.1.33, 192.0.2.57, 192.0.0.1, 192.0.11.249, 192.0.0.185, 192.0.0.201, 192.0.2.25, 192.0.1.49, 192.0.0.241, 192.0.11.225, 192.0.1.165, 192.0.0.5, 192.0.12.53, 192.0.1.221, 192.0.1.149, 192.0.0.149
[Chart: Route Table Growth 7508a, route count vs. time (02:30:32 to 02:34:24).]
Why the surge?
As adjacencies come up, the spine learns about routes through other podset switches.
Given that we have 144 podset switches, we expect to see 144-way routes, although only 16-way routes are accepted.
The route table reveals that we can have 16-way routes for any destination, including infrastructure routes.
This is highly undesirable but completely expected and normal.
OSPF Surge Test
Instead of installing a 2-way towards the podset switch, the spine ends up installing a 16-way for podset switches that are disconnected.
If a podset switch-spine link is disabled, the spine will learn about this particular podset switch's IP subnets via other podset switches.
Unnecessary 16-way routes
[Diagram: a podset with PODSET SW 1 and PODSET SW 2, each carrying 6 server VLANs, uplinked to spines R1-R8; one podset switch-spine link is disabled.]
For every disabled podset switch-spine link, the spine
will install a 16-way route through other podset
switches
The surge was enough to fill the FIB (same timeline as the route-table growth graph above).
sat-a75ag-poc-1a(s1)#show log| inc OVERFLOW
2011-02-16T02:33:32.160872+00:00 sat-a75ag-poc-1a SandCell: %SAND-3ROUTING_OVERFLOW: Software is unable to fit all the routes in hardware
due to lack of fec entries. All routed traffic is being dropped.
BGP Surge Test
BGP design
Spine AS 65535
PODSET ASes starting at 65001, 65002, etc.
[Diagram: spine in AS 65535 above 72 PODSETs, each with two podset switches; PODSET 1 = AS 65001, PODSET 2 = AS 65002, …, PODSET 72 = AS 65072.]
BGP Surge Test
Effect of bringing up 72 PODSETs (144 BGP neighbors) all
at once
[Chart: Route Table Growth 7508a: route count over time, y-axis up to 1,800 routes (no surge).]
OSPF vs BGP Surge Test Summary
With the proposed design, OSPF exposed a potential surge issue (commodity switches have small TCAM limits); it could be solved by vendor-specific tweaks, but those are non-standard.
The network needs to be able to handle the surge and any additional 16-way routes due to disconnected spine-podset switch links.
Protocol enhancements required
Prevent infrastructure routes from appearing as 16-way.
BGP advantages
Very deterministic behavior
Protocol design takes care of eliminating the surge effect (i.e. the spine won't learn routes with its own AS).
ECMP is supported and routes are labeled by the podset they came from (AS #). Beautiful!