H3C Data Center Network Solution Underlay - Network Design Guide
Copyright © 2022 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New
H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document
are the property of their respective owners.
The information in this document is subject to change without notice.
Contents
Overview
Solution architecture
    Features
    Network design principles
    Leaf gateway design
    Routing protocol selection
    Capacity planning
    Multi-fabric expansion
    Recommended hardware
Access design
    Server access design
    Connecting border devices to external gateways
        M-LAG topology
        Triangle topology
        Square topology
    IP planning principles
Deployment modes
    About the deployment modes
    Management network configuration
Overview
A data center network interconnects the servers within a data center, interconnects distributed data centers, and connects data centers to end users. Data center underlay connectivity protocols have gradually evolved from primarily Layer 2 protocols to primarily IP routing protocols. Driven by the growth of computing scale, the physical topology of data centers has evolved from the access-aggregation-core three-tier architecture to a CLOS-based two-tier spine-leaf architecture.
This document describes the underlay network that uses the spine-leaf architecture.
Solution architecture
The underlay network uses OSPF or EBGP for communication between servers or between servers
and the external network, and uses M-LAG between leaf nodes for access reliability.
Select appropriate switch models and configure spine and leaf nodes as needed based on the
access server quantity, interface bandwidth and type, and convergence ratio to build a tiered DC
network for elastic scalability and service deployment agility.
Features
A typical data center fabric network offers the following features:
• Support for one or multiple spine-leaf networks.
• Flexible configuration and elastic scaling of spine and leaf nodes.
• VLAN deployment for isolation at Layer 2, and VPN deployment for isolation at Layer 3.
Network design principles
As a best practice, use the following network models in a spine-leaf network:
• Configure M-LAG for leaf nodes for access availability.
• Interconnect leaf and spine devices at Layer 3, and configure OSPF or EBGP for equal-cost
multi-path load balancing and link backup.
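As an example, the following minimal sketch shows an OSPF underlay configuration on one leaf uplink in Comware syntax. The interface name, router ID, and addresses are illustrative assumptions (the /31 interconnect addressing follows Figure 4); verify the commands against the configuration guide for your device.

# Enable OSPF with an explicit router ID.
ospf 1 router-id 10.0.0.1
#
# Activate OSPF on the spine-facing Layer 3 interface.
interface HundredGigE1/0/49
 port link-mode route
 ip address 10.39.244.13 31
 ospf 1 area 0.0.0.0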
Figure 1 Spine-leaf architecture
(The figure shows Spine 1 and Spine 2 acting as BGP EVPN route reflectors, with ECMP paths between the spine and leaf tiers.)
Spine design
Use Layer 3 Ethernet interfaces to interconnect spine and leaf nodes to build an all-IP fabric.
Leaf design
• Configure M-LAG, S-MLAG, or IRF on the leaf nodes to enhance availability and avoid single
points of failure. As a best practice, configure M-LAG on the leaf nodes.
• Connect each leaf node to all spine nodes to build a full-mesh network.
• Leaf nodes are typically Top of Rack (ToR) devices. As a best practice to decrease deployment
complexity, use the controller to deploy configuration automatically or use zero-touch
provisioning (ZTP) for deployment.
ZTP automates loading of system software, configuration files, and patch files when devices
with factory default configuration or empty configuration start up.
Leaf gateway design
You can design the leaf gateways in VRRP group mode or dual-active mode. In the VRRP group design, both the VRRP master and backup devices perform Layer 3 forwarding, but only the master device responds to ARP packets. In the dual-active design, the server-side devices can set up dynamic routing neighbor relationships with the M-LAG member devices.
To configure dual-active VLAN interfaces on the M-LAG system, perform the following tasks:
1. Create a gateway VLAN interface on each M-LAG member device for the same VLAN.
2. Assign the same IPv4 and IPv6 addresses and MAC address to the gateway VLAN interfaces.
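The following minimal sketch shows this configuration in Comware syntax. The VLAN, IP address, and MAC address are illustrative (the subnet follows Figure 3); apply the identical configuration on both M-LAG member devices.

# On both Leaf 1 and Leaf 2: same IP address and same MAC address.
interface Vlan-interface100
 ip address 100.1.1.1 255.255.255.0
 mac-address 0001-0001-0001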
The dual-active VLAN interfaces operate as follows:
• Each M-LAG member device forwards traffic locally instead of forwarding traffic towards the
M-LAG peer over the peer link. For example, Leaf 1 will directly respond to an ARP request
received from the server.
• When one uplink fails, traffic is switched to the other uplink. For example, when Leaf 1 is disconnected from the spine device, traffic is forwarded as follows:
  - Downlink traffic is switched to Leaf 2 for forwarding.
  - When receiving uplink traffic destined for the spine device, Leaf 2 forwards the traffic locally.
  - When Leaf 1 receives uplink traffic, it sends the traffic to Leaf 2 over the peer link, and Leaf 2 forwards the traffic to the spine device.
• Traffic is load shared between the access links to increase bandwidth utilization.
The dual-active VLAN interfaces use the same IP address and MAC address. As shown in Figure 3
and Table 2, for the M-LAG member devices to set up routing neighbor relationships with the
server-side network device, perform the following tasks:
• Use the port m-lag virtual-ip or port m-lag ipv6 virtual-ip command to
assign an M-LAG virtual IP address to the dual-active VLAN interfaces.
• Configure routing protocols.
The dual-active VLAN interfaces will use the virtual IP addresses to establish routing neighbor
relationships.
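The following minimal sketch shows the virtual IP configuration in Comware syntax, with illustrative addresses from the subnet in Figure 3. Each member device must use a unique virtual IP address; treat the exact command parameters as assumptions to verify against the command reference.

# On Leaf 1.
interface Vlan-interface100
 port m-lag virtual-ip 100.1.1.2 24 active
 ospf 1 area 0.0.0.0
#
# On Leaf 2: a different virtual IP address from the same subnet.
interface Vlan-interface100
 port m-lag virtual-ip 100.1.1.3 24 active
 ospf 1 area 0.0.0.0

The routing protocol then establishes neighbor relationships between these virtual IP addresses and the server-side device.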
Figure 3 Routing neighbor relationship setup
(The figure shows a spine device with interfaces 32.1.1.2/24 and 33.1.1.2/24, and a server at 100.1.1.103/24 attached to the M-LAG leaf devices.)
Table 2 Dual-active VLAN interface design

Tasks:
• Dual-active VLAN interface configuration:
  a. Create a VLAN interface on each M-LAG member device for the same VLAN.
  b. Assign the same IP address and MAC address to the VLAN interfaces.
  c. Assign a unique virtual IP address from the same subnet to each of the VLAN interfaces with the same VLAN ID. The VLAN interfaces use the virtual IP addresses for BGP or OSPF neighbor relationship setup.
• Create a VLAN interface on each M-LAG member device for another VLAN, assign the peer-link interfaces to this VLAN, and assign a unique IP address from the same subnet to each of the VLAN interfaces. The M-LAG member devices use these VLAN interfaces to forward Layer 3 traffic between them when a link to the upstream spine device fails.
• Use Layer 3 interfaces to connect the M-LAG member devices to the upstream spine device, and configure ECMP routes for load sharing between the uplinks.

Forwarding:
The traffic sent to the M-LAG system is load shared between the uplinks or downlinks. The M-LAG member devices forward traffic as follows:
• For Layer 2 traffic sent by the server, the M-LAG member devices look up the MAC address table and forward the traffic locally.
• For Layer 3 traffic sent by the server, the M-LAG member devices perform Layer 3 forwarding based on the FIB table.
• For external traffic destined for the server, the M-LAG member devices perform forwarding based on the FIB table.
Figure 4 Network diagram
(The figure shows Leaf1 and Leaf2 connected by a peer link and running BGP with the spine device: the spine-facing links use 10.39.244.12/31 and 10.39.244.44/31 on the spine side and 10.39.244.13/31 and 10.39.244.45/31 on the leaf side. VLAN-interface 4094, with 100.100.100.1/30 on Leaf1 and 100.100.100.2/30 on Leaf2, provides the Layer 3 connection over the peer link. The server at 100.127.128.3/24 attaches to VLAN 100.)
Tasks:
• Configure a VRRP group on the M-LAG member devices and use the VRRP virtual IP address as the gateway for the attached server.
• VLAN interface configuration:
  a. Create a VLAN interface for the VLAN where the M-LAG interface resides on each M-LAG member device.
  b. Assign a unique primary IP address from the same subnet to each of the VLAN interfaces.
  c. Assign a unique secondary IP address from another subnet to each of the VLAN interfaces.
• Use the primary or secondary IP addresses of the VLAN interfaces to set up BGP or OSPF neighbor relationships with the server-side network device.
• Set up a Layer 3 connection over the peer link between the M-LAG member devices. The M-LAG member devices use the Layer 3 connection to forward traffic between them when a link to the upstream spine device fails.
• Use Layer 3 interfaces to connect the M-LAG member devices to the upstream spine device, and configure ECMP routes for load sharing between the uplinks.

Forwarding:
The traffic sent to the M-LAG system is load shared between the uplinks or downlinks. The M-LAG member devices forward traffic as follows:
• For the Layer 3 traffic sent by the server, both M-LAG member devices perform Layer 3 forwarding.
• For the external traffic destined for the server, the M-LAG member devices make forwarding decisions based on local routes.
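The following minimal sketch shows the VRRP gateway configuration in Comware syntax. The addresses follow the server subnet in Figure 4; the virtual gateway address and priority values are illustrative assumptions.

# On Leaf 1 (priority 120, master candidate).
interface Vlan-interface100
 ip address 100.127.128.1 255.255.255.0
 vrrp vrid 1 virtual-ip 100.127.128.254
 vrrp vrid 1 priority 120
#
# On Leaf 2 (default priority 100, backup).
interface Vlan-interface100
 ip address 100.127.128.2 255.255.255.0
 vrrp vrid 1 virtual-ip 100.127.128.254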
Routing protocol selection
Recommended BGP deployment solution 1
Figure 5 Recommended EBGP route planning within a single fabric
(The figure shows the PE devices in AS 65505. The border and spine devices share AS 65504 and run IBGP with each other. Leaf1 and Leaf2 are in AS 65501, Leaf3 and Leaf4 in AS 65502, and the service leaf pair in AS 65503. Each leaf AS peers with the spine AS through EBGP.)
Recommended BGP deployment solution 2
Figure 6 Recommended EBGP route planning within a single fabric
(The figure shows the PE devices in AS 65501. The border and spine devices share AS 65504 and run IBGP with each other. All leaf devices, including the service leaf pair, share AS 65501 and peer with the spine AS through EBGP.)
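The following minimal sketch shows a leaf-side EBGP configuration for solution 2 in Comware syntax. The neighbor addresses follow Figure 4 and the AS numbers follow Figure 6. Because all leaf devices share AS 65501, each leaf must accept routes that carry its own AS number, which is the purpose of the allow-as-loop setting shown here; treat the details as assumptions to verify against your BGP configuration guide.

# On Leaf 1 (AS 65501): EBGP peering with the spine devices (AS 65504).
bgp 65501
 router-id 10.0.0.1
 peer 10.39.244.12 as-number 65504
 peer 10.39.244.44 as-number 65504
 address-family ipv4 unicast
  peer 10.39.244.12 enable
  peer 10.39.244.44 enable
  peer 10.39.244.12 allow-as-loop 1
  peer 10.39.244.44 allow-as-loop 1
  balance 8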
Capacity planning
This section introduces the server capacity design within a single fabric.
Design principles
The number of spine downlink interfaces determines the maximum number of leaf devices. Number of servers = number of leaf devices × number of leaf downlink interfaces / 2, because each server is dual-homed to two leaf devices.
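For example, 216 leaf devices that each provide 48 downlink interfaces can support 216 × 48 / 2 = 5184 dual-homed servers, which matches the first row of Table 6.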
Typical design
• M-LAG access
When the access switches adopt M-LAG, the two leaf devices that form an M-LAG system use two high-speed ports to establish the peer link and four ports for the uplinks. Among the four uplink ports, every two ports connect to one spine device.
Table 5 The leaf devices adopt 10-G interfaces to connect to the servers and 40-G uplink interfaces
Table 6 The leaf devices adopt 25-G interfaces to connect to the servers and 100-G uplink interfaces

Spine device model: S12516X-AF, with 2 spine devices.
• Spine convergence ratio (downlink/uplink) of 1:3, evaluated based on the 36 × 100-G interface card: 144 uplink ports and 432 downlink ports per spine device. Number of access switches: 432 / 2 = 216, each with a convergence ratio of 1:3 (400:1200, that is, 4 × 100-G uplink ports and 48 × 25-G downlink ports per switch). Number of access servers: 216 × 48 / 2 = 5184.
• Spine convergence ratio (downlink/uplink) of 1:1, evaluated based on the 36 × 100-G interface card: 288 uplink ports and 288 downlink ports per spine device. Number of access switches: 288 / 2 = 144, each with a convergence ratio of 1:3 (400:1200, that is, 4 × 100-G uplink ports and 48 × 25-G downlink ports per switch). Number of access servers: 144 × 48 / 2 = 3456.
• S-MLAG access
When the access switches adopt S-MLAG, each access switch uses six or eight uplink ports, and each uplink port connects to one spine device.
Table 7 The leaf devices use 10-G interfaces to connect to the servers and 100-G uplink
interfaces
Table 8 The leaf devices use 25-G interfaces to connect to the servers and 100-G uplink
interfaces
Multi-fabric expansion
You can add more fabrics to further expand the data center network. Connect the fabrics with EDs.
Figure 7 Single fabric network diagram
(The figure shows Fabric1: FW1 and FW2 connect the fabric to the Internet and VPN, Border1 and Border2 are connected by a peer link, and Spine1 and Spine2 form the spine tier. A backup path is also labeled.)
Recommended hardware
Device role: Border/ED
• Medium-sized and large networks:
  - Type H modules for the S12500X-AF switches
  - All modules supported by the S12500G-AF switches
• Small-sized networks: Same as the leaf devices

Device role: Leaf
• 10-GE access:
  - S6800
  - S6860
  - S6805
  - S6850-2C/S9850-4C with 10-GE interface modules installed
  - S6812/S6813
  - S6880-48X8C
• 25-GE access:
  - S6825
  - S6850-56HF
  - S6850-2C/S9820-4C with 25-GE interface modules installed
  - S6880-48Y8C
• 40-GE access:
  - S6800
  - S6850-2C/S9850-4C with 40-GE interface modules installed
• 100-GE access:
  - S9850-32H
  - S6850-2C/S9850-4C with 100-GE interface modules installed
  - S9820-8C
  - S9820-64H
Access design
Server access design
Every two leaf devices form an M-LAG system to provide redundant access for servers.
Figure 9 Typical data center network
(The figure shows a typical data center network: Border1 and Border2 form an M-LAG pair connected by a peer link and provide access to the Internet, Spine1 and Spine2 form the spine tier, and Leaf1 through Leaf6 form M-LAG pairs connected by peer links. The border and leaf devices act as Layer 3 gateways (L3GW).)
Figure 10 Server access methods
(The figure shows two server access methods. In dual-active mode, the server's Eth0 and Eth1 interfaces are both active and connect to Leaf1 and Leaf2, respectively. In primary/backup mode, one interface is primary and the other is backup.)
M-LAG topology
Network topology
As shown in Figure 11, two border devices form an M-LAG system, and two external gateways also form an M-LAG system. The physical cables are cross-connected between the border devices and external gateways. The four links between the border devices and external gateways are aggregated into one logical link through M-LAG. Because an external network specifies only one gateway IP for an external gateway and the M-LAG network requires only one interconnect IP for one aggregate link, this topology is highly compatible with the cloud network model. As a best practice, use the M-LAG topology if possible.
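The following minimal sketch shows how a border device can aggregate its two links toward the external gateways into an M-LAG interface, in Comware syntax. The interface and group numbers are illustrative.

# Create a dynamic (LACP) aggregate interface and bind it to M-LAG group 1.
interface Bridge-Aggregation10
 link-aggregation mode dynamic
 port m-lag group 1
#
# Assign the two physical links toward the external gateways to the aggregation group.
interface Twenty-FiveGigE1/0/1
 port link-aggregation group 10
#
interface Twenty-FiveGigE1/0/2
 port link-aggregation group 10

Apply the same configuration on the other border device. M-LAG then presents the four physical links as one logical aggregate link to the external gateways.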
Figure 11 M-LAG topology for border devices and external gateways
(The figure shows external gateway 1 and external gateway 2 connected by a peer link, Border1 and Border2 connected by a peer link, and cross-connected aggregated links between the two M-LAG systems.)
Reliability
On the M-LAG network, you do not need to deploy failover links between border devices. Traffic is not interrupted as long as at least one of the four physical links between the border devices and external gateways is operational.
• When one link between Border1 and external gateways fails, LACP automatically excludes the
faulty link and switches traffic to the normal link, which is transparent for routing.
• When both links between Border1 and external gateways fail, M-LAG keeps the aggregate
interface up, and the traffic is switched to Border2 over the peer link. Then, the traffic is
forwarded to external gateways through the normal aggregate interface, which is transparent
for routing.
• When Border1 fails, the underlay routing protocol senses the failure. On the spine device, the routing protocol withdraws the route whose next hop VTEP IP is Border1, and the traffic previously load shared to Border1 over ECMP paths is switched to Border2.
Benefits and restrictions
The external gateways must also support M-LAG. An M-LAG system supports only two member
devices.
Fewer IP addresses and VLAN resources are consumed. The M-LAG topology is well compatible
with the cloud network model.
Triangle topology
Network topology
As shown in Figure 12, two border devices form a multi-active device group, which supports more than two devices. The border devices use four Layer 3 interfaces to connect to the external gateways, and the physical cables are cross-connected. The border devices and external gateways can communicate through static routes or OSPF routes.
Figure 12 Triangle topology for border devices and external gateways
(The figure shows Border1 and Border2 cross-connected to external gateway 1 and external gateway 2.)
Reliability
In the triangle topology, failover links between Border1 and Border2 are optional. Failover links between the border devices are needed only when all links between a border device and the external gateways fail.
• When one link between Border1 and external gateways fails, the corresponding static routes
are invalidated or the corresponding dynamic routes are withdrawn. Then, the traffic previously
load-shared to the faulty link in equal cost mode can be switched to the other normal link.
• When both links between Border1 and external gateways fail, failover links are needed
between border devices for the network to operate normally. In this case, the corresponding
static routes are invalidated or the corresponding dynamic routes are withdrawn. Then, the
traffic on Border1 can pass through the failover links to reach Border2, and then be forwarded
to external gateways through Border2.
• When Border1 fails, the underlay routing protocol senses the failure. On the spine device, the routing protocol withdraws the route whose next hop VTEP IP is Border1, and the traffic previously load shared to Border1 over ECMP paths is switched to Border2. In this case, the corresponding static routes are invalidated or the corresponding dynamic routes are withdrawn on the external gateways. Then, the north-to-south traffic can be switched to Border2.
Benefits and restrictions
Two or more border devices can be active. More IP addresses and VLAN resources are consumed.
Square topology
Network topology
As shown in Figure 13, the border devices use two Layer 3 interfaces to connect to the external gateways. You must deploy a failover link between the two border devices, and make sure a failover route is less preferred than the corresponding route for normal forwarding. Typically, dynamic routing protocols are deployed between the two external gateways and two border devices to automatically generate the routes for normal forwarding and the failover routes.
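If static routes are used instead of dynamic routing, the following minimal sketch shows how to make the failover route less preferred in Comware syntax. In Comware, a larger preference value is less preferred and the default static route preference is 60; the next-hop addresses are illustrative.

# Preferred default route through the directly connected external gateway.
ip route-static 0.0.0.0 0 10.1.1.2
# Failover route through Border2 over the failover link.
ip route-static 0.0.0.0 0 10.2.2.2 preference 70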
Figure 13 Square topology for border devices and external gateways
(The figure shows Border1 connected to external gateway 1, Border2 connected to external gateway 2, and a failover link between the border devices.)
Reliability
In the square topology, you must deploy a failover link between Border1 and Border2. When the
interconnect link between a border device and external gateways fails, the failover link is required
for service continuity.
• When the interconnect link between Border1 and external gateways fails, the corresponding
static routes are invalidated or the corresponding dynamic routes are withdrawn. Then, the
traffic on Border1 can pass through the failover link to reach Border2, and then be forwarded to
external gateways through Border2.
• When Border1 fails, the underlay routing protocol senses the failure. On the spine device, the routing protocol withdraws the route whose next hop VTEP IP is Border1, and the traffic previously load shared to Border1 over ECMP paths is switched to Border2. In this case, the corresponding static routes are invalidated or the corresponding dynamic routes are withdrawn on the external gateways. Then, the north-to-south traffic can be switched to Border2.
Benefits
The square topology reduces the number of links between the border devices and external gateways. This topology is applicable when the border devices are far away from the external gateways or are otherwise difficult to cross-connect to the external gateways.
IP planning principles
On a single-fabric network, you must plan the spine-leaf interconnect interface IP addresses and the router IDs.
Spine-leaf interconnect interface IP addresses
As a best practice, configure the spine-leaf interconnect interfaces to borrow IP addresses from
loopback interfaces. On a Layer 3 Ethernet interface, execute the ip address unnumbered
interface LoopBack0 command to configure the interface to borrow an IP address from a
loopback interface.
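The following minimal sketch shows the unnumbered configuration in Comware syntax. The interface name and loopback address are illustrative assumptions; if OSPF runs over unnumbered Ethernet interfaces, also set the OSPF network type to P2P on them.

# Loopback interface whose address is borrowed.
interface LoopBack0
 ip address 10.0.0.1 255.255.255.255
#
# The spine-facing interface borrows the loopback address.
interface HundredGigE1/0/25
 port link-mode route
 ip address unnumbered interface LoopBack0
 ospf network-type p2p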
Router ID address planning on an M-LAG network
For a routing protocol, the two member devices of an M-LAG system are standalone devices and must be configured with different router IDs. Manually configure the router IDs. If you do not, each device automatically selects a router ID, which might cause router ID conflicts. In an EVPN+M-LAG environment, as a best practice, configure the network as follows:
• The two member devices in the same M-LAG system use their Loopback0 interface addresses as the local VTEP IP address and router ID. These addresses must be different on the two devices.
• The two member devices in the same M-LAG system use their Loopback1 interface addresses as the virtual VTEP address (configured by using the evpn m-lag group command). This address must be the same on both devices.
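The following minimal sketch shows this address plan on one member device in Comware syntax, with illustrative addresses and AS number. Loopback0 must be unique per device, while Loopback1 and the EVPN M-LAG group address must be identical on both member devices.

# Loopback0: unique per device, used as the router ID and local VTEP IP.
interface LoopBack0
 ip address 10.0.0.1 255.255.255.255
#
# Loopback1: identical on both member devices, used as the virtual VTEP address.
interface LoopBack1
 ip address 10.0.0.100 255.255.255.255
#
bgp 65501
 router-id 10.0.0.1
#
evpn m-lag group 10.0.0.100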
Deployment modes
About the deployment modes
The underlay network contains spine and leaf nodes.
You can deploy the underlay network in the following modes:
• Manual deployment
• Automated deployment
Automated deployment can be implemented in the following ways:
• Automatic configuration deployment by the controller. You configure templates on the controller, and devices that start up with the factory default configuration are incorporated by the controller automatically. No manual underlay device provisioning is required.
• Automatic configuration (recommended in scenarios without a controller deployed)
a. The administrator saves the software image files (including startup image files and patch
packages) and configuration files for the device to an HTTP, TFTP, or FTP server. The
software image files and configuration files for the device can be identified by the SN of the
device.
b. The device obtains the IP address of the TFTP server through DHCP at startup and then
obtains the script for automatic configuration.
c. The device runs the script and obtains device information (for example, SN) and matches
the software image file and configuration file for the device.
d. The device downloads the software image file and configuration file from the file server,
and loads the image and deploys configuration automatically.
Output from a Python script for automatic configuration enables you to monitor the operations
during execution of that script, facilitating fault location and troubleshooting.
With the automatic configuration feature, the device can automatically obtain a set of
configuration settings at startup. This feature simplifies network configuration and
maintenance.
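The following minimal Python sketch illustrates such a script, assuming the Comware 7 Python API (the comware module with its Transfer and CLI functions); the server address and file names are examples. The print output marks progress so that the console log can be used for fault location.

#!/usr/bin/python
# Minimal automatic-configuration sketch for a Comware 7 device.
import comware

# Download the configuration file from the file server to the local flash.
print("Downloading configuration file...")
comware.Transfer('tftp', '192.168.56.1', 'startup.cfg', 'flash:/startup.cfg')

# Specify the downloaded file as the next-startup configuration file.
print("Applying configuration...")
comware.CLI('startup saved-configuration flash:/startup.cfg')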
Management network configuration
(The figure shows the management network links between the SeerEngine-DC controller and Spine 1 and Spine 2.)