Reference document:
EtherChannel Negotiation
An EtherChannel can be established using one of three mechanisms:
- PAgP - Cisco's proprietary negotiation protocol
- LACP (IEEE 802.3ad) - Standards-based negotiation protocol
- Static Persistence ("On") - No negotiation protocol is used
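Before EtherChannel is configured: STP blocks the redundant ports.
After EtherChannel is configured: STP treats the bundle as a single logical link, so all member ports can forward.
Question 1: NIC teaming aggregates bandwidth, but it does not increase the bandwidth available to a single connection. Why?
Why can't the packets of one session be load balanced across links? A session is split into multiple packets in transit and reassembled at the destination, so the packets must keep a certain order. If packets arrive out of order, the receiver has to buffer and reorder them; TCP may mistake the reordering for loss and slow down (see question 3), and a protocol without sequence numbers may reassemble meaningless data. The packets of one session therefore have to travel in order over the same physical link. As a result, a 10 Gb aggregate built from ten 1 Gb links is never as fast or as efficient for a single session as one 10 Gb link.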
Cisco's EtherChannel reduces part of the binary pattern that the addresses in the frame form to a numerical value that selects one of the links in the channel in order to distribute frames across the links in a channel. EtherChannel frame distribution uses a Cisco-proprietary hashing algorithm. The algorithm is deterministic; if you use the same addresses and session information, you always hash to the same port in the channel. This method prevents out-of-order packet delivery.
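The deterministic behaviour described above can be sketched in a few lines. This is only an illustration: Cisco's real hash is proprietary, so the XOR of the two MAC addresses reduced to a value from 0 to 7 is an assumed stand-in, and `select_link` is a hypothetical helper name.

```python
def mac_to_int(mac: str) -> int:
    """Convert a Cisco-style dotted MAC (e.g. 00d0.c0d7.2dd4) to an integer."""
    return int(mac.replace(".", ""), 16)


def select_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick a member link index from the XOR of the two addresses.

    Assumption: an XOR hash stands in for the proprietary algorithm.
    """
    if num_links < 1:
        raise ValueError("num_links must be at least 1")
    # Keep only the low 3 bits (a value in 0..7), then map it onto the links.
    hash_value = (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) & 0b111
    return hash_value % num_links


if __name__ == "__main__":
    # The same address pair always lands on the same link.
    for _ in range(3):
        print(select_link("00d0.c0d7.2dd4", "0002.fc26.2494", 2))
```

Because the result depends only on the addresses, every frame of a given conversation takes the same link, which is what keeps frames in order.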
All ports in each EtherChannel must be the same speed. You can base the load-balance policy (frame distribution) on a MAC address (Layer 2 [L2]), an IP address (Layer 3 [L3]), or a port number (Layer 4 [L4]). You can activate these policies, respectively, if you issue the command. The session keyword is supported on the Supervisor Engine 2 and Supervisor Engine 720. The ip-vlan-session keyword is only supported on the Supervisor Engine 720. Use this keyword in order to specify the frame distribution method, with the IP address, VLAN, and Layer 4 traffic.
Question 2: At which OSI layer does LACP run?
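If the physical switch also does link aggregation, we first need to understand how link aggregation works between the physical switch and the host, that is, LACP.
Cisco's proprietary technology is EtherChannel. The supported scenarios are:
- One IP to many IP connections. (Host A making two connection sessions to Host B and C)
- Many IP to many IP connections. (Host A and B multiple connection sessions to Host C, D, etc)
Note: One IP to one IP connections over multiple NICs is not supported. (Host A one connection session to Host B uses only one NIC).
LACP runs at the MAC layer (Layer 2) and assumes that all links are full-duplex, point-to-point ports of the same speed.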
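3. What is out-of-order forwarding?
With network layering, TCP and IP forwarding are largely independent of each other. The forwarding plane (the router) forwards packets on a best-effort basis, and TCP is unaware of the underlying links. To make the best use of the bandwidth, TCP starts in slow start and grows the congestion window quickly until a loss occurs. It then shrinks the congestion window, either through fast retransmit and fast recovery followed by congestion avoidance (after three duplicate ACKs) or by falling back to slow start (after a retransmission timeout), and then starts growing the window again.
No protocol requires IP forwarding to care about how TCP behaves. But if IP forwarding can do something to keep a TCP flow smoother, so much the better.
The following example shows how out-of-order forwarding on multiple cores reduces TCP throughput, and what has to be solved.
Suppose the sender sends five segments numbered 1, 2, 3, 4, 5, and the receiver expects to receive them in that order. If the receiver gets segment 1, then does not get 2 but gets 3, 4, and 5, it sends three duplicate ACKs, each stating that the next expected sequence number is 2. After three consecutive duplicate ACKs the sender performs a fast retransmit and shrinks its congestion window to half its size plus three segments, which reduces the rate at which it sends. Think of a tap that was fully open with a strong flow: after the problem occurs, the tap is only a little more than half open and the flow drops considerably.
With single-core forwarding this is rarely a problem, because packets are normally processed first come, first served, so the order is preserved.
With multi-core forwarding the problem appears easily. Several cores process packets from the same input port, and because the processing paths differ (TCP, UDP, ICMP, and so on), some packets are handled faster than others. In the example above, suppose the system has five cores that handle segments 1, 2, 3, 4, 5 respectively. Core 2 is slow or blocked for some reason, while cores 3, 4, and 5 are fast and forward segments 3, 4, and 5 first. Because the segments that arrive first are not the expected one, the receiver sends three duplicate ACKs asking for segment 2, the sender's window shrinks, and throughput drops.
In this case the loss of throughput is caused purely by reordering inside the forwarding system.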
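The five-segment example can be replayed with a small simulation. It is a simplified sketch, not a TCP implementation: segments are numbered per packet rather than per byte, and `window_after_fast_retransmit` uses the "half plus three segments" rule from the text. All function names are hypothetical.

```python
def receiver_acks(arrival_order):
    """Return the cumulative ACK (next expected segment) sent after each arrival."""
    received = set()
    next_expected = 1
    acks = []
    for seq in arrival_order:
        received.add(seq)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def count_duplicate_acks(acks):
    """Count ACKs that repeat the ACK number sent just before them."""
    return sum(1 for prev, cur in zip(acks, acks[1:]) if cur == prev)


def window_after_fast_retransmit(cwnd_segments):
    """Shrink the congestion window to half its size plus three segments."""
    return cwnd_segments // 2 + 3


if __name__ == "__main__":
    acks = receiver_acks([1, 3, 4, 5, 2])  # core 2 was slow, segment 2 arrives last
    print(acks, count_duplicate_acks(acks))
    print(window_after_fast_retransmit(20))
```

Nothing was lost in this run, yet the three duplicate ACKs are enough to make the sender cut its window.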
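4. When is ESXi IP-hash NIC teaming combined with dynamic LACP on the physical switch appropriate?
Requirements: vSphere 5.1 with a distributed virtual switch. LACP can only be configured through the vSphere Web Client.
Suitable for: traffic to many different IP addresses, for example a web server.
Benefit: multiple IP sessions of one VM are spread across multiple physical NICs. The same VM can use both links for different TCP or UDP sessions.
Not suitable for: traffic between a fixed pair of IP addresses, for example storage access such as a VM accessing NFS storage (the source and destination addresses in the IP header never change).
Concept: LACP must be configured on both the virtual switch and the physical switch (the physical switch decides how inbound traffic is distributed). Outbound traffic is distributed by the NIC teaming policy, which must be IP hash.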
For VMs that host applications needing access to multiple target IP addresses, LACP links combined with the IP hash load balancing algorithm provide a good balance of traffic across all connections. Compared to traditional NIC teaming, all links get utilized simultaneously. While traditional NIC teaming is simple to configure, without any extra steps needed on the physical switch, a given VM could only be active on one link at a time (as the MAC appearing on two ports on the switch that are not LACP configured would cause one of the ports to be shut down).
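A short sketch makes the difference between the web server case and the NFS case concrete. It assumes the IP hash is the XOR of source and destination address modulo the number of uplinks, which is a common description of this policy but is treated here as an assumption; the addresses and function names are made up for illustration.

```python
import ipaddress
from collections import Counter


def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Assumed IP hash: (source XOR destination) modulo the uplink count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_uplinks


def uplink_usage(src_ip, dst_ips, num_uplinks):
    """Count how many flows from one VM land on each uplink."""
    return Counter(ip_hash_uplink(src_ip, dst, num_uplinks) for dst in dst_ips)


if __name__ == "__main__":
    clients = [f"10.0.1.{i}" for i in range(1, 9)]
    # Web server talking to many clients: flows spread over both uplinks.
    print(uplink_usage("10.0.0.10", clients, 2))
    # VM talking to a single NFS server: every packet uses the same uplink.
    print(uplink_usage("10.0.0.10", ["10.0.0.50"] * 8, 2))
```

With one fixed source and destination pair the hash never changes, which is why storage traffic to a single NFS address gains nothing from the extra links.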
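5. When is ESXi NIC teaming combined with static link aggregation on the physical switch appropriate?
Static teaming (IEEE 802.3ad draft v1)
Advantage: it works when the switch does not support LACP and supports only static link aggregation.
Disadvantage: a VM can use the bandwidth of only one NIC. Static aggregation cannot detect cabling or configuration errors.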
In Static teaming mode there is no check for incorrectly plugged cables or other errors. This mode is useful when the preferred bandwidth exceeds a single physical NIC and the switch does not support LACP, but the switch does support static teaming.
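6. What happens when two NICs of one ESXi server are connected to the same physical switch without NIC teaming?
Both NICs belong to one vSwitch. Because the vSwitch supports neither LACP nor STP, both links are active. Instead of relying on STP or port blocking, the vSwitch uses a special forwarding rule, split-horizon switching (Cisco UCS documentation uses the term End Host Mode), which prevents forwarding loops.
7. How does a vSwitch differ from a physical switch?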
Ports are not equal
In a traditional Ethernet switch, the same forwarding rules are used for all ports. Virtual switch uses different forwarding rules for vNICs and uplinks.
No MAC address learning
The hypervisor knows the MAC addresses of all virtual machines running in the ESX server; there’s no need to perform MAC address learning.
Spanning Tree Protocol is ignored
Virtual switch is not running Spanning Tree Protocol (STP) and does not send STP Bridge Protocol Data Units (BPDU). STP BPDUs received by the virtual switch are ignored. Uplinks are never blocked based on STP information.
As ESX doesn’t run STP, you should also configure spanning-tree portfast on these ports.
Split-horizon forwarding
Packets received through one of the uplinks are never forwarded to other uplinks. This rule prevents forwarding loops through the virtual switch.
Limited flooding of broadcasts/multicasts
Broadcast or multicast packets originated by a virtual machine are sent to all other virtual machines in the same port group (VMware terminology for a VLAN). They are also sent through one of the uplinks like a regular unicast packet (they are not flooded through all uplinks). This ensures that the outside network receives a single copy of the broadcast.
The uplink through which the broadcast packet is sent is chosen based on the load balancing mode configured for the virtual switch or the port group.
Broadcasts/multicasts received through an uplink port are sent to all virtual machines in the port group (identified by VLAN tag), but not to other uplinks (see split-horizon forwarding).
No flooding of unknown unicasts
Unicast packets sent from virtual machines to unknown MAC addresses are sent through one of the uplinks (selected based on the load balancing mode). They are not flooded.
Unicast packets received through the uplink ports and addressed to unknown MAC addresses are dropped.
Reverse Path check based on source MAC address
The virtual switch sends a single copy of a broadcast/multicast/unknown unicast packet to the outside network (see the no flooding rules above), but the physical switch always performs full flooding and sends copies of the packet back to the virtual switch through all other uplinks. VMware thus has to check the source MAC addresses of packets received through the uplinks. A packet received through one of the uplinks and having a source MAC address belonging to one of the virtual machines is silently dropped.
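The forwarding rules listed above can be condensed into a small model. This is a sketch under simplifying assumptions: there is a single port group, multicast is treated like broadcast and left out, and the uplink choice in `_pick_uplink` is an arbitrary stand-in for the configured load balancing mode. The class and method names are hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class VSwitch:
    def __init__(self, vm_macs, uplinks):
        self.vm_macs = set(vm_macs)   # known to the hypervisor, never learned
        self.uplinks = list(uplinks)

    def _pick_uplink(self, src_mac):
        # Stand-in for the load balancing mode: derive an index from the MAC.
        return self.uplinks[int(src_mac.replace(":", ""), 16) % len(self.uplinks)]

    def from_vm(self, src_mac, dst_mac):
        """Ports that receive a frame sent by a local VM."""
        if dst_mac == BROADCAST:
            # Other local VMs plus exactly one uplink (limited flooding).
            return sorted(self.vm_macs - {src_mac}) + [self._pick_uplink(src_mac)]
        if dst_mac in self.vm_macs:
            return [dst_mac]
        return [self._pick_uplink(src_mac)]  # unknown unicast is not flooded

    def from_uplink(self, src_mac, dst_mac):
        """Ports that receive a frame that arrived on an uplink."""
        if src_mac in self.vm_macs:
            return []  # reverse path check: our own frame flooded back
        if dst_mac == BROADCAST:
            return sorted(self.vm_macs)  # split horizon: never to other uplinks
        if dst_mac in self.vm_macs:
            return [dst_mac]
        return []  # unknown unicast from outside is dropped


if __name__ == "__main__":
    sw = VSwitch({"00:50:56:00:00:01", "00:50:56:00:00:02"}, ["vmnic0", "vmnic1"])
    print(sw.from_vm("00:50:56:00:00:01", BROADCAST))
    print(sw.from_uplink("00:50:56:00:00:01", BROADCAST))
```

No branch of `from_uplink` ever returns an uplink, which is the property that removes the need for STP on the virtual switch.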
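8. What is the BPDU filter?
BPDU frames
BPDUs are the frames that STP exchanges between switches. There is no authentication mechanism, so every BPDU is trusted and forged BPDUs are possible.
The virtual switch does not support STP: it never sends BPDUs itself and does not process BPDUs from the physical switch.
If a virtual machine generates and sends BPDUs, it can bring down the entire cluster, for example by sending a forged BPDU to win the root bridge role.
To stop specific ports from accepting BPDUs, vendors introduced BPDU Guard on Cisco devices and BPDU Protection on HP network devices.
As soon as a BPDU is seen on such a port, the port is shut down.
This leads to the BPDU filter, which is available on both VDS and VSS switches. It has to be changed on each host individually.
Settings:
9. Can LACP bond ports on two different switches?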
LACP itself doesn't provide the ability to bond across multiple switches; it bonds across multiple ports on a single ethernet switch, and depending on the vendor there might even be restrictions on which ports on a switch can be bonded together.
Some vendors have proprietary protocols (typically called ) that allow for bonded ethernet channels across different ethernet switches; this may not be helpful when working with a server's ethernet ports.
10. Consequences of not configuring the physical or virtual switch correctly
Without synchronized ESX-switch configuration you can experience one of the following two symptoms:
- Enabling static LAG on the physical switch (pSwitch), but not using IP-hash-based load balancing on vSwitch: frames from the pSwitch will arrive to ESX through an unexpected interface and will be ignored by vSwitch. Definitely true if you use , probably also true in active/active per-VM-load-balancing configuration (need to test it, but I suspect loop prevention checks in vSwitch might kick in).
- Enabling IP-hash-based load balancing in vSwitch without corresponding static LAG on the pSwitch: pSwitch will go crazy with MACFLAP messages and might experience performance issues and/or block traffic from the offending MAC addresses (Duncan Epping has ).
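11. Overall diagram
12. LACP + NIC teaming example
Configuration on the virtual switch:
Set LACP to Active or Passive mode on the uplink port group.
Set IP hash on the port group.
On the physical switch, LACP and the VLANs must be configured correctly (the VLANs must be the same on every port in the group).
13. LACP notes
ESXi 5.1 supports only one LAG per vDS, but you can create multiple vDSes and therefore multiple LAGs.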
In vSphere 5.1, the LACP implementation has these constraints: it supports only one LAG per VDS per host, all uplinks in the dvuplink port group are included in this LAG, and only the IP hash load balancing algorithm is supported.
Hashing Algorithm - The hashing algorithm determines the LAG member used for traffic. LACP can use different properties of the outgoing traffic (e.g. source IP/Port number) to distribute traffic across all the links participating in a LAG.
When LACP is configured on the physical switch, a hash algorithm must also be selected there. It determines how inbound traffic is distributed within the LAG.
14. Why are multiple LAGs needed?
A: DC networks are moving towards 10GbE, which requires multiple EtherChannels.
B: Hosts with a mix of 1GbE and 10GbE NICs need support for multiple EtherChannels.
15. LACP enhancements in vSphere 5.5
- Supports multiple LACP LAGs
- Max 32 LAGs per host
- Max 64 LAGs per VDS
- Supports all LACP hashing algorithms (22)
Note: Uplinks must be going to either the same switch or a pair of switches appearing as a single logical switch (using vPC, VSS, MLAG, SMLT, or similar technology).
16. Cisco link aggregation concepts: EtherChannel and PAgP
CatOS
The Cisco-proprietary hash algorithm computes a value in the range 0 to 7. With this value as a basis, a particular port in the EtherChannel is chosen. The port setup includes a mask which indicates which values the port accepts for transmission. With the maximum number of ports in a single EtherChannel, which is eight ports, each port accepts only one value. If you have four ports in the EtherChannel, each port accepts two values, and so forth. This table lists the ratios of the values that each port accepts, which depends on the number of ports in the EtherChannel:
Number of Ports in the EtherChannel | Load Balancing |
8 | 1:1:1:1:1:1:1:1 |
7 | 2:1:1:1:1:1:1 |
6 | 2:2:1:1:1:1 |
5 | 2:2:2:1:1 |
4 | 2:2:2:2 |
3 | 3:3:2 |
2 | 4:4 |
Note: This table only lists the number of values, which the hash algorithm calculates, that a particular port accepts. You cannot control the port that a particular flow uses. You can only influence the load balance with a frame distribution method that results in the greatest variety.
Note: The hash algorithm cannot be configured or changed to load balance the traffic among the ports in an EtherChannel.
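The ratios in the table above follow from spreading the eight hash values as evenly as possible over the member ports. The sketch below reproduces the table under that assumption. `values_per_port` is a hypothetical helper, and which physical port receives the extra values is not something the source specifies.

```python
def values_per_port(num_ports: int, num_values: int = 8) -> list:
    """Return how many of the hash values each port accepts."""
    if not 1 <= num_ports <= num_values:
        raise ValueError("an EtherChannel has between 1 and 8 ports")
    base, extra = divmod(num_values, num_ports)
    # The first `extra` ports accept one additional hash value.
    return [base + 1 if i < extra else base for i in range(num_ports)]


if __name__ == "__main__":
    for ports in range(8, 1, -1):
        print(ports, ":".join(str(v) for v in values_per_port(ports)))
```

With three ports the split is 3:3:2, so even perfectly varied traffic cannot be balanced exactly. Only port counts that divide eight (2, 4, 8) give an even share.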
Issue the command in order to check the frame distribution policy. In version 6.1(x) and later, you can determine the port for use in the port channel to forward traffic, with the frame distribution policy as the basis. The command for this determination is .
These are some examples:
Cisco IOS
EtherChannel load balancing can use MAC addresses, IP addresses, or Layer 4 port numbers with a Policy Feature Card 2 (PFC2) and either source mode, destination mode, or both. The mode you select applies to all EtherChannels that you configure on the switch. Use the option that provides the greatest variety in your configuration. For example, if the traffic on a channel only goes to a single MAC address, use of the destination MAC address results in the choice of the same link in the channel each time. Use of source addresses or IP addresses can result in a better load balance. Issue the global configuration command in order to configure the load balancing.
6509#remote login switch
Trying Switch ...
Entering CONSOLE for Switch
Type "^C^C^C" to end this session
6509-sp#test etherchannel load-balance interface port-channel 1 ip 10.10.10.2 10.10.10.1
Would select Gi6/1 of Po1
6509-sp#

6509#remote login switch
Trying Switch ...
Entering CONSOLE for Switch
Type "^C^C^C" to end this session
6509-sp#test etherchannel load-balance interface port-channel 1 mac 00d0.c0d7.2dd4 0002.fc26.2494
Would select Gi6/1 of Po1
6509-sp#
PAgP aids in the automatic creation of EtherChannel links. PAgP packets are sent between EtherChannel-capable ports in order to negotiate the formation of a channel. Some restrictions are deliberately introduced into PAgP. The restrictions are:
- PAgP does not form a bundle on ports that are configured for dynamic VLANs. PAgP requires that all ports in the channel belong to the same VLAN or are configured as trunk ports. When a bundle already exists and a VLAN of a port is modified, all ports in the bundle are modified to match that VLAN.
- PAgP does not group ports that operate at different speeds or port duplex. If speed and duplex change when a bundle exists, PAgP changes the port speed and duplex for all ports in the bundle.
- PAgP modes are off, auto, desirable, and on. Only the combinations auto-desirable, desirable-desirable, and on-on allow the formation of a channel. The device on the other side must have PAgP set to on if a device on one side of the channel does not support PAgP, such as a router.
PAgP is currently supported on these switches:
- Catalyst 4500/4000
- Catalyst 5500/5000
- Catalyst 6500/6000
- Catalyst 2940/2950/2955/3550/3560/3750
- Catalyst 1900/2820
These switches do not support PAgP:
- Catalyst 2900XL/3500XL
- Catalyst 2948G-L3/4908G-L3
- Catalyst 8500
You can configure EtherChannel connections with or without Inter-Switch Link Protocol (ISL)/IEEE 802.1Q trunking. After the formation of a channel, the configuration of any port in the channel as a trunk applies the configuration to all ports in the channel. Identically configured trunk ports can be configured as an EtherChannel. You must have all ISL or all 802.1Q; you cannot mix the two. ISL/802.1Q encapsulation, if enabled, takes place independently of the source/destination load-balancing mechanism of Fast EtherChannel. The VLAN ID has no influence on the link that a packet takes. ISL/802.1Q simply enables that trunk to belong to multiple VLANs. If trunking is not enabled, all ports that are associated with the Fast EtherChannel must belong to the same VLAN.
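To put an interface into PAgP desirable mode, use the command "channel-group 1 mode desirable".
To put an interface into PAgP auto mode, use the command "channel-group 1 mode auto".
To put an interface into LACP active mode, use the command "channel-group 1 mode active".
To put an interface into LACP passive mode, use the command "channel-group 1 mode passive".
Port-channel load balancing: port-channel load-balance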
sw1(config)#port-channel load-balance ?
dst-ip Dst IP Addr
dst-mac Dst Mac Addr
src-dst-ip Src XOR Dst IP Addr
src-dst-mac Src XOR Dst Mac Addr
src-ip Src IP Addr
src-mac Src Mac Addr
1. An EtherChannel can bundle at most 8 physical links.
2. Bundled ports must follow these rules:
(1) Same VLAN
(2) Same trunking mode
(3) Same speed and duplex
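These bundling rules amount to a consistency check over the candidate ports. The following is a rough sketch only: the `Port` fields are a hypothetical simplification (a real trunk port carries an allowed VLAN list and a native VLAN rather than a single VLAN number), and `can_bundle` is not a switch command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    vlan: int          # access VLAN, or native VLAN for a trunk (simplified)
    trunk: bool        # True if the port is in trunking mode
    speed_mbps: int
    duplex: str        # "full" or "half"


def can_bundle(ports) -> bool:
    """Check the rules above: at most 8 links, identical settings on all ports."""
    if not 1 <= len(ports) <= 8:
        return False
    settings = {(p.vlan, p.trunk, p.speed_mbps, p.duplex) for p in ports}
    return len(settings) == 1


if __name__ == "__main__":
    fa1 = Port("Fa0/1", 1, True, 100, "full")
    fa2 = Port("Fa0/2", 1, True, 100, "full")
    fa3 = Port("Fa0/3", 1, True, 100, "half")
    print(can_bundle([fa1, fa2]), can_bundle([fa1, fa3]))
```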
17. VMware's explanation of LACP and its limitations:
LACP is a standards-based method to control the bundling of several physical network links together to form a logical channel for increased bandwidth and redundancy purposes. LACP enables a network device to negotiate an automatic bundling of links by sending LACP packets to the peer.
LACP works by sending frames down all links that have the protocol enabled. If it finds a device on the other end of the link that also has LACP enabled, it also sends frames independently along the same links, enabling the two units to detect multiple links between themselves and then combine them into a single logical link.
This dynamic protocol provides these advantages over the static link aggregation method supported by previous versions of vSphere:
- Plug and Play – Automatically configures and negotiates between host and access layer physical switch
- Dynamic – Detects link failures and cabling mistakes and automatically reconfigures the links
LACP limitations on a vSphere Distributed Switch
- vSphere supports only one LACP group (Uplink Port Group with LACP enabled) per distributed switch and only one LACP group per host (vSphere 5.1)
- LACP does not support Port mirroring
- LACP settings do not exist in host profiles
- LACP only works with IP Hash load balancing and Link Status Network failover detection
- LACP between two nested ESXi hosts is not possible
18. Cisco link aggregation example
EtherChannel comes in Layer 2 and Layer 3 variants. Bundling Ethernet links increases bandwidth and provides load balancing. The topology is as follows:
SW1 configuration:
interface FastEthernet0/1
 channel-group 1 mode desirable
 switchport mode trunk
interface FastEthernet0/2
 channel-group 1 mode desirable
 switchport mode trunk
interface Port-channel 1
 switchport mode trunk
SW2 configuration:
interface FastEthernet0/1
 channel-group 1 mode desirable
 switchport mode trunk
interface FastEthernet0/2
 channel-group 1 mode desirable
 switchport mode trunk
interface Port-channel 1
 switchport mode trunk
Use show etherchannel summary to check the state of the EtherChannel:
SW2#show etherchannel summary
Flags:  D - down        P - in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        U - in use      f - failed to allocate aggregator
        u - unsuitable for bundling
        w - waiting to be aggregated
        d - default port
Number of channel-groups in use: 1
Number of aggregators:           1
Group  Port-channel  Protocol    Ports
------+-------------+-----------+----------------------------------------------
1      Po1(SU)       PAgP        Fa0/1(P)  Fa0/2(P)
S means a Layer 2 EtherChannel, U means the channel is up, and P means the two interfaces have joined the EtherChannel. Note that configuration on the logical interface overrides configuration on the physical interfaces. With that, the result is visible.
Also pay attention to the EtherChannel modes:
1. PAgP modes:
- on: no negotiation, and no negotiation traffic is sent. Similar to nonegotiate.
- auto: passive negotiating state. Accepts negotiation from the peer but does not initiate it. (default)
- desirable: active negotiating state. Actively sends PAgP packets.
2. LACP modes:
- passive: passive negotiating state. Accepts negotiation but does not initiate it. (default)
- active: active negotiating state. Initiates negotiation.
Note:
- on / on: OK
- desirable / desirable: OK
- desirable / auto: OK
- auto / auto: no channel forms
- auto / on: no channel forms
- active / active: OK
- passive / active: OK
- passive / passive: no channel forms
S1(config)# interface range f0/13 -15
S1(config-if-range)# channel-group 1 mode ?
  active     Enable LACP unconditionally
  auto       Enable PAgP only if a PAgP device is detected
  desirable  Enable PAgP unconditionally
  on         Enable Etherchannel only
  passive    Enable LACP only if a LACP device is detected
S1(config-if-range)# channel-group 1 mode active
Creating a port-channel interface Port-channel 1
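The mode combinations in this section can be captured as a lookup. This is a small sketch with a hypothetical function name. It encodes only the pairs listed in the notes and treats every other pairing, including a PAgP mode facing an LACP mode, as not forming a channel.

```python
# Unordered mode pairs that form a channel, taken from the list in this section.
WORKING_PAIRS = {
    frozenset(("on",)),                  # on / on
    frozenset(("desirable",)),           # desirable / desirable
    frozenset(("desirable", "auto")),    # desirable / auto
    frozenset(("active",)),              # active / active
    frozenset(("active", "passive")),    # active / passive
}


def channel_forms(mode_a: str, mode_b: str) -> bool:
    """Return True if the two channel-group modes negotiate a channel."""
    return frozenset((mode_a, mode_b)) in WORKING_PAIRS


if __name__ == "__main__":
    for pair in [("on", "on"), ("auto", "auto"), ("passive", "active"), ("auto", "on")]:
        print(pair, channel_forms(*pair))
```

The pattern is the same for both protocols: at least one side has to initiate (desirable or active), or both sides have to skip negotiation entirely with on.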
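19. The three virtual switch configuration modes: VST, VGT, EST
VMFS is a VMware file system, VMDK is VMware's virtual disk file format, and RDM stands for Raw Device Mapping.
In VMDK mode, the LUN is mounted by ESXi as storage and used as a datastore. The LUN is formatted with VMFS, and the VM's virtual disk is stored as a VMDK file in that VMFS datastore. In RDM mode, the LUN is treated as an independent disk, that is, a LUN on the storage device. It can hold any file system, such as NTFS, EXT3, EXT4, or FAT32, depending on the operating system that owns the LUN. The VM reads and writes the LUN directly, bit by bit, without translation by the hypervisor.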