I have long been itching to blog about network teaming, load balancing and failover order, so let's start with the main benefit of NIC teaming. When uplinks are teamed, the vSwitch can share the traffic load between the physical and virtual networks among some or all of the team's members, as well as provide passive failover in the event of a hardware failure or network outage. There are many VMware KB articles that go into more detail on this, but that's for later.
The diagram below shows a vSwitch's Teaming and Failover menu with active/active vmnic adapters.
NIC teaming within a vSwitch offers several options, which become more relevant when your vSwitch uses multiple uplinks; in my homelab, that's two uplinks per VMkernel port and per port group.
Load Balancing Options
The first point of interest is the load-balancing policy. This is basically how we tell the vSwitch to handle outbound traffic, and there are four choices on a standard vSwitch:
Route based on the originating virtual port
Route based on IP hash
Route based on source MAC hash
Use explicit failover order
Keep in mind that we’re not concerned with the inbound traffic because that’s not within our control. Traffic arrives on whatever uplink the upstream switch decided to put it on, and the vSwitch is only responsible for making sure it reaches its destination.
The first option, route based on the originating virtual port, is the default selection for a new vSwitch. Every VM and VMkernel port on a vSwitch is connected to a virtual port. When the vSwitch receives traffic from either of these objects, it assigns the virtual port an uplink and uses it for traffic. The chosen uplink will typically not change unless there is an uplink failure, the VM changes power state, or the VM is migrated around via vMotion.
The second option, route based on IP hash, is used in conjunction with a link aggregation group (LAG), also called an EtherChannel or port channel. When traffic enters the vSwitch, the load-balancing policy will create a hash value of the source and destination IP addresses in the packet. The resulting hash value dictates which uplink will be used.
The third option, route based on source MAC hash, is similar to the IP hash idea, except the policy examines only the source MAC address in the Ethernet frame. To be honest, we have rarely seen this policy used in a production environment, but it can be handy for a nested hypervisor VM to help balance its nested VM traffic over multiple uplinks.
The last option, use explicit failover order, really doesn’t do any sort of load balancing. Instead, the first Active NIC on the list is used. If that one fails, the next Active NIC on the list is used, and so on, until you reach the Standby NICs. Keep in mind that if you select the Explicit Failover option and you have a vSwitch with many uplinks, only one of them will be actively used at any given time. Use this policy only in circumstances where using only one link rather than load balancing over all links is desired or required.
Network Failover Detection
There are two options for network failover detection.
Link Status only: This will detect the link state of the physical adapter. If the link state changes, for example, if the physical switch fails or the network cable is unplugged, failover to a working NIC will be initiated. Link Status works by checking physical (layer 1) connectivity. It is not able to determine or react to configuration issues such as misconfigured VLANs on trunks.
Beacon Probing: When this setting is enabled, beacon probes are sent out and listened for on all the NICs that are in the team. It uses the probes it receives to determine the link status, and, unlike the Link Status detection method, is able to detect downstream switch failures. Beacon probing works best when there are at least 3 NICs in the team. Note: Do not use beacon probing with IP-hash load balancing.
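The beacon-probing idea can be sketched in a few lines. This is my own simplified model, not VMware code: every NIC broadcasts beacons and listens for its teammates' beacons, and a NIC that hears from no teammate is marked failed. It also shows why three or more NICs help: the healthy majority still hear each other, which singles out the bad link.

```python
# Illustrative sketch (NOT VMware's implementation) of beacon probing.
# received[nic] is the set of teammates whose beacons `nic` heard.
def probe_status(team: list, received: dict) -> dict:
    status = {}
    for nic in team:
        teammates = [n for n in team if n != nic]
        heard = received.get(nic, set())
        # A NIC that hears none of its teammates is considered failed.
        status[nic] = "ok" if any(t in heard for t in teammates) else "failed"
    return status
```

With only two NICs, each would simply stop hearing the other and neither side could tell which link is actually broken; with three, the two healthy NICs still exchange beacons.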
There are some additional settings associated with failover detection.
Notify Switches. When this is set to 'Yes', the host will notify the physical switch the NIC is connected to whenever a failure occurs. Generally this option is set to 'Yes', except when a VM using Microsoft NLB in unicast mode is assigned to the port group in question, in which case it should be set to 'No'.
Failback. Select Yes or No for the Failback policy. If ‘Yes’ is selected then traffic will fail back to the original NIC once it has recovered following a failure.
The last policy relating to failover and load balancing is the Failover Order policy. This lets you define which adapters are in use, in standby or unused for each vSwitch or portgroup. The three categories available for placing NICs into are:
Active Adapters: NICs listed here are active and are being used for inbound/outbound traffic.
Standby Adapters: NICs listed here are on standby and only used when an active adapter fails.
Unused Adapters: NICs listed here will not be used.
Configure Explicit Failover to Conform with VMware Best Practices
When using Explicit Failover, each port group is given its own dedicated network adapter, but each also has a standby adapter configured, which is the dedicated adapter of a different port group. For example, on the same vSwitch you could have a management port group and a vMotion port group. vmnic5 would be active for management and standby for vMotion, whilst vmnic0 would be active for vMotion and standby for management. This solution provides each port group with its own dedicated network adapter, effectively isolating it from the impact of any network activity for the others. However, it also allows each port group to fail over to the remaining network adapters if its own network adapter loses connectivity.
Types of port binding (VMware)
These three different types of port binding determine when ports in a port group are assigned to virtual machines:
- Static Binding
- Dynamic Binding
- Ephemeral Binding
When you connect a virtual machine to a port group configured with static binding, a port is immediately assigned and reserved for it, guaranteeing connectivity at all times. The port is disconnected only when the virtual machine is removed from the port group. You can connect a virtual machine to a static-binding port group only through vCenter Server.
Note: Static binding is the default setting, recommended for general use.
In a port group configured with dynamic binding, a port is assigned to a virtual machine only when the virtual machine is powered on and its NIC is in a connected state. The port is disconnected when the virtual machine is powered off or the NIC of the virtual machine is disconnected. Virtual machines connected to a port group configured with dynamic binding must be powered on and off through vCenter.
Dynamic binding can be used in environments where you have more virtual machines than available ports, but do not plan to have a greater number of virtual machines active than you have available ports. For example, if you have 300 virtual machines and 100 ports, but never have more than 90 virtual machines active at one time, dynamic binding would be appropriate for your port group.
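The sizing rule in that example boils down to one comparison, which a trivial helper makes explicit. This is just my own sanity-check function for the numbers in the text, not anything VMware provides:

```python
# Illustrative helper: dynamic binding fits when the peak number of
# concurrently powered-on VMs never exceeds the port count. The total
# VM count may exceed the port count; only the peak matters.
def dynamic_binding_fits(total_vms: int, ports: int, peak_active: int) -> bool:
    return peak_active <= ports
```

With the figures from the text, 300 VMs on 100 ports with at most 90 active at once fits; pushing the peak past 100 would not.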
Note: Dynamic binding has been deprecated since ESXi 5.0, although the option is still available in the vSphere Client. It is strongly recommended to use static binding for better performance.
In a port group configured with ephemeral binding, a port is created and assigned to a virtual machine by the host when the virtual machine is powered on and its NIC is in a connected state. When the virtual machine powers off or the NIC of the virtual machine is disconnected, the port is deleted.
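The three binding types above differ mainly in *when* a port is held for a VM, which can be summarised in one small function. This is my own simplified model of the lifecycle described in the text, not VMware code:

```python
# Illustrative sketch (NOT VMware's implementation): when a port is
# assigned/held under each binding type, given a coarse VM state of
# "in_port_group" (connected but powered off), "powered_on", or "removed".
def port_held(binding: str, vm_state: str) -> bool:
    if binding == "static":
        # Reserved as soon as the VM joins the port group; released
        # only when the VM is removed from it.
        return vm_state in ("in_port_group", "powered_on")
    if binding in ("dynamic", "ephemeral"):
        # Assigned (dynamic) or created on the host (ephemeral) only
        # while the VM is powered on with a connected NIC.
        return vm_state == "powered_on"
    raise ValueError(f"unknown binding type: {binding}")
```

The practical difference between dynamic and ephemeral is not captured here: with ephemeral binding the host itself creates and deletes the port, which is what allows management through the host when vCenter is down.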
You can assign a virtual machine to a distributed port group with ephemeral port binding on ESX/ESXi and vCenter, giving you the flexibility to manage virtual machine connections through the host when vCenter is down. Although only ephemeral binding allows you to modify virtual machine network connections when vCenter is down, network traffic is unaffected by vCenter failure regardless of port binding type.
Note: Ephemeral port groups are generally only used for recovery purposes when there is a need to provision ports directly on a host, bypassing vCenter Server.
VMware Validated Designs, for example, use these for the Management Domain to help allow flexibility in the management cluster in the event of a vCenter outage.
If a Management Cluster is not used, then it is recommended to create an ephemeral port group on the VDS for Management workloads (including vCenter), allowing them to attach to it during a vCenter outage.