How DPUs accelerate VMware ESXi with NSX - a deeper look at the data path!

With vSphere 8 and NSX 4, VMware has introduced support for DPUs (Data Processing Units); see my blog post How NSX and SmartNICs (DPUs) accelerates the ESXi Hypervisor! as an introduction to this topic. DPUs are often mentioned alongside SmartNICs, but there is a slight difference between the two. Both DPUs and SmartNICs serve to accelerate and offload tasks in data centre environments. DPUs are more versatile and capable of handling a broader range of data-related workloads, including networking, storage, and security tasks. SmartNICs are more specialised and primarily focus on optimising network-related functions. The choice between the two depends on the specific needs and use cases of the data centre or cloud infrastructure. A DPU runs its own operating system (OS) and is managed completely independently. SmartNICs are integrated into and managed from the operating system (OS) running on the host CPU.

VMware uses DPUs with ARM processors. DPU support in vSphere 8 and NSX 4 is branded by VMware as Distributed Services Engine (DSE). NVIDIA and AMD Pensando currently support DPUs with vSphere and NSX; Dell EMC and HPE support the solution on the server vendor side. Additional NIC and server vendors are on the roadmap. VMware also plans to support vSAN and bare metal on DPUs in the future.

The DPU architecture accelerates networking and security functions in the modern "Software Defined Data Center". NSX networking and security services are offloaded to and accelerated on the DPU. DPUs also provide enhanced visibility into network communications. This helps with troubleshooting, mitigation of hacking attacks and compliance requirements. It enables VMware customers to run NSX services such as routing, switching, firewalling and monitoring directly on the DPU. This is particularly interesting for users with significant demands in terms of high throughput, low latency and increased security standards.

By offloading network and security services to the DPU, the x86 host frees up compute resources for applications. As a result, more workloads can be deployed on fewer servers - without compromising the monitoring, manageability and security features offered by vSphere and NSX. DPUs reduce the computational load on the main processors, thereby reducing energy consumption and the associated CO2 emissions. In addition, because DPUs deliver the required performance and efficiency with fewer servers, the number of hardware components needed is reduced. This reduces waste and protects the environment.

With DPUs, the NSX services (routing, switching, firewalling, monitoring) are offloaded from the hypervisor to the DPU (Data Processing Unit), see figure 1. A modified, purpose-built ESXi image is installed on the DPU for this purpose. The new architecture runs the infrastructure services on the DPU, providing the necessary separation between the application workloads running on the x86 compute platform and the infrastructure services. This is an enormous advantage for customers with high security and compliance requirements. Regulatory authorities such as the BSI (German Federal Office for Information Security) in particular often require separation of production and management traffic for certain environments.

Figure 1: x86 and DPU (Data Processing Unit) architecture

Data-Path Model Evolution from VDS via SR-IOV/EDP to DPU

Before describing the data-path model options of a DPU, I want to show how things work today with a standard VDS (vSphere Distributed Switch). Afterwards I will look at the VMware performance data path models SR-IOV (Single-Root Input/Output Virtualization) and EDP (Enhanced Data Path), which were designed for performance-sensitive workloads before DPUs existed. Finally, I will come to the DPU data path options VMDirectPath (UPTv2) and Emulated Mode, which bring the acceleration into hardware.

VDS Standard Datapath

Figure 2 shows the standard datapath for a VDS; it does not matter whether an N-VDS or a VDS is in use, the principle is the same. When a packet arrives at the network card of the ESXi server, an interrupt causes a context switch on the CPU. After the routing and firewall rules have been verified in the slow path, the packet is forwarded. The same process takes place at the VM level: the CPU is loaded again and another context switch occurs. This causes problems especially for applications with a high packet rate.
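To make the per-packet cost concrete, here is a minimal Python sketch of this standard datapath - my own simplification with hypothetical function and rule names, not VMware code: every received packet triggers an interrupt-driven context switch and a slow-path lookup of routing and firewall rules before it reaches the VM.

```python
# Illustrative sketch (not VMware code): a simplified model of the standard
# VDS datapath, where every packet causes host-CPU work in the slow path.

from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str

def slow_path_lookup(pkt: Packet, routing_table: dict, firewall_rules: list) -> bool:
    """Consult routing and firewall configuration for every packet (CPU work)."""
    next_hop = routing_table.get(pkt.dst_ip)              # routing decision
    allowed = any(rule(pkt) for rule in firewall_rules)   # firewall verification
    return next_hop is not None and allowed

def deliver_to_vm(pkt: Packet) -> None:
    print(f"delivered {pkt.protocol} packet to {pkt.dst_ip}:{pkt.dst_port}")

def handle_rx_interrupt(pkt: Packet, routing_table: dict, firewall_rules: list) -> None:
    # The NIC interrupt forces a context switch on the host CPU ...
    if slow_path_lookup(pkt, routing_table, firewall_rules):
        # ... and a second context switch happens when the VM receives the packet.
        deliver_to_vm(pkt)

# Example: one permissive firewall rule and a single route
rules = [lambda p: p.dst_port != 23]          # block telnet, allow everything else
routes = {"10.0.0.5": "vm-port-1"}
handle_rx_interrupt(Packet("10.0.0.1", "10.0.0.5", 443, "tcp"), routes, rules)
```

At high packet rates this per-packet slow-path work, plus the two context switches, is exactly where the host CPU cycles go.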

Figure 2: Standard VDS datapath

 

Data path models: SR-IOV or EDP

Even before DPUs, VMware introduced the SR-IOV (Single-Root Input/Output Virtualization) and EDP (Enhanced Data Path) data path models (see figure 3) to provide techniques for workloads with high performance requirements. SR-IOV bypasses the virtual switch completely. Traffic is passed directly from the physical network card to the virtual machine. The "Physical Function" (PF) and the "Virtual Function" (VF) map the communication from the physical network card to the VM. Since there is no virtual switch in the data path, the CPU is not loaded and there is no additional latency. There is a one-to-one relationship between a VF and a VM.

The number of Virtual Functions depends on the network card. SR-IOV must be supported by the PF driver, the ESXi host, the VF driver and the virtual machine operating system. As a virtual machine driver, SR-IOV relies on vendor-specific PMD (Poll Mode Driver) drivers to access the network adapter directly. The disadvantage of SR-IOV is its hardware dependency: vSphere features such as vMotion or DRS (Distributed Resource Scheduler) are not supported by VMware.
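The one-to-one VF-to-VM relationship and the fixed VF count per card can be modelled in a few lines of Python. This is purely an illustrative sketch with hypothetical names, not a vendor or vSphere API:

```python
# Illustrative sketch (not a vendor API): the strict one-to-one relationship
# between an SR-IOV Virtual Function (VF) and a VM, bypassing the virtual switch.

class PhysicalFunction:
    def __init__(self, name: str, max_vfs: int):
        self.name = name
        # The number of VFs is fixed by the network card / PF driver.
        self.free_vfs = [f"{name}-vf{i}" for i in range(max_vfs)]
        self.assignments: dict[str, str] = {}   # vf -> vm

    def assign_vf(self, vm_name: str) -> str:
        if vm_name in self.assignments.values():
            raise ValueError(f"{vm_name} already owns a VF")
        if not self.free_vfs:
            raise RuntimeError("no Virtual Functions left on this NIC")
        vf = self.free_vfs.pop(0)
        self.assignments[vf] = vm_name          # strict 1:1 VF-to-VM mapping
        return vf

pf = PhysicalFunction("vmnic0", max_vfs=4)
print(pf.assign_vf("web-vm"))   # traffic goes NIC -> VF -> VM, no vSwitch, no extra CPU
print(pf.assign_vf("db-vm"))
```

Because each VF is bound directly to one VM and its vendor driver, the hypervisor cannot transparently move the VM elsewhere - which is why vMotion and DRS drop out of the picture.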

Figure 3: Performance Data Path models SR-IOV and EDP

A second data path model for improving performance is Enhanced Data Path (EDP). EDP is an NSX-specific function. Dedicated CPU cores on the hypervisor are reserved for forwarding the data packets. When a packet arrives at the ESXi server, a copy is sent to the fast path and the flow cache is checked. If the forwarding information and, in the case of an active firewall, the so-called five-tuple (source IP address, destination IP address, source port, destination port, protocol) are successfully verified, the packet is forwarded to the virtual machine. The flow cache resides in a dedicated memory location and is constantly polled by the CPU. If there is no matching entry in the flow cache, the network and security configuration from the NSX Manager is verified in the so-called "slow path" in order to send the data packet to its destination. The slow path then sends an update to the fast path so that future packets are processed directly from the flow cache.
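The fast-path/slow-path interaction can be illustrated with a short Python sketch - a conceptual model with hypothetical names, not NSX code: the flow cache is keyed by the five-tuple, a miss falls back to the slow path, and the slow path installs an entry so subsequent packets stay in the fast path.

```python
# Illustrative sketch (not NSX code): EDP-style fast path with a flow cache
# keyed by the five-tuple, and a slow path that installs entries on a miss.

from typing import NamedTuple, Optional

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

flow_cache: dict[FiveTuple, str] = {}     # five-tuple -> forwarding decision

def slow_path(flow: FiveTuple) -> Optional[str]:
    """Verify routing/firewall configuration pushed by the NSX Manager (CPU heavy)."""
    if flow.dst_port == 23:               # e.g. a distributed firewall drop rule
        return None
    return f"vnic-of-{flow.dst_ip}"

def forward(flow: FiveTuple, action: str) -> None:
    print(f"forwarding {flow.protocol} {flow.src_ip} -> {flow.dst_ip}:{flow.dst_port} via {action}")

def fast_path(flow: FiveTuple) -> None:
    action = flow_cache.get(flow)
    if action is None:
        action = slow_path(flow)          # miss: fall back to the slow path
        if action is None:
            return                        # dropped by the firewall
        flow_cache[flow] = action         # slow path updates the fast path
    forward(flow, action)

f = FiveTuple("10.0.0.1", "10.0.0.5", 33000, 443, "tcp")
fast_path(f)   # first packet: slow-path lookup plus cache install
fast_path(f)   # subsequent packets: flow-cache hit only
```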

In the slow path, a CPU load is placed on the hypervisor. The VMXNET3 PMD driver is used in the virtual machine. The clear advantage of this method: with EDP, vSphere features such as vMotion and DRS remain available.

Data path models for a DPU

DPUs combine the advantages of SR-IOV (Single-Root Input/Output Virtualization) and EDP (Enhanced Data Path) and implement them architecturally (see figure 4). The DPU contains a hardware accelerator component for fast packet forwarding and reserves dedicated CPU resources for packet processing in the fast path.

Figure 4: Performance Data Path models VMDirectPath (UPTv2) and Emulated Mode with a DPU

The DPU thus converts packet processing, which is otherwise implemented in software, into hardware pipelines, and the processing of NSX packets moves from the server to the DPU. This in turn reduces the server's CPU consumption and frees up cache and memory resources for the VM and container workloads.

VMs can use passthrough Virtual Functions and still consume the NSX functions. The hardware packet processing pipeline as well as the embedded processors implement the NSX datapath functionality for this traffic.

The DPU architecture combines the advantages of passthrough and the current NSX Enhanced Data Path with the VMXNET3 drivers. A dedicated VMDirectPath VF module implements the new UPT (Uniform Passthrough) architecture on the ESXi hypervisor. Virtual Functions based on VMDirectPath (UPTv2) then represent the virtualised instances of the physical network adapter. VMDirectPath (UPTv2) can be enabled in vCenter via a checkbox at the VM level.

If emulated mode (the default mode) is used, traffic runs through a distributed MUX switch on the ESXi hypervisor. Besides the acceleration in hardware, packet forwarding is processed in software (fast path/slow path) in the case of a hardware table miss.
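The two DPU modes can be summarised in a small Python sketch - again a conceptual model with hypothetical names such as `hw_flow_table`, not VMware code: UPTv2 traffic is handled by the DPU hardware pipeline, while emulated-mode traffic goes through the MUX switch and falls back to software processing on a hardware table miss before the flow is offloaded.

```python
# Illustrative sketch (not VMware code): simplified handling of a DPU-backed
# vNIC in VMDirectPath (UPTv2) versus emulated mode.

hw_flow_table: dict[str, str] = {}   # flows programmed into the DPU hardware pipeline

def process_packet(vnic_mode: str, flow_id: str) -> str:
    """Return which path a packet takes for a DPU-backed vNIC (simplified)."""
    if vnic_mode == "uptv2":
        # VMDirectPath (UPTv2): the VF is passed through to the VM and the DPU's
        # hardware pipeline applies the NSX datapath functions directly, so the
        # hypervisor is not involved in per-packet processing.
        return "handled-in-dpu-hardware"

    # Emulated mode (default): traffic traverses the MUX switch on the hypervisor.
    if flow_id in hw_flow_table:
        return "hardware-table-hit"
    # HW table miss: process the packet in software (fast path / slow path),
    # then offload the flow to the DPU hardware table for subsequent packets.
    hw_flow_table[flow_id] = "offloaded"
    return "software-fast-or-slow-path"

print(process_packet("emulated", "flow-42"))   # first packet: software path
print(process_packet("emulated", "flow-42"))   # now offloaded to the DPU hardware
print(process_packet("uptv2", "flow-99"))      # passthrough VF, handled on the DPU
```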

DPUs have the advantage that virtual machines can operate in passthrough mode while functionalities such as vSphere vMotion remain intact. In addition, there are no hardware dependencies for the VMs' guest drivers as with SR-IOV.

Please check out the following YouTube video from my colleague Meghana Badrinath for a data path deep dive: DPU-based Acceleration for NSX: Deep Dive

Summary:

With DPUs in NSX 4 and vSphere 8, VMware improves performance at the hypervisor level while addressing the current network and security requirements of modern applications. Especially in times of increased security requirements due to ransomware and other potential attacks, this is an enormous advantage, as is the physical isolation of the workload and infrastructure domains. Purchases of new dedicated hardware in the form of additional DPU network cards with their own ARM processors must be taken into account and should be considered accordingly in future architecture planning. These investments are offset by savings in energy costs and a reduction in the total number of servers.

Joerg Roesch

Joerg works as a Lead Solution Engineer at VMware. His focus is Network & Security for Multi-Cloud architectures.
