Session

Unleashing SR-IOV Offload on Virtual Machines

Speakers

Yui Washizu

Label

Nuts and Bolts

Session Type

Talk

Contents

Description

Hardware offloading through Single Root I/O Virtualization (SR-IOV) is one way to accelerate virtual networks, and it is used by platforms such as OpenStack and Kubernetes. It is very fast, which makes it useful for performance-sensitive applications. However, it can currently offload only Linux network functions running on physical machines, not those running on virtual machines (VMs). As a result, virtual networks for containers running on VMs cannot be offloaded, even though deploying containers on VMs rather than on physical machines is sometimes preferred for its flexibility. The same limitation applies to nested VMs.

Our goal is to offload Linux networking on VMs to physical SR-IOV NICs. Linux networking on VMs cannot be offloaded today because the software that sets up networks for SR-IOV offload (e.g. the SR-IOV CNI plug-in) creates and configures Virtual Functions (VFs) by accessing Physical Functions (PFs), and a VM normally has no PF to access. One naive solution is to assign PFs to VMs and give each VM exclusive access to its PF, but this does not scale (one VM per PF) and raises security concerns, because guests can then control the entire NIC hardware.
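To make the PF dependency concrete, here is a minimal Python sketch of how VFs are typically created on Linux: by writing a count to the PF's `sriov_numvfs` sysfs attribute. The function name `set_num_vfs` and the `sysfs_root` parameter are illustrative assumptions (the latter exists only so the sketch can be tried against a scratch directory); this is not the code of the SR-IOV CNI plug-in or of the PoC.

```python
from pathlib import Path


def set_num_vfs(pf_ifname: str, num_vfs: int, sysfs_root: str = "/sys") -> Path:
    """Request `num_vfs` VFs on the PF behind network interface `pf_ifname`.

    Hypothetical helper for illustration; `sysfs_root` is an assumption
    added so the function can be exercised outside a real /sys tree.
    """
    dev = Path(sysfs_root) / "class" / "net" / pf_ifname / "device"

    # sriov_totalvfs reports the maximum number of VFs the PF supports.
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"num_vfs must be between 0 and {total}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == num_vfs:
        return numvfs  # nothing to change

    # The kernel refuses to go from one nonzero VF count directly to
    # another, so disable the existing VFs first.
    if current != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))
    return numvfs
```

On a real host this needs root privileges and a PF that is visible to the caller, which is exactly what a guest without a PF lacks.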

We propose a method that emulates SR-IOV-capable PFs while offloading the data plane to hardware. This is achieved by emulating SR-IOV virtio_net devices in QEMU with vDPA as the backend; vDPA offloads the data plane without offloading the control plane. With our method, we can use the embedded switch on the physical SR-IOV NIC, so Linux network functions on VMs can be offloaded in a scalable way and without the security concerns described above. Our PoC implementation uses the L2 switching feature of SR-IOV legacy mode. We measured the throughput and latency of container networks on VMs with our PoC; the results were several times better than without SR-IOV.
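As a rough illustration of what legacy-mode L2 configuration looks like from the PF side, the Python sketch below builds the standard iproute2 `ip link set ... vf ...` commands that assign a MAC address and an optional VLAN to a VF. The function name `legacy_vf_commands` and the sample interface name, MAC address, and VLAN ID are placeholders chosen for illustration; the abstract does not state which commands the PoC actually issues.

```python
from typing import List, Optional


def legacy_vf_commands(
    pf_ifname: str, vf_index: int, mac: str, vlan: Optional[int] = None
) -> List[List[str]]:
    """Build iproute2 commands that configure one VF through its PF.

    Hypothetical helper: it only constructs argument lists and does not
    run them, so it can be inspected without touching any device.
    """
    vf = str(vf_index)
    # Set the VF's MAC address; the NIC's embedded switch uses it for L2 forwarding.
    cmds = [["ip", "link", "set", "dev", pf_ifname, "vf", vf, "mac", mac]]
    if vlan is not None:
        # Optionally tag the VF's traffic with a VLAN ID.
        cmds.append(["ip", "link", "set", "dev", pf_ifname, "vf", vf, "vlan", str(vlan)])
    return cmds


# Example with placeholder values.
for cmd in legacy_vf_commands("eth0", 0, "02:00:00:00:00:01", vlan=100):
    print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` on a host or guest where the named PF exists and the caller has sufficient privileges.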

In this talk, we’ll present the details of our approach, the performance numbers from our PoC, and our plans for supporting switchdev mode in the future.