

Day 2 / Common Track / Talk 5
Talk: Nuts-n-Bolts: V-switch Live-migration Support For Virtio With SRIOV VF Datapath
Speakers: Or Gerlitz and Parav Pandit
Report by: Mitu Aggarwal

The talk started with a quick reminder to give background: SRIOV has drawbacks, especially with live migration. Where support for live migration is required, some proposed solutions integrate SRIOV with VirtIO. The speakers summarized what the vswitch and the guest look like in such a setup.

A three-netdev model was proposed as a solution for live migration, in which the VirtIO-net device is used as a failover device. Other models that use two netdevs or one netdev were also mentioned; Michal from Broadcom has a blog post that discusses this, and Sridhar from Intel has also worked on this in the past.

During live migration, the VF device is hot-unplugged, the traffic fails over to the emulated datapath, and then the migration can happen. Once the migration has completed, the VF can be hot-plugged back.
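The sequence above can be sketched as a toy model. This is only an illustration of the ordering of the steps; the `Guest` class, its fields, and the `live_migrate` function are hypothetical names, not part of any hypervisor or kernel API, and the migration itself is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Guest:
    """Toy guest with a VF (accelerated) and a virtio-net (emulated) netdev."""
    vf_plugged: bool = True
    log: list = field(default_factory=list)

    def active_path(self) -> str:
        # The VF is used whenever it is present; virtio-net is the fallback.
        return "vf" if self.vf_plugged else "virtio"


def live_migrate(guest: Guest) -> list:
    """Apply the steps in the order described and record the path in use."""
    guest.log.append(("before", guest.active_path()))
    guest.vf_plugged = False                            # 1. hot-unplug the VF
    guest.log.append(("during", guest.active_path()))   # 2. traffic fails over
    # 3. the migration itself would happen here (not modelled)
    guest.vf_plugged = True                             # 4. hot-plug the VF back
    guest.log.append(("after", guest.active_path()))
    return guest.log
```

Calling `live_migrate(Guest())` returns the path used before, during, and after the migration, which makes the temporary fallback to the emulated datapath explicit.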

Because we are dealing with SRIOV, we need to take care of both the SW v-switch and the HW e-switch. Two HW e-switch modes were discussed: legacy mode and switchdev mode. This talk was about switchdev mode. As a summary of switchdev mode: there is a software representation in the hypervisor for the NIC eswitch VPorts in the VM. The representors are used for the slow path when the traffic is not offloaded, and this is the offloading device knob.
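The split between the offloaded path and the representor slow path can be illustrated with a small sketch. It assumes, purely for illustration, that packets and flow matches are plain dictionaries; `matches` and `datapath_for` are hypothetical helpers and do not correspond to any real switchdev or driver interface.

```python
def matches(packet: dict, match: dict) -> bool:
    """True when every field in the match has the same value in the packet."""
    return all(packet.get(key) == value for key, value in match.items())


def datapath_for(packet: dict, offloaded_flows: list) -> str:
    """Pick the path a packet takes in this toy switchdev-mode model."""
    if any(matches(packet, flow) for flow in offloaded_flows):
        # A matching flow is offloaded, so the NIC eswitch handles the packet.
        return "hw_eswitch"
    # No offloaded flow matches: the packet goes to the representor (slow path).
    return "representor_slow_path"
```

For example, with `offloaded_flows = [{"dst_mac": "aa:bb:cc:dd:ee:01"}]`, a packet addressed to that MAC is handled in hardware, while any other packet is delivered to the representor.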

tc flower is the mechanism used to offload vswitch flows to the NIC eswitch. A question raised by the presenter was: how do we do live migration with this vswitch? The answer was that we don’t want the SW switch to know that there are two paths to the VF in the VM, so we want to do something that supports the switchdev model.

Two things need to be done. The first is that flow-based forwarding is applied to the port. The second is to tie the representor in the host to the paravirtualized device in the VM, so we need a mechanism to stitch it to the emulated interfaces.

The proposed design includes the use of two virtual functions for each VM, one that uses the accelerated path and one that uses the emulated path, with the two representors bonded in the hypervisor. The switching in this case is always done in HW, even during live migration (LM). Standard teaming or bonding (active-backup mode) can be used. When LM is not in progress, the accelerated path is used. When LM happens, the accelerated path and its representor are unplugged, and the bonding driver is then given a kick so that it fails over to the emulated path.
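The active-backup behaviour of the bonded representors can be sketched as follows. This is a toy model rather than the bonding or team driver: the `ActiveBackupBond` class and the representor names `rep_accel` and `rep_emul` are made up for illustration.

```python
class ActiveBackupBond:
    """Toy active-backup bond over two representors (hypothetical names)."""

    def __init__(self, primary: str, backup: str):
        self.primary = primary            # representor of the accelerated path
        self.backup = backup              # representor of the emulated path
        self.present = {primary, backup}  # lower devices currently plugged in

    def active(self) -> str:
        # Prefer the accelerated path while its representor is present.
        if self.primary in self.present:
            return self.primary
        if self.backup in self.present:
            return self.backup
        raise RuntimeError("no lower device available")

    def unplug(self, dev: str) -> None:
        # Removing the active lower device makes active() fail over.
        self.present.discard(dev)

    def plug(self, dev: str) -> None:
        if dev not in (self.primary, self.backup):
            raise ValueError(f"{dev} is not a member of this bond")
        self.present.add(dev)
```

With `bond = ActiveBackupBond("rep_accel", "rep_emul")`, unplugging `rep_accel` at the start of LM makes `bond.active()` return `rep_emul`, and plugging it back restores the accelerated path.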

On ingress, the packets are received through the VPort and then matched against the set of flows, manipulated and forwarded to a VPort. We don’t want to reinstall the data flows during LM, so the virtual switch installs the flows on the bonding device and shares the TC block of the bonding master with the lower devices. The smartNIC should therefore know that when the failover device becomes active, the flows need to move to the VPort for the emulated device.
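The idea of sharing one block of flows between the bond master and its lower devices can be sketched like this. It is a simplified model and not the kernel's TC block implementation: `TcBlock`, `Port`, the device names, and the dictionary-based match format are all assumptions made for the example.

```python
class TcBlock:
    """Toy shared block: a single flow list referenced by several devices."""

    def __init__(self):
        self.flows = []          # list of (match, action) pairs
        self.install_count = 0   # how many times a flow was installed

    def add_flow(self, match: dict, action: str) -> None:
        self.flows.append((match, action))
        self.install_count += 1


class Port:
    """Toy device bound to a block by reference, not by copy."""

    def __init__(self, name: str, block: TcBlock):
        self.name = name
        self.block = block

    def classify(self, packet: dict) -> str:
        # The first flow whose match fields all equal the packet's fields wins.
        for match, action in self.block.flows:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "miss"


# The vswitch installs the flow once, on the block shared by the bond master.
block = TcBlock()
bond = Port("bond0", block)
rep_accel = Port("rep_accel", block)
rep_emul = Port("rep_emul", block)
block.add_flow({"dst_ip": "10.0.0.2"}, "forward:uplink")
```

Because both lower devices reference the same block, the flow is visible on `rep_emul` as soon as it becomes the active device, with no reinstall by the vswitch.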

When the hypervisor changes the channel, the hardware eswitch changes the datapath as well.

One question raised was whether, when the tc rules are added to the bond, the lower devices also get the rules. So instead of adding the rule to the device, you are adding the rule to the tc block. Or Gerlitz said that the FW could bind to this block and do it smartly, so that we don’t have to duplicate the flows in the block.

In the Microsoft model, only the guest knows which channel is being used in steady state, so that model doesn’t work with switchdev. Another question was whether this solution will work with RDMA traffic.

The answer was that it currently doesn’t support RDMA, since the failover is only for TCP/IP traffic. The RDMA connections will break if we fail over to the emulated device. If the RDMA driver somehow knows how to do the mount on the other side and restart the connections, then it will work.

There is no need for dedicated interrupts or dedicated resources in the HW for this; it can just work with a dummy VPort that isn’t backed by resources.

Site: https://www.netdevconf.org/0x13/session.html?talk-v-switch-virtio-sriov-vf-datapath

0x13/reports/d2t3t05-v-switch-live-migration-support-for-virtio-with-sriov-vf-datapath.1554900560.txt.gz · Last modified: 2019/09/28 17:04 (external edit)
