Hardware Offload Workshop
Report by: Anjali Singhai
During this session there were discussions relating to hardware offload.

There was a discussion about the need for more hardware counter visibility for the upper layers in the stack. Right now the hardware has lots of stat counters, many of them programmable, but they are not tied well into the different layers of the stack.

The discussion then shifted to packet drop visibility in the control plane, which is very important. The proposed solutions are:
1. The addition of a mechanism to allow ASICs to report dropped packets to user space.
2. Metadata can be attached to the packet.
3. Drop reason, ingress port, egress port, timestamp, etc.
4. Drop reasons should be standardized and correspond to kernel drops (e.g. ingress VLAN filter).
5. The mechanism should allow the user to filter noisy drops, sample and truncate.
6. Filtering could be based on stages in the pipeline.
7. Devlink packet trap set DEV (all | group…) enable/disable (see the sketch after this list).
8. Show status and supported metadata.
9. Monitor dropped packets.
10. An eBPF filter could be attached to the netlink socket.
Discussion ensued, with Jiri mentioning that iptables enables tracing; it looks like this infrastructure is missing in the route subsystem and the tc subsystem, and it would then need to be mapped to HW.
Tom then added: if all the dropped packets are received, how will this scale? How do you weed out one particular drop?
The discussion continued with talk about policers being configured between the ASIC and the CPU to limit the number of packets. The point made was not to eliminate stats but to augment them.
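As a sketch of what configuring such a policer could look like in devlink-trap style syntax (again with illustrative device, group, and numbers, not anything shown in the session):

  # allow at most 1000 trapped packets per second, with a burst of 128
  devlink trap policer set pci/0000:01:00.0 policer 1 rate 1000 burst 128

  # bind the policer to a trap group so its reported drops are limited
  devlink trap group set pci/0000:01:00.0 group l2_drops policer 1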
The next talk was about doorbell overflow recovery. The topic of discussion was discovery and recovery for RDMA queues. Possible solutions were fast dequeuing, CPU stall, and drop message detection.
This talk was followed by QoS ingress rate limiting and OVS offload with TC. The focus was on ingress rate limiting and policing. The rate limiting was done with TC offload by adding a matchall-type classifier with a police action and introducing reserved priorities: OVS should install TC filters with a priority offset, reserving the higher priority for rate limiting (a sketch follows).
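A minimal sketch of such a filter with plain tc, assuming an illustrative interface name and rate; the reserved pref value keeps the policer ahead of the OVS-installed filters:

  # matchall classifier at a reserved priority with a police action;
  # exceeding packets are dropped, conforming ones continue matching
  tc filter add dev eth0 ingress protocol all pref 1 matchall \
      action police rate 100mbit burst 16k conform-exceed drop/continue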
A possible issue with ovs-tc offload arises when going from software to hardware: tc police is in software while the filters are offloaded, and this could break semantics. Possible solutions include reverting to the original policing semantics when offload isn't supported, and OVS forcing tc filters…
Rony raised the question of why priorities were chosen over chains. The answer was that recirculation is a good use case for chains.
This was followed by a small test demo.
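The demo steps were roughly along these lines (bridge and port names are illustrative):

  # enable TC offload in OVS
  ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

  # add a bridge and the interfaces
  ovs-vsctl add-br br0
  ovs-vsctl add-port br0 eth0_rep

  # configure the ingress rate limit (kbps / kb); OVS translates this
  # into a matchall filter with a police action
  ovs-vsctl set interface eth0_rep ingress_policing_rate=100000
  ovs-vsctl set interface eth0_rep ingress_policing_burst=10000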
Finally, the discussion moved on to scalable NIC HW offload. The motivations given were:
1. Scale without using SR-IOV.
2. Multiple dynamic instances can be deployed faster than VFs.
3. NIC HW has a very well defined vport-based virtualization mode.
4. One PCI device is split into multiple smaller sub devices.
5. Each sub device comes with its own devices, vport, and namespace resources.
6. Leverage the mature switchdev mode and OVS eco-system.
7. Applicable for the SmartNIC use case.
8. Uses the rich, vendor-agnostic devlink iproute2 tool.
The question that the presenters raised was how to achieve an mdev software model view. A couple of points provided were:
1. Mlx5 mdev devices.
2. A control plane knob, using devlink, to add, query, and remove mdev devices.
3. vDPA from Intel was mentioned.
4. Create 3 devices: a netdev, an RDMA device, and a representor netdev.
5. In HW the mdev is attached to a vport.
6. Map it to a container; it cannot be mapped to a VM since there is a single instance of the driver.
The talk concluded with the reasons it has been implemented this way: the devlink tool and bus model fit requirements such as providing a vendor-agnostic solution and multi-port subdevice creation.
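For illustration only, a sketch of what adding, querying, and removing such a sub device through devlink can look like, using the subfunction-style syntax that later appeared upstream rather than anything shown in the session (device name and numbers are made up):

  # create a sub device on the PF and activate it
  devlink port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 88
  devlink port function set pci/0000:03:00.0/32768 state active

  # query the ports, including the new sub device
  devlink port show

  # remove it again
  devlink port del pci/0000:03:00.0/32768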
Site: https://