Session

Accelerating Software RDMA (RXE) with Netkit and Devmem

Speakers

Yanjun.Zhu

Label

Hands On

Session Type

Talk

Contents

Description

Software RDMA (Soft-RoCE/RXE) traditionally encapsulates RoCE v2 traffic in standard UDP/IP packets. While functional, this design forces same-host communication between network namespaces or containers to traverse the entire Linux networking subsystem, incurring severe performance penalties from routing table lookups, Netfilter/iptables rules, and redundant memory copies. We propose and demonstrate an optimized, shortened data path for RXE that uses netkit as a high-performance fast-path transport hook. By interfacing directly with netkit's native transmit and receive mechanisms, RXE bypasses the traditional IP layer entirely. Our evaluation shows a 10% to 20% reduction in same-host latency in network namespace environments; the exact gain depends heavily on the complexity of the bypassed host routing and firewall rules.

To push the boundaries of zero-copy software RDMA, we integrate Device Memory TCP (devmem TCP) and dma-buf into this architecture via netkit. Using hardware queue leasing and splitting on a physical NIC (e.g., NVIDIA ConnectX-6), data payloads bypass host RAM and land directly in bound device memory (e.g., GPU memory). This hardware-software co-design achieves 100% zero-copy data transfers across network boundaries, eliminating CPU-bound memory copy overhead under heavy workloads.

Furthermore, this architecture addresses a long-standing challenge in software RDMA: non-intrusive, low-overhead observability. Unlike kprobes or driver-specific modifications, which degrade performance or break portability, we leverage netkit's native integration with eBPF (tcx/ingress). We implement a programmable visibility layer that parses packets in the kernel, extracts RDMA headers (Base Transport Header/BTH, etc.) into an eBPF ring buffer, and streams them to user space for standard PCAP dumping with negligible overhead.
Live Demonstration: We will present a fully functional, unified RXE-Netkit prototype operating across isolated Linux network namespaces. The live demonstration will showcase:

  1. Fast-Path RDMA Transport: rping traffic running natively over netkit interfaces (nk0/nk1) while bypassing the IP routing stack.
  2. Devmem Acceleration: A wire-speed ncdevmem and RXE workload showcasing zero-copy RDMA read/write transfers backed by device memory.
  3. eBPF-Based Telemetry: A live demonstration of our custom eBPF kernel program and user-space tool dissecting RoCE opcodes and payload structures in real time.

Discussion & Future Roadmap: Finally, we wish to engage the netdev community in a strategic discussion about the future evolution of software RDMA, focusing on:

  1. Moving toward a native, direct dma-buf implementation inside the upstream RXE driver.
  2. Standardizing eBPF hooks in the software RDMA path for unified debugging, tracing, and security policy enforcement.
  3. Evaluating the long-term architectural trade-offs between virtual device-based hooks (netkit) and a direct driver-level (xmit/recv) network stack bypass.

This talk will feature a comprehensive, deep-dive architectural analysis along with an end-to-end live demo based on Linux kernel selftests.