Session
Toward Host-Pluggable Congestion Control for RDMA/IP Datacenter Transports
Speakers
Vivek Kashyap
Label
Nuts and Bolts
Session Type
Talk
Description
Building on the Netdev 0x19 talk on congestion control in AI/ML datacenter networks, this talk presents a concrete step toward host-pluggable congestion control for RDMA/IP datacenter transports. The previous talk surveyed modern datacenter congestion-control approaches, the limitations of fixed endpoint behavior, and the need for congestion-control algorithms to become more programmable and adaptable as workloads evolve.
This follow-up focuses on a practical implementation model that moves congestion-control policy out of fixed firmware or hardware implementations and into a host/hybrid control framework. A host component running in userspace, or alternatively in the kernel, periodically issues probe packets and uses hardware timestamping to obtain path RTT measurements. These measurements are converted into a congestion estimate for the path. The resulting control value is then distributed across the active Queue Pairs associated with that peer or path and applied through a driver-mediated QP update interface. The talk will share results from this implementation running with NIC-embedded congestion control disabled, without relying on DCQCN/PFC behavior, to demonstrate that a host-driven control loop can manage RDMA congestion.
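The loop described above (probe RTT, congestion estimate, per-QP distribution, driver-mediated update) can be sketched roughly as follows. This is a minimal illustration, not the implementation presented in the talk: the EWMA smoothing, the RTT-inflation rate formula, the equal split across QPs, and every name here (`PathState`, `distribute`, `control_step`, `apply_qp_rate`) are assumptions made for the example. The driver interface is represented only by a caller-supplied callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PathState:
    """Per-peer (or per-path) state kept by the hypothetical host component."""
    base_rtt_us: float              # assumed uncongested RTT of the path
    line_rate_gbps: float           # assumed capacity toward the peer
    alpha: float = 0.25             # EWMA weight for new samples (illustrative)
    min_rate_gbps: float = 0.1      # floor so no path is starved completely
    srtt_us: Optional[float] = None # smoothed RTT, unset until the first probe

    def on_probe_rtt(self, rtt_us: float) -> float:
        """Fold in one hardware-timestamped probe RTT; return the path rate budget."""
        if self.srtt_us is None:
            self.srtt_us = rtt_us
        else:
            self.srtt_us = (1 - self.alpha) * self.srtt_us + self.alpha * rtt_us
        # Congestion estimate: how far the smoothed RTT is inflated over the base RTT.
        inflation = max(self.srtt_us / self.base_rtt_us, 1.0)
        # Control value: scale the line rate down in proportion to the inflation.
        return max(self.line_rate_gbps / inflation, self.min_rate_gbps)


def distribute(path_rate_gbps: float, qpns: List[int]) -> Dict[int, float]:
    """Split the path budget equally across active QPs (one simple policy choice)."""
    if not qpns:
        return {}
    share = path_rate_gbps / len(qpns)
    return {qpn: share for qpn in qpns}


def control_step(
    path: PathState,
    rtt_us: float,
    active_qpns: List[int],
    apply_qp_rate: Callable[[int, float], None],
) -> Dict[int, float]:
    """One iteration: RTT sample -> path budget -> per-QP rates -> driver update."""
    shares = distribute(path.on_probe_rtt(rtt_us), active_qpns)
    for qpn, rate in shares.items():
        apply_qp_rate(qpn, rate)  # stand-in for the driver-mediated QP update
    return shares
```

In a real deployment `control_step` would run once per probe interval, with `apply_qp_rate` bound to whatever QP update interface the driver exposes. An equal split is only one option; weighting by QP activity or priority would fit in the same place.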
The intent is not to claim that probe RTT is the only useful congestion signal. Rather, probe-driven feedback provides a deployable starting point for separating congestion-control policy from device-specific implementation. By commoditizing the control loop through a host-accessible framework, new algorithms can be prototyped, tuned, compared, and modified without requiring every change to be embedded directly in NIC firmware/hardware. The same host/driver framework can also be extended to incorporate additional endpoint signals such as ECN counts, ACK or progress counters, retransmit and retry events, selective-recovery information, and path-health indicators.
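One way to picture "pluggable" here is a small host-side interface that algorithms implement, fed by a signal record that can grow beyond RTT. The sketch below is purely hypothetical: the `Signals` fields, the `Algorithm` protocol, the registry, and the thresholds and gains in both example algorithms are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class Signals:
    """Endpoint signals for one control interval (hypothetical field set)."""
    rtt_us: float
    base_rtt_us: float
    ecn_marked: int = 0    # packets seen with ECN marks in the interval
    packets: int = 0       # packets observed in the interval
    retransmits: int = 0   # retransmit/retry events in the interval


class Algorithm(Protocol):
    def next_rate(self, current_gbps: float, s: Signals) -> float: ...


# Name -> factory, so algorithms can be swapped without touching the loop.
REGISTRY: Dict[str, Callable[[], Algorithm]] = {}


def register(name: str):
    def wrap(factory):
        REGISTRY[name] = factory
        return factory
    return wrap


@register("rtt_only")
class RttOnly:
    """Additive increase, multiplicative decrease keyed on RTT inflation."""
    def next_rate(self, current_gbps: float, s: Signals) -> float:
        if s.rtt_us > 1.5 * s.base_rtt_us:   # illustrative threshold
            return current_gbps * 0.8        # illustrative decrease factor
        return current_gbps + 1.0            # illustrative increase step (Gbps)


@register("rtt_ecn")
class RttEcn:
    """Same RTT rule, but ECN marks or retransmits take precedence."""
    def next_rate(self, current_gbps: float, s: Signals) -> float:
        marked = s.ecn_marked / s.packets if s.packets else 0.0
        if marked > 0 or s.retransmits > 0:
            # Cut in proportion to the marked fraction, with a small minimum cut.
            return current_gbps * (1.0 - 0.5 * max(marked, 0.1))
        return RttOnly().next_rate(current_gbps, s)
```

Selecting an algorithm then reduces to a lookup such as `REGISTRY["rtt_ecn"]()`, which is what makes side-by-side comparison and tuning cheap on the host compared with a firmware change.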
This flexibility matters because modern datacenter workloads are heterogeneous. AI/ML collectives, storage transfers, KV-cache movement, HPC messages, and front-end traffic may share the same Ethernet/IP infrastructure yet differ in their latency, throughput, and burst behavior. A host-pluggable substrate allows congestion behavior to be adapted by workload, path, and policy rather than being constrained to a single fixed transport or firmware mechanism.
The talk will describe the probe-driven control loop, the interaction between the host congestion-control component and active QPs, early implementation results, the tradeoffs between userspace, kernel, firmware, and hardware placement, and the minimal endpoint interfaces needed to make RDMA congestion-control algorithms easier to deploy and evolve in datacenter environments.
Important Dates
| Closing of CFS | June 1st |
| Notification by | June 10th |
| Conference dates | July 13th-16th |