Session

DPU-Offloaded TLS Termination and Session Routing for Stateful MCP Traffic

Speakers

Balakrishna Bhamidipati
Vijay Ram Inavolu

Label

Nuts and Bolts

Session Type

Talk

Description

Model Context Protocol (MCP) is fast becoming the way AI agents reach tools, data, and models. It is increasingly deployed at a scale where connection counts, compute-intensive authentication, and session affinity become infrastructure concerns. Once a session is initialized, it is pinned to a backend. Every subsequent request must be routed back to that backend using an application-layer session identifier, which the backend assigns in its initialization response and which is visible only after TLS termination. This affinity therefore cannot be determined from packet fields at connection time. Today, TLS termination, authentication, and routing run on the inference hosts, where they compete with MCP servers and their backends for the CPU cycles and memory bandwidth needed to serve requests.
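The session identifier travels in an HTTP header (the MCP Streamable HTTP transport carries it as Mcp-Session-Id), so a proxy can read it only after the TLS records have been decrypted. The following sketch is written in Python for brevity. It shows the kind of lookup a proxy might run over a decrypted header block. It assumes HTTP/1.1 framing with CRLF line endings, and the function name find_mcp_session_id is hypothetical, not taken from the authors' implementation.

```python
from typing import Optional


def find_mcp_session_id(header_block: bytes) -> Optional[str]:
    """Return the Mcp-Session-Id value from a decrypted HTTP/1.1 message, or None.

    Hypothetical helper: assumes HTTP/1.1 framing with CRLF line endings.
    """
    lines = header_block.split(b"\r\n")
    for line in lines[1:]:  # skip the request line or status line
        if not line:
            break  # an empty line ends the header section; the body follows
        name, sep, value = line.partition(b":")
        # HTTP header names are case-insensitive, so compare lowercased names.
        if sep and name.strip().lower() == b"mcp-session-id":
            # latin-1 never fails to decode, so malformed bytes cannot raise here.
            return value.strip().decode("latin-1")
    return None
```

Applied to a backend's initialization response that contains the line "Mcp-Session-Id: 1868a90c", the function returns "1868a90c". Applied to a message with no such header before the blank line, it returns None. Any header-like text in the body is deliberately ignored.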

We present a DPU-offloaded reverse proxy, built entirely from stock Linux mechanisms, that moves TLS termination, OAuth2/JWT validation, and session-aware L7 routing off the host. On the DPU, the proxy performs the TLS handshake in userspace with OpenSSL and enables kernel TLS (kTLS) with SSL_OP_ENABLE_KTLS. Record processing is delegated to the Linux tls subsystem when the negotiated cipher and the kernel support it. The proxy extracts the Mcp-Session-Id header from decrypted HTTP traffic and maintains an in-process session-to-backend affinity table. New sessions are assigned to backends round-robin, and subsequent requests follow the recorded mapping, which keeps L7 session affinity consistent. A single-process, epoll-driven state machine multiplexes TLS handshakes, request forwarding, long-lived Server-Sent Events (SSE) relay, and session teardown when the client sends DELETE or a backend returns 404. JWT validation on the DPU rejects unauthorized requests before they reach inference hosts, moving the TLS and authentication trust boundary to the DPU.
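The affinity logic can be illustrated with a small in-process table. The following Python sketch is a hypothetical model, not the authors' code. Several details are assumptions: the class and method names, the use of itertools.cycle for round-robin, and the policy of returning None for a session ID the table has never seen. That last choice lets the caller answer 404 so the client re-initializes.

```python
from itertools import cycle
from typing import Dict, List, Optional


class SessionRouter:
    """Hypothetical session-to-backend affinity table (illustrative sketch)."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._rr = cycle(backends)           # round-robin order for new sessions
        self._affinity: Dict[str, str] = {}  # Mcp-Session-Id -> backend address

    def pick_for_request(self, session_id: Optional[str]) -> Optional[str]:
        """Choose a backend for an incoming client request."""
        if session_id is None:
            # No session yet (e.g. an initialize request): next backend round-robin.
            return next(self._rr)
        # Known session: follow the recorded mapping. Unknown session: None, so the
        # caller can reply 404 and the client starts a new session (assumed policy).
        return self._affinity.get(session_id)

    def on_initialize_response(self, session_id: str, backend: str) -> None:
        """Record the session ID that a backend assigned in its initialize response."""
        self._affinity[session_id] = backend

    def on_teardown(self, session_id: str) -> None:
        """Forget a session after a client DELETE or a backend 404."""
        self._affinity.pop(session_id, None)
```

With two backends, the first two initialize requests land on different backends. Each later request carrying one of the returned session IDs goes back to the backend that issued it. Assuming the event loop runs on a single thread, a plain dictionary without locks is plausibly sufficient. A multi-threaded variant would need synchronization or per-thread tables.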

Our evaluation demonstrates correct session affinity, balanced backend utilization, stale-session handling, and stable kTLS operation across long-lived streaming connections. The contribution is a reusable Linux-based architecture rather than a new protocol or kernel primitive. We walk through the packet path, kTLS activation and fallback, the session lifecycle, backend affinity, and SSE relay, and show how the design can be reproduced on commodity DPUs without kernel modifications.