Day 1 / Track 1 / Talk 1
Workshop: TCP Analytics
Chair: Sowmini Varadhan
Report by: Kiran Patil

As the outline indicates, this is a collection of sessions about TCP analytics.

The goal was to highlight the problems with TCP analytics in deployments of TCP/IP-based networks, i.e. "how to monitor TCP flows efficiently". The sessions cover experiences from various deployments and from companies that deal with TCP analytics on a day-to-day basis. Various techniques were described/shared/proposed to monitor flows and provide some form of QoS by monitoring, metering, and improving the performance of TCP flows, ultimately delivering better service to the end user.

The following techniques were discussed/proposed:

  1. TCP_INFO/TCP_SO_TIMESTAMPING
  2. Extended TCP stack instrumentation (RFC 4898)
  3. Monitoring TCP
  4. Usage of TCP-BPF (at an early stage)
  5. Large-scale TCP analytics collection
  6. TCP Analytics at Microsoft
  7. TCP Analytics for Satellite Broadband

Walk-through of TCP_INFO/TCP_SO_TIMESTAMPING: This session covers the details of TCP_INFO and its fields (tcp_state, options, data_seg*, delivered, retransmissions), how this information is useful in determining the state of TCP flows, and how it provides insight into TCP flows (an aid to debugging). TCP_INFO is a large blob, obtaining it has measurable overhead due to the socket lock, and it still doesn't include everything (congestion control in use, SOL_TCP options, etc.).
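
To make the mechanism concrete, here is a minimal sketch (not code from the talk) of querying TCP_INFO with getsockopt() on Linux; field availability (e.g. tcpi_delivered) depends on kernel version, and the single call below is what takes the socket lock.

  /* Minimal sketch: query TCP_INFO for a TCP socket on Linux. */
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>     /* IPPROTO_TCP */
  #include <linux/tcp.h>      /* struct tcp_info, TCP_INFO */

  static void dump_tcp_info(int fd)
  {
      struct tcp_info ti;
      socklen_t len = sizeof(ti);

      memset(&ti, 0, sizeof(ti));
      if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
          perror("getsockopt(TCP_INFO)");
          return;
      }
      /* A few of the fields called out in the talk. */
      printf("state=%u rtt=%uus retrans=%u delivered=%u data_segs_out=%u\n",
             ti.tcpi_state, ti.tcpi_rtt, ti.tcpi_total_retrans,
             ti.tcpi_delivered, ti.tcpi_data_segs_out);
  }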

This session covers the usage of OPT_STATS with TCP_SO_TIMESTAMPING and its use for performance analysis.

Key takeaway: TCP_INFO and TCP_SO_TIMESTAMPING are both powerful instrumentation, but use them wisely.

Caveat to remember: if enabled on every TX packet, there is a 20% throughput reduction/regression.

Question: What happens when TSO offload is used for TCP?

Answer: The timestamp will be available for the last byte if a packet send resulted in multiple packets, unlike UDP.
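
As a minimal sketch of enabling OPT_STATS together with TX timestamping (the SOF_TIMESTAMPING_* constants come from <linux/net_tstamp.h>, but the particular combination of flags is illustrative, not taken from the talk): the OPT_STATS snapshot is later read from the socket error queue as an SCM_TIMESTAMPING_OPT_STATS control message carrying nested TCP_NLA_* attributes.

  /* Sketch: request ACK timestamps plus an OPT_STATS snapshot per timestamp.
   * Stats arrive via recvmsg(fd, ..., MSG_ERRQUEUE) as a cmsg of type
   * SCM_TIMESTAMPING_OPT_STATS. */
  #include <linux/net_tstamp.h>
  #include <sys/socket.h>

  static int enable_tx_tstamp_stats(int fd)
  {
      unsigned int flags = SOF_TIMESTAMPING_TX_ACK |      /* stamp when data is ACKed   */
                           SOF_TIMESTAMPING_SOFTWARE |    /* report software timestamps */
                           SOF_TIMESTAMPING_OPT_ID |      /* key timestamps to bytes    */
                           SOF_TIMESTAMPING_OPT_STATS |   /* attach OPT_STATS snapshot  */
                           SOF_TIMESTAMPING_OPT_TSONLY;   /* don't loop payload back    */

      return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
  }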

Extended TCP stack instrumentation: This talk covers RFC 4898 (a framework for TCP stack instrumentation), which defines approximately 120 MIB variables for instrumentation. In the kernel, RFC 4898 stats are implemented via TCP_INFO. For details, refer to TCP_INFO and RFC 4898. This session covers the following three implementations: Web100, Web10G, tcpstat.

Web10G: TCP instrumentation where metrics are stored in a hash (in-memory structs). This feature can be enabled through a kernel parameter (net.ipv4.tcp_estats). The stats are accessed via a netlink kernel module (TCP_ESTATS and TCP_INFO), and there is a userland API through the libmnl library for user space to query the stats. This talk covers the value of Web10G in TCP analytics.

Web10G provides real-world, detailed flow metrics and is used in multiple research efforts such as TEACUP. It is also used in various papers exploring bufferbloat, cloud performance, wireless latency, and network-modeling reproducibility. This session covers caveats and recommendations on when to use it and when not to:

- It is useful for data-transfer nodes; do not deploy it in a high-scale production environment (e.g. 100K connections).

This talk explains the use case "XSight", Web10G in a production environment:

  1. Used to identify failing and sub-optimal flows during the lifespan of flows.
  2. Also used to provide a panoptical view of network performance.

Monitoring TCP: This session covers challenges w.r.t. monitoring TCP, such as "what stats to collect (TCP_CHRONO) and how frequently to sample TCP_INFO state". It also covers interesting TCP state events and how TCP-BPF opens up new possibilities.

TCP-BPF: This session covers per-connection optimization of TCP parameters using TCP-BPF. It covers tunable parameters for intra-DC traffic, such as the use of small buffers, a small SYN_RTO, and a cwnd clamp. TCP-BPF is a new BPF program type that provides access to TCP_SOCK_FIELDS, i.e. visibility into the internal state of TCP flows. It also opens up a mechanism of new callbacks for analytics and better decision making (w.r.t. provisioning dynamic resources); examples of new callbacks are notifications when packets are sent or received. Use this feature with caution: don't enable it on all flows, but only as needed (randomly on a small percentage of flows) or while debugging an atypical flow. This session also covers external triggers (e.g. TCP_INFO) such as "ss". TCP-BPF per connection is not there yet, but TCP-BPF per cgroup is.
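
As a rough sketch of this kind of per-connection tuning, modeled on the kernel's samples/bpf sock_ops examples rather than on code shown in the talk: a BPF_PROG_TYPE_SOCK_OPS program attached to a cgroup can shrink the initial SYN RTO and clamp buffers for intra-DC flows. The constants (RTO value, buffer size) are illustrative only.

  /* Sketch of a sock_ops (TCP-BPF) program; attach per cgroup, e.g. with
   * bpftool cgroup attach <cgroup> sock_ops pinned <prog>. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* Socket-option constants (values from asm-generic/socket.h). */
  #ifndef SOL_SOCKET
  #define SOL_SOCKET 1
  #endif
  #ifndef SO_SNDBUF
  #define SO_SNDBUF 7
  #endif
  #ifndef SO_RCVBUF
  #define SO_RCVBUF 8
  #endif

  SEC("sockops")
  int tune_intra_dc(struct bpf_sock_ops *skops)
  {
      int rv = -1;              /* -1 = keep the kernel default */
      int bufsize = 150000;     /* illustrative small buffer    */

      switch (skops->op) {
      case BPF_SOCK_OPS_TIMEOUT_INIT:
          rv = 10;              /* small initial SYN RTO, in jiffies */
          break;
      case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
      case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
          /* Clamp buffers once the connection is established. */
          bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
          bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
          rv = 0;
          break;
      }
      skops->reply = rv;        /* value handed back to the TCP stack */
      return 1;
  }

  char _license[] SEC("license") = "GPL";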

Large-Scale TCP Analytics Collection: This session describes issues with inet_diag (referring to a telco use case), such as events getting dropped, polling taking a long time, and no events during connection setup and termination. It also covers the difficulty of getting information about connections/flows (such as which congestion control algorithm is used) out of the kernel, and how this information is propagated to user space.
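
For context, this is roughly what the inet_diag path looks like from user space (a generic sketch of the "ss"-style dump, not code from the talk): a netlink SOCK_DIAG request can ask for INET_DIAG_INFO (struct tcp_info) and INET_DIAG_CONG (the congestion-control name), which is one way per-flow details are pulled out of the kernel today. Reply parsing and most error handling are omitted.

  /* Sketch: dump all IPv4 TCP sockets via inet_diag, requesting tcp_info and
   * the congestion-control name for each flow (the same interface "ss" uses). */
  #include <linux/inet_diag.h>
  #include <linux/netlink.h>
  #include <linux/sock_diag.h>
  #include <netinet/in.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  static int request_tcp_dump(void)
  {
      struct {
          struct nlmsghdr nlh;
          struct inet_diag_req_v2 req;
      } msg;
      int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);

      if (fd < 0)
          return -1;

      memset(&msg, 0, sizeof(msg));
      msg.nlh.nlmsg_len   = sizeof(msg);
      msg.nlh.nlmsg_type  = SOCK_DIAG_BY_FAMILY;
      msg.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
      msg.req.sdiag_family   = AF_INET;
      msg.req.sdiag_protocol = IPPROTO_TCP;
      msg.req.idiag_states   = ~0U;                         /* all TCP states     */
      msg.req.idiag_ext      = (1 << (INET_DIAG_INFO - 1))  /* struct tcp_info    */
                             | (1 << (INET_DIAG_CONG - 1)); /* cong. control name */

      if (send(fd, &msg, sizeof(msg), 0) < 0) {
          close(fd);
          return -1;
      }
      /* A recv() loop would now parse nlmsghdr + inet_diag_msg + rtattrs. */
      return fd;
  }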

TCP Analytics at Microsoft: This session covers real-life problems dealt with at Microsoft. It covers two classes of problems: connectivity and performance. There are various reasons for connectivity problems such as "app failed to connect": network/infrastructure issues, no listener, a full listen backlog, firewall rules, port exhaustion, routing misconfiguration, or NIC driver issues. Likewise it covers performance problems such as "why is TCP throughput so low" and the possible causes: application issues (not posting enough buffers, not draining fast enough), the TCP Rx window, network congestion, and CPU usage. This talk describes the typical analysis process for connectivity and performance problems. For connectivity issues: tracing and packet capture, plus detailed tracing of connection setup. To analyze performance issues: a micro-benchmark to rule out application issues, time-sequence plots, TCP stats, and network trace analysis.

TCP stats: This session talks about mapping users to servers based on TCP stats (aka delivery metrics). It covers stats-collection methods such as random sampling (callbacks in the TCP layer, additional per-socket stats) and the use of mmap and poll to retrieve tcpsockstat from /dev/tcpsockstat, etc. It also covers TCP_INFO and how this information can be used to derive delivery metrics, as well as a proposal for TCP stats collection using BPF/tracepoints, tracing per socket. There was a suggestion from Google to trace "sendmsg" using cmsg, TCP_INFO, and timestamping.
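
One possible reading of the "trace sendmsg using cmsg" suggestion (an assumption on my part, not spelled out in the talk) is the per-write timestamping the kernel already supports: since Linux 4.13, a sendmsg() can carry a SOL_SOCKET/SO_TIMESTAMPING control message holding SOF_TIMESTAMPING_* recording flags, so only sampled writes generate timestamps. The reporting flags still need to be enabled once with setsockopt(), and the helper name below is made up for illustration.

  /* Sketch: request TX timestamps for this one write only, via cmsg. Assumes
   * reporting flags (e.g. SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID |
   * SOF_TIMESTAMPING_OPT_TSONLY) were already set with setsockopt(SO_TIMESTAMPING). */
  #include <linux/net_tstamp.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  static ssize_t send_sampled(int fd, const void *buf, size_t len)
  {
      struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
      union {
          char buf[CMSG_SPACE(sizeof(__u32))];
          struct cmsghdr align;
      } ctl;
      struct msghdr msg = {
          .msg_iov        = &iov,
          .msg_iovlen     = 1,
          .msg_control    = ctl.buf,
          .msg_controllen = sizeof(ctl.buf),
      };
      struct cmsghdr *cm;

      memset(&ctl, 0, sizeof(ctl));
      cm = CMSG_FIRSTHDR(&msg);
      cm->cmsg_level = SOL_SOCKET;
      cm->cmsg_type  = SO_TIMESTAMPING;
      cm->cmsg_len   = CMSG_LEN(sizeof(__u32));
      /* Recording flags for just this write; other writes stay untimestamped. */
      *(__u32 *)CMSG_DATA(cm) = SOF_TIMESTAMPING_TX_ACK;

      return sendmsg(fd, &msg, 0);
  }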

TCP Analytics for Satellite Broadband: This session talks about the TCP performance challenges of satellite links, where the minimum RTT is around 500 ms and none of the congestion control algorithms deals with it well. The recommendation is to use PEPs (Performance Enhancing Proxies) to avoid congestion; PEP and AQM (Active Queue Management) avoid packet drops. It also covers the need to monitor TCP performance issues (needed to meet Service Level Objectives) and the monitoring challenges:

  1. active measurement is intrusive and not scalable
  2. use of passive measurement (L2 stats monitoring)

It covers QoE assurance and the need to troubleshoot abnormalities by correlating PEP TCP flow stats with RF stats.

Site: https://www.netdevconf.org/0x13/session.html?tcp-analytics
