0x13:reports:d1t1t01-tcp-analytics
Revisions: 2019/03/27 22:31 (ehalep); 2019/09/28 17:04 (current, external edit)
Workshop: TCP Analytics
Chair: Sowmini Varadhan
Report by: Kiran Patil
The goal of this session was to highlight the problems of TCP analytics in deployed TCP/IP networks and to discuss "how to monitor TCP flows efficiently".
The session covered experiences from deployments at companies that deal with TCP analytics on a day-to-day basis. Speakers described, shared, and proposed techniques for monitoring flows and for providing a degree of QoS by monitoring, metering, and improving the performance of TCP flows, ultimately giving end users better service.
The following techniques were discussed and proposed:
TCP_INFO/
Extended TCP stack instrumentation (RFC 4898)
Monitoring TCP
Use of TCP-eBPF (at an early stage)
Large-scale TCP analytics collection
TCP analytics at Microsoft
TCP analytics for satellite broadband
During the meeting there was an introduction to, and walkthrough of, the TCP_INFO/
One issue described was that TCP_INFO is a large blob and that obtaining it has measurable overhead because the socket lock must be taken, and it still doesn'
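To make the cost discussion concrete, here is a minimal Python sketch (Linux only) that reads the fixed-size prefix of struct tcp_info with getsockopt(TCP_INFO) and decodes a few fields. It assumes the field order of the classic 104-byte prefix declared in include/uapi/linux/tcp.h; the dictionary keys and the helper name are our own. Each call goes through the kernel's getsockopt path, which takes the socket lock, so polling many sockets at a high rate incurs exactly the overhead described above.

```python
import socket
import struct

# Assumed layout of the "classic" prefix of struct tcp_info
# (include/uapi/linux/tcp.h): 8 one-byte fields (state, ca_state,
# retransmits, probes, backoff, options, wscale nibbles, flag bits)
# followed by 24 u32 fields.
_TCP_INFO_PREFIX = struct.Struct("=8B24I")  # 8 + 24 * 4 = 104 bytes


def read_tcp_info(sock: socket.socket) -> dict:
    """Read and decode a few TCP_INFO fields for a connected TCP socket (Linux)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, _TCP_INFO_PREFIX.size)
    fields = _TCP_INFO_PREFIX.unpack(raw[:_TCP_INFO_PREFIX.size])
    u8, u32 = fields[:8], fields[8:]
    return {
        "state": u8[0],            # tcpi_state (1 == TCP_ESTABLISHED)
        "retransmits": u8[2],      # tcpi_retransmits
        "rto_us": u32[0],          # tcpi_rto
        "rtt_us": u32[15],         # tcpi_rtt (smoothed RTT)
        "rttvar_us": u32[16],      # tcpi_rttvar
        "snd_cwnd": u32[18],       # tcpi_snd_cwnd, in segments
        "total_retrans": u32[23],  # tcpi_total_retrans
    }
```

On an established loopback connection, read_tcp_info(sock)["state"] is 1 (TCP_ESTABLISHED). Newer kernels append more fields after this prefix; asking for only 104 bytes keeps the sketch independent of those additions.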
This was followed by coverage of using OPT_STATS with SO_TIMESTAMPING for performance analysis. The key takeaway was that TCP_INFO and SO_TIMESTAMPING are both powerful instrumentation, but they must be used wisely.
Some caveats to remember: if enabled on every TX packet, there is a 20% reduction/

One question that arose was what happens when TSO is used with TCP. The answer was that, unlike UDP, when a single send results in multiple packets, the timestamp is reported for the last byte.
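Given the per-packet cost caveat, one option is to request TX timestamps with OPT_STATS on only a sample of writes. The sketch below (Python, Linux) sets the reporting and option flags once with setsockopt(SO_TIMESTAMPING) and then attaches the SOF_TIMESTAMPING_TX_ACK recording flag as a per-write control message on one write in every N. The numeric constants are defined by hand from include/uapi/linux/net_tstamp.h and include/uapi/asm-generic/socket.h (SO_TIMESTAMPING = 37 is the value on common 64-bit targets), and the helper names and sampling scheme are assumptions for illustration, not the method presented in the talk.

```python
import socket
import struct

# Constants assumed from include/uapi/linux/net_tstamp.h and
# include/uapi/asm-generic/socket.h (x86-64 and most 64-bit targets).
SO_TIMESTAMPING = 37
SOF_TIMESTAMPING_SOFTWARE = 1 << 4     # report software timestamps
SOF_TIMESTAMPING_OPT_ID = 1 << 7       # tag each timestamp with a byte-stream ID
SOF_TIMESTAMPING_TX_ACK = 1 << 9       # record when the data is acknowledged
SOF_TIMESTAMPING_OPT_TSONLY = 1 << 11  # do not loop the payload back
SOF_TIMESTAMPING_OPT_STATS = 1 << 12   # attach TCP stats (used with OPT_TSONLY)

# Reporting/option flags, set once per socket. No recording flag is included,
# so no timestamps are generated until an individual write asks for one.
SOCKET_FLAGS = (SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID
                | SOF_TIMESTAMPING_OPT_TSONLY | SOF_TIMESTAMPING_OPT_STATS)


def enable_sampled_timestamping(sock: socket.socket) -> None:
    """Configure a connected TCP socket for per-write TX timestamp requests."""
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING, SOCKET_FLAGS)


def ancdata_for_write(write_index: int, sample_every: int) -> list:
    """Control messages for one write: request TX_ACK on 1 of every sample_every writes."""
    if sample_every <= 0:
        raise ValueError("sample_every must be positive")
    if write_index % sample_every != 0:
        return []
    return [(socket.SOL_SOCKET, SO_TIMESTAMPING,
             struct.pack("=I", SOF_TIMESTAMPING_TX_ACK))]


def sampled_send(sock: socket.socket, payload: bytes,
                 write_index: int, sample_every: int) -> int:
    """Send payload, attaching a timestamp request only on sampled writes."""
    return sock.sendmsg([payload], ancdata_for_write(write_index, sample_every))
```

The timestamps, and the OPT_STATS TLVs carried in an SCM_TIMESTAMPING_OPT_STATS control message, are then read back from the socket error queue with recvmsg and MSG_ERRQUEUE; that parsing step is omitted here.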

Next there was a talk about extended TCP stack instrumentation. It covered RFC 4898, a framework for TCP stack instrumentation that defines approximately 120 MIB variables. In the kernel, RFC 4898 statistics are implemented via TCP_INFO. The session covered three implementations:
1. Web100
2. Web10G
3. tcpstat

For Web10G, in TCP Instrumentation,
Web10G provides detailed, real-world flow metrics and has been used in multiple research efforts such as TEACUP. It has also been used in papers exploring bufferbloat, cloud performance, wireless latency, and the reproducibility of network modeling.

This talk also covered caveats and recommendations. For example, it is useful on data transfer nodes but should not be deployed in high-scale production environments (e.g. 100K connections).

This talk also explained the use case "

Next was a talk about monitoring TCP, covering challenges such as "what stats to collect (TCP_CHRONO) and how frequently to sample TCP_INFO state".
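The TCP_CHRONO statistics surface in TCP_INFO as tcpi_busy_time, tcpi_rwnd_limited and tcpi_sndbuf_limited, cumulative counters in microseconds. Because they are cumulative, the sampling interval determines what you can see: the sketch below differences two samples to get, for that interval, the fraction of busy time the sender spent limited by the receive window or by the send buffer. ChronoSample and limited_fractions are hypothetical names, and reading the raw values from getsockopt is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChronoSample:
    """Hypothetical container for the cumulative TCP_CHRONO counters (microseconds)."""
    busy_us: int            # tcpi_busy_time: total busy time, including the limited states
    rwnd_limited_us: int    # tcpi_rwnd_limited: time limited by the receive window
    sndbuf_limited_us: int  # tcpi_sndbuf_limited: time limited by the send buffer


def limited_fractions(prev: ChronoSample, cur: ChronoSample) -> Tuple[float, float]:
    """Fractions of the busy time between two samples spent rwnd- and sndbuf-limited."""
    busy = cur.busy_us - prev.busy_us
    if busy <= 0:
        return (0.0, 0.0)  # the sender was not busy during this interval
    rwnd = (cur.rwnd_limited_us - prev.rwnd_limited_us) / busy
    sndbuf = (cur.sndbuf_limited_us - prev.sndbuf_limited_us) / busy
    return (rwnd, sndbuf)
```

Sampling more often localizes a limit in time at the price of more TCP_INFO calls (and socket-lock acquisitions); sampling less often averages short episodes away.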
This talk also covered TCP-BPF and how it can be used for per-connection optimization of TCP parameters. It covered tunable parameters for intra-DC traffic, such as small buffers, a small SYN RTO, and a cwnd clamp.

TCP-BPF is a new BPF program type that provides access to TCP_SOCK_FIELDS, giving visibility into the internal state of TCP flows. It also opens up new callbacks for analytics and better decision-making (e.g. for provisioning resources dynamically); examples of new callbacks are notifications when packets are sent or received. This feature has to be used with caution: a user shouldn'
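A TCP-BPF (sock_ops) program is itself written in restricted C and attached to a cgroup; the sketch below models only its per-connection decision logic, in Python. It assumes hypothetical intra-DC prefixes (10.0.0.0/8 and fd00::/8) and purely illustrative values. In a real program, the SYN RTO would be returned from the BPF_SOCK_OPS_TIMEOUT_INIT callback (the kernel treats it as jiffies), and the buffer sizes and cwnd clamp would be applied with bpf_setsockopt() using SO_SNDBUF/SO_RCVBUF and TCP_BPF_SNDCWND_CLAMP.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical intra-DC address ranges; a real deployment would use its own.
INTRA_DC_NETS = [ipaddress.ip_network("10.0.0.0/8"),
                 ipaddress.ip_network("fd00::/8")]


@dataclass(frozen=True)
class ConnTuning:
    """Per-connection settings; None means keep the kernel default."""
    syn_rto_jiffies: Optional[int]  # would be returned from BPF_SOCK_OPS_TIMEOUT_INIT
    buf_bytes: Optional[int]        # would be set via bpf_setsockopt(SO_SNDBUF / SO_RCVBUF)
    cwnd_clamp: Optional[int]       # would be set via bpf_setsockopt(TCP_BPF_SNDCWND_CLAMP)


DEFAULT = ConnTuning(None, None, None)
# Illustrative values only: small SYN RTO, small buffers, clamped cwnd.
INTRA_DC = ConnTuning(syn_rto_jiffies=10, buf_bytes=150_000, cwnd_clamp=100)


def choose_tuning(remote_ip: str) -> ConnTuning:
    """Pick tuning for a connection based on the remote address."""
    addr = ipaddress.ip_address(remote_ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    if any(addr in net for net in INTRA_DC_NETS):
        return INTRA_DC
    return DEFAULT
```

Keeping the policy a pure function of the connection's addresses mirrors how a sock_ops program decides per connection at callback time, without any global state.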

The next talk covered large-scale TCP analytics collection. It described issues with inet_diag (in a telco use case), such as dropped events, long polling times, and the absence of events during connection setup and termination. It also covered issues with getting information about connections/

The next talk discussed TCP analytics at Microsoft, covering real-life problems being dealt with there. It covered several classes of problems, such as connectivity and performance. There are various kinds of connectivity problems, such as "app failed to connect"

The talk then covered the TCP receive window, network congestion, and CPU usage, and described the typical analysis process for connectivity and performance problems. Connectivity issues are analyzed with tracing and packet capture, including detailed tracing of connection setup. Performance issues are analyzed by micro-benchmarking to rule out application issues, together with time-sequence plots, TCP statistics, and network trace analysis.
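In a time-sequence plot, retransmissions show up as data segments whose sequence numbers fall below data already sent. As a rough illustration, the sketch below tags such segments for one direction of a flow. It assumes a hypothetical input of (timestamp, relative sequence number, payload length) tuples already extracted from a packet capture, and it ignores sequence-number wraparound and reordering, which a real analyzer must handle (a reordered segment would also be tagged here).

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    t: float     # capture timestamp, seconds
    seq: int     # relative sequence number of the first payload byte
    length: int  # payload length in bytes


def tag_retransmissions(segments: List[Segment]) -> List[Tuple[float, int, bool]]:
    """Return (time, seq, is_retransmission) points for a time-sequence plot.

    A data segment is tagged as a retransmission if it starts below the highest
    sequence number already sent. Assumes one direction of one flow, relative
    sequence numbers, and no wraparound.
    """
    highest_end = None
    points = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data and are not plotted
        is_retx = highest_end is not None and seg.seq < highest_end
        points.append((seg.t, seg.seq, is_retx))
        end = seg.seq + seg.length
        highest_end = end if highest_end is None else max(highest_end, end)
    return points
```

Plotting the returned points with retransmissions in a different color gives the familiar picture: a steady upward slope, with retransmitted segments appearing as points below the line.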

The next talk discussed using TCP statistics (delivery metrics) to map users to servers. It covered stats collection methods such as random sampling (callbacks in the TCP layer, additional per-socket stats) and the use of mmap and poll to retrieve tcpsockstat from /
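Random sampling of connections is easier to reason about when the decision is stable per flow. A common approach (not necessarily the one presented in the talk) is to hash the 4-tuple and keep flows whose hash falls below a threshold; the sketch below does that and then aggregates sampled delivery rates per server with a median. The function names, the rate-per-million parameter, and the choice of BLAKE2b are assumptions for illustration.

```python
import hashlib
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def flow_is_sampled(src: str, sport: int, dst: str, dport: int,
                    rate_per_million: int) -> bool:
    """Deterministically sample a flow: the same 4-tuple always gets the same decision."""
    key = f"{src}|{sport}|{dst}|{dport}".encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    bucket = int.from_bytes(digest, "big") % 1_000_000
    return bucket < rate_per_million


def median_delivery_rate(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Median delivery rate per server from (server_id, delivery_rate) samples."""
    by_server = defaultdict(list)
    for server, rate in samples:
        by_server[server].append(rate)
    return {server: statistics.median(rates) for server, rates in by_server.items()}
```

A median is used rather than a mean so that a few pathological flows do not dominate a server's score when comparing candidate servers for a user population.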

The final talk was about TCP analytics for satellite broadband. It covered TCP performance challenges caused by a minimum RTT of 500 ms, and how none of the congestion control algorithms deals with it. The recommendation was to use PEPs (Performance Enhancing Proxies) to avoid congestion; PEPs and AQM (Active Queue Management) avoid packet drops. The talk also covered the need to monitor TCP performance (to meet Service Level Objectives) and the monitoring challenges:
1. active measurement is intrusive and not scalable
2. use of passive measurement (L2 stats monitoring)
The talk also discussed QoE assurance and the need to troubleshoot abnormalities by correlating PEP TCP flow stats with RF stats.
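To see why a 500 ms minimum RTT is hard on TCP, it helps to work through the bandwidth-delay product (BDP), the amount of data that must be in flight to keep the link full, and the throughput ceiling imposed by a fixed window W. The 100 Mbit/s link rate C and the 64 KiB window below are assumed example values, not figures from the talk.

```latex
\begin{aligned}
\mathrm{BDP} &= C \cdot \mathrm{RTT}_{\min}
  = (100 \times 10^{6}\,\mathrm{bit/s}) \cdot (0.5\,\mathrm{s})
  = 5 \times 10^{7}\,\mathrm{bit} = 6.25\,\mathrm{MB} \\
\text{throughput}_{\max} &= \frac{W}{\mathrm{RTT}_{\min}}
  = \frac{65\,536\,\mathrm{B} \cdot 8\,\mathrm{bit/B}}{0.5\,\mathrm{s}}
  = 1\,048\,576\,\mathrm{bit/s} \approx 1.05\,\mathrm{Mbit/s}
\end{aligned}
```

So a sender would need roughly 6.25 MB in flight to fill such a link, while a window stuck at 64 KiB caps throughput near 1 Mbit/s regardless of link capacity, which is part of why PEPs that split the connection are attractive on satellite paths.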

Site: https://www.netdevconf.org/
Last modified: 2019/09/28 17:04 (external edit)