0x13:reports:d1t1t01-tcp-analytics
Differences
This shows you the differences between two versions of the page.
0x13:reports:d1t1t01-tcp-analytics [2019/04/02 20:51] – ehalep | 0x13:reports:d1t1t01-tcp-analytics [2019/09/28 17:04] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 19: | Line 19: | ||
During the meeting there was an introduction and walkthrough of the TCP_INFO/ | During the meeting there was an introduction and walkthrough of the TCP_INFO/ | ||
- | The issues that were described was that TCP_INFO is a large blob and obtaining TCP_INFO has measurable overhead due to socket lock, and still it doesn' | + | One of the issues described was that TCP_INFO is a large blob and that obtaining TCP_INFO has measurable overhead due to the socket lock, and still it doesn' |
This was followed by coverage of the usage of OPT_STATS with TCP_SO_TIMESTAMPING and its use for perf analysis. The key takeaway was that TCP_INFO and TCP_SO_TIMESTAMPING are both powerful instrumentation tools but they must be used wisely. | This was followed by coverage of the usage of OPT_STATS with TCP_SO_TIMESTAMPING and its use for perf analysis. The key takeaway was that TCP_INFO and TCP_SO_TIMESTAMPING are both powerful instrumentation tools but they must be used wisely. | ||
Line 32: | Line 32: | ||
3. tcpstat. | 3. tcpstat. | ||
- | For Web10G, in TCP Instrumentation, | + | For Web10G, in TCP Instrumentation, |
Web10G provides real-world detailed flow metrics and is used in multiple research projects such as TEACUP. It is also used in various papers exploring bufferbloat, cloud perf, wireless latency, and network modeling reproducibility. | Web10G provides real-world detailed flow metrics and is used in multiple research projects such as TEACUP. It is also used in various papers exploring bufferbloat, cloud perf, wireless latency, and network modeling reproducibility. | ||
Line 40: | Line 40: | ||
Next was a talk about Monitoring TCP, covering challenges with respect to monitoring TCP such as "what stats to collect (TCP_CHRONO) and how frequently to sample TCP_INFO state" | Next was a talk about Monitoring TCP, covering challenges with respect to monitoring TCP such as "what stats to collect (TCP_CHRONO) and how frequently to sample TCP_INFO state" | ||
- | This talk also covered | + | This talk also included |
+ | |||
+ | TCP-BPF is a new BPF program that provides access to TCP_SOCK_FIELDS, which gives visibility into the internal state of TCP flows. It also opens up a mechanism of new callbacks for analytics and better decision making (w.r.t. provisioning dynamic resources). Examples of new callbacks include notifications when packets are sent or received. This feature has to be used with caution - a user shouldn' | ||
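To make the "what to collect and how often" question from the Monitoring TCP talk concrete, the sketch below turns two successive samples into the share of the interval a flow spent busy, receive-window limited, or send-buffer limited. It assumes each sample is a plain dict whose keys mirror the cumulative microsecond counters `tcpi_busy_time`, `tcpi_rwnd_limited` and `tcpi_sndbuf_limited` from TCP_INFO; the `t_us` timestamp key and the function name `chrono_shares` are hypothetical.

```python
def chrono_shares(prev, curr):
    """Share of the sampling interval spent in each chrono state.

    `prev` and `curr` are hypothetical sample dicts holding a timestamp
    (`t_us`) and cumulative counters, all in microseconds.
    """
    elapsed = curr["t_us"] - prev["t_us"]
    if elapsed <= 0:
        raise ValueError("samples must be ordered in time")
    shares = {}
    for key in ("busy_time", "rwnd_limited", "sndbuf_limited"):
        delta = curr[key] - prev[key]
        # Clamp to [0, 1] to tolerate small clock skew between counters.
        shares[key] = min(max(delta / elapsed, 0.0), 1.0)
    return shares
```

Because the counters are cumulative, a longer sampling interval loses time resolution but not the totals, which is one way to trade sampling cost against detail.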
The next talk covered Large Scale TCP Analytics collection. It described issues with inet_diag (referring to the Telco use case) such as events getting dropped, polling taking a long time, and no events during connection setup and termination. It also covered issues with getting information about connections/ | The next talk covered Large Scale TCP Analytics collection. It described issues with inet_diag (referring to the Telco use case) such as events getting dropped, polling taking a long time, and no events during connection setup and termination. It also covered issues with getting information about connections/ | ||
- | The next talk discussed TCP Analytic at Microsoft covering real life problem being dealt in Microsoft. It covered about several classes of problems such as connectivity and performance. There are various reasons for connectivity problems such as "app failed to connect" | + | The next talk discussed TCP analytics at Microsoft, covering real-life problems being dealt with at Microsoft. It covered several classes of problems such as connectivity and performance. There are various reasons for connectivity problems such as "app failed to connect" |
- | + | ||
- | TCP Rx window, network congestion, CPU usage. This talk describes typical analysis process for | + | |
- | + | ||
- | connectivity and performance problems. For connectivity issues - tracing and packet capture, detailed tracing | + | |
- | + | ||
- | for connection setup. TO analyze performance issues - attempt of micro benchmark to rule out application | + | |
- | + | ||
- | issue, time sequence plots, TCP stats, and network trace analysis. | + | |
+ | Following this, the talk covered TCP Rx window, network congestion, and CPU usage, and described the typical analysis process for connectivity and performance problems. For connectivity issues, this means tracing and packet capture, with detailed tracing for connection setup. For performance issues, it means micro-benchmarking to rule out application issues, followed by time sequence plots, TCP stats, and network trace analysis. | ||
+ | The next talk discussed TCP stats with regard to mapping users to servers based on TCP stats (delivery metrics). It covered stats collection methods such as random sampling (callbacks in the TCP layer, additional per-socket stats), usage of mmap, poll to retrieve tcpsockstat from / | ||
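The random sampling idea can be illustrated in user space with a small Python sketch. This is not the in-kernel callback mechanism the talk described; it only shows one way to pick a stable subset of flows at a chosen rate by hashing the connection tuple. The function name `should_sample` and the tuple layout are assumptions made for the example.

```python
import hashlib


def should_sample(flow, rate):
    """Decide deterministically whether a flow belongs to the sampled set.

    `flow` is a hypothetical (src_ip, src_port, dst_ip, dst_port) tuple and
    `rate` is the fraction of flows to keep, between 0.0 and 1.0.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Hashing rather than calling a random number generator means the same flow gets the same decision on every poll, so per-flow stats stay continuous across sampling rounds.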
- | TCP stats: This session talks about mapping of user to severs based on TCP stats (aka delivery metrics). | + | The final talk was related |
- | + | 1. active measurement is intrusive and not scalable |
- | It covers stats collection methods such as random sampling (callbacks in TCP layer, additional per socket stats), | + | 2. use of passive measurement (L2 stats monitoring) |
- | + | The talk also discussed | |
- | usage of mmap, poll to retrieve tcpsockstat from / | + | |
- | + | ||
- | about TCP_INFO and how this information can be useful to derive delivery metrics. Proposal for | + | |
- | + | ||
- | TCP stats collection using BPF/ | + | |
- | + | ||
- | to trace " | + | |
- | + | ||
- | + | ||
- | + | ||
- | TCP Analytic for Satellite Broadband: This session talks about TCP perf challenges and need of | + | |
- | + | ||
- | min RTT of 500 ms and how none of the congestion algorithm deals with it. Recommendation | + | |
- | + | ||
- | is to use PEP (Perf Enhancement Proxies) to avoid congestion. PEP and AQM (Active queue management) | + | |
- | + | ||
- | avoid packet drops. It also covers about need to monitor TCP performance issues (needed | + | |
- | + | ||
- | to meet Service Level Objective) and monitoring challenges: | + | |
- | + | ||
- | | + | |
- | + | ||
- | | + | |
- | + | ||
- | + | ||
- | + | ||
- | It covers about QoE assurance and need of troubleshooting abnormality by correlating PEP TCP flow | + | |
- | + | ||
- | stats with RF stats. | + | |
- | + | ||
- | + | ||
- | Site: https:// | + | |
+ | Site: https:// |
0x13/reports/d1t1t01-tcp-analytics.1554238297.txt.gz · Last modified: 2019/09/28 17:04 (external edit)