Report by: Kiran Patil
   
The goal of this session was to highlight the problems with TCP analytics in deployments of TCP/IP-based networks, and "how to monitor TCP flows efficiently".
  
The session covered experiences from various companies that deal with TCP analytics on a day-to-day basis. Various techniques were described, shared, and proposed to monitor flows and provide some form of QoS by monitoring, metering, and improving the performance of TCP flows, and ultimately better service to the end user.
  
-"how to monitor TCP flow efficiently". This sessions covers experiences from various deployments,+The following techniques were discussed and proposed: 
 +TCP_INFO/TCP_SO_TIMESTAMPING 
 +Extended TCP stack instrumentation (RFC 4898) 
 +Monitoring TCP 
 +Usage of TCP-eBPF (at Premature stage) 
 +Large scale TCP analytic collection 
 +TCP Analytics at Microsoft 
 +TCP Analytic for satellite Broadband
  
During the meeting there was an introduction to and walkthrough of TCP_INFO/TCP_SO_TIMESTAMPING. This part covered the details of TCP_INFO and its fields (tcp_state, options, data_seg*, delivered, retransmission) and how this information is useful in determining the state of TCP flows and provides insight into them (an aid to debugging).
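As a minimal sketch of what the walkthrough described, the snippet below reads struct tcp_info with a plain getsockopt() call on an already-connected socket. The helper name dump_tcp_info is illustrative, not from the talk, and the tcpi_delivered counter assumes a kernel of roughly 4.18 or newer.

<code c>
/* Illustrative sketch: read struct tcp_info off a connected TCP socket.
 * dump_tcp_info is a hypothetical helper; tcpi_delivered needs a
 * reasonably recent kernel (>= 4.18), the other fields are older. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <linux/tcp.h>    /* struct tcp_info, TCP_INFO */

static void dump_tcp_info(int fd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return;
    }
    /* Some of the fields the talk called out. */
    printf("state=%u options=0x%x\n", info.tcpi_state, info.tcpi_options);
    printf("data_segs_out=%u data_segs_in=%u\n",
           info.tcpi_data_segs_out, info.tcpi_data_segs_in);
    printf("delivered=%u retrans=%u total_retrans=%u\n",
           info.tcpi_delivered, info.tcpi_retrans, info.tcpi_total_retrans);
}
</code>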
  
One issue described was that TCP_INFO is a large blob, obtaining it has measurable overhead due to the socket lock, and it still doesn't include everything (the congestion control algorithm in use, other SOL_TCP state, etc.).
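To illustrate that gap with a small sketch: the congestion control name has to be fetched with a separate getsockopt(TCP_CONGESTION) call, which is a standard Linux socket option and means another trip through the socket lock; print_cc_name is a hypothetical helper.

<code c>
/* Illustrative sketch: the CC algorithm is not part of struct tcp_info,
 * so it costs a separate getsockopt() call (and another pass through the
 * socket lock). print_cc_name is a hypothetical helper name. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_CONGESTION */

static void print_cc_name(int fd)
{
    char cc[16] = "";
    socklen_t len = sizeof(cc);

    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, &len) == 0)
        printf("congestion control: %s\n", cc);
}
</code>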
  
This was followed by coverage of OPT_STATS with TCP_SO_TIMESTAMPING and its use for performance analysis. The key takeaway was that TCP_INFO and TCP_SO_TIMESTAMPING are both powerful instrumentation, but they must be used wisely.
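A sketch of enabling that instrumentation, under the assumption of a kernel of 4.10 or newer (when SOF_TIMESTAMPING_OPT_STATS appeared); enable_opt_stats is an illustrative helper name.

<code c>
/* Illustrative sketch: turn on TX timestamping with OPT_STATS so that each
 * ACK-time timestamp cmsg also carries a dump of TCP stats for that write.
 * enable_opt_stats is a hypothetical helper; SOF_TIMESTAMPING_OPT_STATS
 * assumes a kernel >= 4.10. */
#include <sys/socket.h>
#include <linux/net_tstamp.h>

static int enable_opt_stats(int fd)
{
    unsigned int flags = SOF_TIMESTAMPING_SOFTWARE |
                         SOF_TIMESTAMPING_TX_ACK   |  /* stamp when data is ACKed */
                         SOF_TIMESTAMPING_OPT_ID   |  /* tag stamps with an offset */
                         SOF_TIMESTAMPING_OPT_STATS;  /* attach stats to each cmsg */

    /* Per the caveat below: this costs throughput if left on for every
     * TX packet, so enable it selectively. */
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
</code>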
  
Some caveats to remember: if enabled on every TX packet, there is a 20% throughput reduction/regression.
  
One question that arose was what happens when TSO offload is used for TCP. The answer was that, unlike UDP, the timestamp will be available for the last byte if a send resulted in multiple packets.
  
Next there was a talk about extended TCP stack instrumentation. It covered RFC 4898 (a framework for TCP stack instrumentation), which defines approximately 120 MIBs for instrumentation. In the kernel, RFC 4898 stats are implemented via TCP_INFO. The talk covered three implementations:

  - Web100
  - Web10G
  - tcpstat
  
For Web10G, in the TCP instrumentation, metrics are stored in a hash (in-memory structs). This feature can be enabled through a kernel parameter (net.ipv4.tcp_estats), and the stats are accessed via a netlink kernel module (TCP_ESTATS and TCP_INFO). There is a userland API, through the libmnl library, for user space to query the stats.
Web10G provides real-world, detailed flow metrics and is used in multiple research efforts such as TEACUP. It is also used in various papers exploring bufferbloat, cloud performance, wireless latency, and network modeling reproducibility.
  
This talk also covered caveats and recommendations: for example, it is useful for data transfer nodes, but should not be deployed in high-scale production environments (e.g., 100K connections).
  
This talk also explained the "XSight" use case, which runs Web10G in a production environment. It is used to identify failing and sub-optimal flows during their lifespan, and to provide a panoptic view of network performance.
  
Next was a talk about monitoring TCP, covering challenges such as what stats to collect (TCP_CHRONO) and how frequently to sample TCP_INFO state. It also covered interesting TCP state events and how TCP-BPF opens up new possibilities.
This talk also included TCP-BPF and how it can be used to provide per-connection optimization of TCP parameters. It covered tunable parameters for intra-DC traffic such as the use of small buffers, a small SYN RTO, and a cwnd clamp (see the sketch after the next paragraph).
  
TCP-BPF is a new BPF program type that provides access to TCP_SOCK_FIELDS, which means visibility into the internal state of TCP flows. It also opens up a mechanism of new callbacks for analytics and better decision making (w.r.t. provisioning dynamic resources); an example of such a callback is a notification when packets are sent or received. This feature has to be used with caution: a user shouldn't enable it on all flows, but only as needed (randomly on a small percentage of flows) or while debugging an atypical flow. Additionally, this talk covered external triggers (e.g., TCP_INFO) like "ss". TCP-BPF per connection is not there yet, but TCP-BPF per cgroup is.
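A minimal sockops sketch of the per-connection tuning described above, modeled on the TCP examples in the kernel's samples/bpf directory; tcp_tuner and the clamp value are hypothetical, and it assumes a clang/libbpf build attached to a cgroup, since per-connection attachment does not exist yet.

<code c>
/* Hypothetical TCP-BPF (sockops) program: shrink the initial SYN RTO and
 * clamp cwnd for flows in the cgroup it is attached to. Assumes a kernel
 * with sock_ops support (>= 4.13). */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#ifndef SOL_TCP
#define SOL_TCP 6
#endif

SEC("sockops")
int tcp_tuner(struct bpf_sock_ops *skops)
{
    int clamp = 100;   /* illustrative cwnd clamp, in packets */
    int rv = -1;

    switch (skops->op) {
    case BPF_SOCK_OPS_TIMEOUT_INIT:
        rv = 10;       /* small initial SYN RTO (value in jiffies) */
        break;
    case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
    case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
        rv = bpf_setsockopt(skops, SOL_TCP, TCP_BPF_SNDCWND_CLAMP,
                            &clamp, sizeof(clamp));
        break;
    }
    skops->reply = rv;
    return 1;
}

char _license[] SEC("license") = "GPL";
</code>

Attaching such a program per cgroup (e.g., with bpftool's cgroup attach command and the sock_ops attach type) matches the per-cgroup granularity mentioned above.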
  
The next talk covered large-scale TCP analytics collection. It described issues with inet_diag (referring to a telco use case), such as events getting dropped, polling taking a long time, and no events during connection setup and termination. It also covered the difficulty of getting information about connections/flows (such as which congestion control algorithm is used) out of the kernel, and how this information is propagated to user space.
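For context, the polling approach whose limits were described looks roughly like the sketch below: a one-shot NETLINK_SOCK_DIAG dump request for established TCP sockets, asking for TCP_INFO and the congestion algorithm name, much as "ss" does. request_tcp_dump is a hypothetical helper and the reply-parsing loop is elided.

<code c>
/* Illustrative sketch of an inet_diag dump request. Error handling and
 * the recv()/parse loop are omitted. */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>        /* IPPROTO_TCP */
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/inet_diag.h>

static int request_tcp_dump(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
    struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
    struct {
        struct nlmsghdr nlh;
        struct inet_diag_req_v2 req;
    } msg;

    memset(&msg, 0, sizeof(msg));
    msg.nlh.nlmsg_len   = sizeof(msg);
    msg.nlh.nlmsg_type  = SOCK_DIAG_BY_FAMILY;
    msg.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    msg.req.sdiag_family   = AF_INET;
    msg.req.sdiag_protocol = IPPROTO_TCP;
    msg.req.idiag_states   = 1 << 1;  /* TCP_ESTABLISHED only */
    msg.req.idiag_ext      = (1 << (INET_DIAG_INFO - 1)) |  /* struct tcp_info */
                             (1 << (INET_DIAG_CONG - 1));   /* CC algorithm name */

    sendto(fd, &msg, sizeof(msg), 0,
           (struct sockaddr *)&nladdr, sizeof(nladdr));
    /* A recv() loop parsing inet_diag_msg + attributes would follow. */
    return fd;
}
</code>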
  
The next talk discussed TCP analytics at Microsoft, covering real-life problems dealt with there. It described several classes of problems, such as connectivity and performance. There are various reasons for connectivity problems such as "app failed to connect": network/infrastructure issues, no listener, a full listen backlog, firewall rules, port exhaustion, routing misconfiguration, or NIC driver issues. Likewise, it covered performance problems such as "why is TCP throughput so low" and their possible causes: application issues (not posting enough buffers, not draining fast enough), the TCP receive window, network congestion, and CPU usage.
  
The talk then described the typical analysis process for connectivity and performance problems. For connectivity issues: tracing and packet capture, with detailed tracing of connection setup. For performance issues: micro-benchmarking to rule out application problems, plus time-sequence plots, TCP stats, and network trace analysis.
    
The next talk discussed TCP stats with regard to mapping users to servers based on TCP stats (delivery metrics). It covered stats collection methods such as random sampling (callbacks in the TCP layer, additional per-socket stats) and the use of mmap and poll to retrieve tcpsockstat from /dev/tcpsockstat. It also covered TCP_INFO and how this information could be used to derive delivery metrics, along with a proposal for TCP stats collection using BPF/tracepoints, traced per socket. There was a suggestion from Google to trace "sendmsg" using cmsg, TCP_INFO, and timestamping.
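A sketch of that cmsg-based direction, reusing the SO_TIMESTAMPING setup shown earlier: after a sendmsg(), the per-write timestamps and OPT_STATS arrive as control messages on the socket's error queue. drain_timestamps is a hypothetical name, and SCM_TIMESTAMPING_OPT_STATS again assumes a kernel >= 4.10.

<code c>
/* Illustrative sketch: drain the error queue to collect the timestamp and
 * OPT_STATS cmsgs generated by the SO_TIMESTAMPING setup shown earlier. */
#include <sys/socket.h>
#include <linux/errqueue.h>     /* struct scm_timestamping */

static void drain_timestamps(int fd)
{
    char ctrl[2048];
    struct msghdr msg = { .msg_control = ctrl, .msg_controllen = sizeof(ctrl) };
    struct cmsghdr *cm;

    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return;

    for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level != SOL_SOCKET)
            continue;
        if (cm->cmsg_type == SCM_TIMESTAMPING) {
            /* struct scm_timestamping: sw/hw transmit timestamps */
        } else if (cm->cmsg_type == SCM_TIMESTAMPING_OPT_STATS) {
            /* nested TCP_NLA_* attributes: per-write TCP stats */
        }
    }
}
</code>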
  
The final talk was related to TCP analytics for satellite broadband. It covered TCP performance challenges, the need to handle a minimum RTT of 500 ms, and how none of the congestion control algorithms deal with it. The recommendation was to use PEPs (Performance Enhancing Proxies) to avoid congestion; PEPs and AQM (active queue management) avoid packet drops. It also covered the need to monitor TCP performance issues (to meet Service Level Objectives) and two monitoring challenges:

  - active measurement is intrusive and not scalable
  - use of passive measurement (L2 stats monitoring)

The talk also discussed QoE assurance and the need to troubleshoot abnormalities by correlating PEP TCP flow stats with RF stats.
  
Site: https://www.netdevconf.info/0x13/session.html?tcp-analytics