Day 3 / Track 1 / Talk 5
Panel: Industry Perspectives
Panelists: Shawn Zandi (LinkedIn), Marek Majkowski (Cloudflare), Vikram Siwach (MobiledgeX)
Panel co-chairs: Sowmini Varadhan, Roopa Prabhu
Report by: Anjali Singhai
This session started with the three panelists presenting their views on how Linux networking is being used in the industry, followed by questions and discussion.
The first presenter was Vikram Siwach, a product manager from MobiledgeX. Vikram opened by noting that latency is becoming increasingly important, as are AI and machine learning, and then described the current state of networks. There are currently more than 3.7 billion devices. Network CAPEX over the last ten years totals around 1.7 trillion USD (mainly used as a bit pipe), while cloud CAPEX over the same period is about 300 billion USD.
The major issue is that the client is mobile while the cloud is static and has no notion of the user's location. Vikram proposed a better approach: CloudNet, an architecture that brings the cloud closer to the devices and features Device Native and Zero Touch operation. The goal is to build a better cloud for these applications: mobile data thinning, IoT, mobile gaming, new pervasive and immersive experiences, drone swarming, and compliance and privacy.
Vikram then introduced the cloudlet architecture, in which the Distributed Matching Engine finds the best cloudlet for a client. The cloudlet software stack spawns a cluster for a particular client and validates the user/client identity and location.
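As a rough illustration of what "finding the best cloudlet" could mean, the sketch below picks the geographically closest cloudlet for a client. This is not MobiledgeX's Distributed Matching Engine: the `Cloudlet` type, the `best_cloudlet` function, the sample sites and the use of great-circle distance as the only criterion are all assumptions made for this example. A real matching engine would presumably also weigh load, operator and identity checks.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloudlet:
    """Hypothetical cloudlet record: a name and a location in degrees."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def best_cloudlet(cloudlets: list[Cloudlet], lat: float, lon: float) -> Cloudlet:
    """Return the cloudlet closest to the client (assumed selection rule)."""
    if not cloudlets:
        raise ValueError("no cloudlets available")
    return min(cloudlets, key=lambda c: haversine_km(lat, lon, c.lat, c.lon))


# Made-up sites, used only to exercise the function.
SITES = [
    Cloudlet("berlin", 52.52, 13.405),
    Cloudlet("san-francisco", 37.77, -122.42),
]

if __name__ == "__main__":
    # A client near Potsdam should be matched to the Berlin cloudlet.
    print(best_cloudlet(SITES, 52.39, 13.06).name)
```

Distance is used here only as a stand-in for latency; the talk's point is that the cloud needs some notion of client location at all.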
Vikram made the case for investing in Linux. A multitenancy model is emerging at the edge: a workload served at the real edge can span multiple providers in order to offer consistent service and APIs to developers. The infrastructure needs to have programmability built in, acceleration (don’t see smart NICs, no sophisticated HW, commodity HW), mobility, slicing (e2e traffic SLAs) and embedded security.
The Linux kernel is a mature common denominator for offering packet programming at the container, VM, host and switch level, with any OS and ASIC combination, based on application needs.
The next speaker was Artur Makutunowicz from LinkedIn, who presented SONiC and Self-Healing Networks. Artur began by providing a quick view of the LinkedIn infrastructure and its growth. It comprises approximately 250k bare metal servers in roughly 20 locations globally, peering with 4000 networks. They experience 34% growth every year, with high bandwidth and compute demands due to organic growth.
Their main problems were not the design itself but keeping the site up, planning for 10x growth, scaling on demand, running active-active datacenters and innovating for hyperscale (unlimited bandwidth, compute on demand, programmable datacenter, cost-effective scaling).
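To relate the two figures mentioned in the talk, the short calculation below estimates how long 34% yearly growth takes to reach 10x. It assumes the growth compounds annually at a constant rate, which the talk did not state; the function name `years_to_multiply` is made up for this example.

```python
import math


def years_to_multiply(annual_growth: float, factor: float) -> float:
    """Years needed to grow by `factor` at a constant compounded annual rate.

    Solves (1 + annual_growth) ** years == factor for years.
    """
    if annual_growth <= 0 or factor <= 0:
        raise ValueError("annual_growth and factor must be positive")
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # 34% per year, target of 10x (both figures from the talk).
    print(f"{years_to_multiply(0.34, 10.0):.1f} years")  # prints "7.9 years"
```

Under that assumption, planning for 10x corresponds to a horizon of just under eight years.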
Artur went on to say that owning the code lets them control their own destiny, with higher velocity and more granular rollouts, while retaining flexibility and simplicity.
Their data center design is a single-SKU data center with a single-chip architecture and a 5-stage BGP Clos. Their design principles are simplicity (simplicity works at scale), openness (using community tools when possible), independence (maintaining a vendor-agnostic profile) and programmability.
Their hardware is custom-designed around merchant silicon, with no big chassis. For software they ended up using SONiC, a NOS based on Linux, with a minimum feature set (IPv4/IPv6), and Kafka for self-healing telemetry.
Linux at Cloudflare (Marek Majkowski)
Global network, speed and security
Edge & Core (two types of data center)
Edge network in 100 locations around the world, uses an anycast network
Uniform config, no virtualization, no containers, raw metal
Thousands of IPs
Multiple applications
HTTP, DNS and other
HW server: moving to ARM servers (less power consumption)
Edge Network - uniform stack
See image
XDP for classification and load balancing, protocols and workers (nginx)
XDP doesn't do rate limiting right now?
Socket Dispatch - DoS consideration
· Case of 30k UDP sockets
· Solution: ebpf token
· Socket dispatch - zero downtime restart for QUIC
· Socket dispatch: AnyIP
· See picture
ebpf_exporter:
BPF is everywhere
XDP for DDoS, XDP for load balancing, etc.
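The load-balancing idea in the notes above can be illustrated with a stateless flow hash: every packet of a flow hashes to the same backend, so no per-connection table is needed. The sketch below is a userspace Python illustration of that general technique only. It is not Cloudflare's implementation and not an XDP program (a real one would be eBPF code attached to the NIC driver); the function names, the backend names and the choice of SHA-256 as the hash are assumptions made for this example.

```python
import hashlib
import ipaddress
import struct

# A flow is identified by (src_ip, dst_ip, src_port, dst_port, ip_protocol).
Flow = tuple[str, str, int, int, int]


def flow_hash(flow: Flow) -> int:
    """Hash a 5-tuple into a 64-bit integer (hash choice is an assumption)."""
    src_ip, dst_ip, src_port, dst_port, proto = flow
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!HHB", src_port, dst_port, proto))
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_backend(backends: list[str], flow: Flow) -> str:
    """Map a flow to one backend; the same flow always maps to the same one."""
    if not backends:
        raise ValueError("no backends configured")
    return backends[flow_hash(flow) % len(backends)]


if __name__ == "__main__":
    backends = ["worker-a", "worker-b", "worker-c"]  # hypothetical names
    flow = ("192.0.2.10", "198.51.100.1", 40000, 443, 6)  # TCP to port 443
    print(pick_backend(backends, flow))
```

Plain modulo hashing remaps most flows when the backend list changes; production load balancers typically use consistent hashing or a lookup table to limit that disruption.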
Sowmini: Kernel bypass or not?
MobiledgeX (Vikram): XDP sounds interesting. There are certain issues: we would have to develop a team of kernel developers. We are getting ready for the next generation of the Internet, with low latency.
LinkedIn: Looking for TCP analytics; more application focused.
Cloudflare: We were using kernel bypass, but our applications are CPU bound. Manageability is the cost you pay for the latency gain with kernel bypass.
Tom H: Is kernel upgrade an issue?
Cloudflare: Since we use XDP, the kernel doesn’t have to be updated as often.
LinkedIn: If application and network boundaries are clear, applications can be migrated for an upgrade. There are still challenges.
MobiledgeX: It’s a huge problem. Right now we isolate the machine, but Linux has to be the end-device kernel. People are slow to move to the latest kernel. It’s a real problem.
Roopa: Rely on the distro vendor; they will take care of it.
Need for programmability: are you looking for programmable HW or just SW?
MobiledgeX: No, not for programmable HW. The cost is high, so we are not jumping on it. We still have to scale; no custom chips.
LinkedIn: SW programmability is most important. The biggest value is in the manageability plane, not in the dataplane. P4 is mostly dataplane.
Cloudflare: Before XDP, there was a need for bypass or HW offload. We don’t see the need after XDP. Good old TCP offload can work for many flows.
Simpler dataplanes: switch ASICs to offload the data plane in HW. Roopa?
Much more of a use case on the edge side. No use case.
Program the ToR switches: Mobile, need the packet to go as fast as possible.
Most of the SW stack is written by Cloudflare.
Roopa's question on network analytics:
Do you have any specific requirement for, or use today of, the kernel stack?
LinkedIn: The challenge is collecting the data at scale and in real time to do the analytics. Where does the network start? Collect close to the application or socket.
Site: https://www.netdevconf.org/0x13/session.html?panel-industry-perspectives