Day 3 / Track 1 / Talk 5
Panel: Industry Perspectives
Panelists: Shawn Zandi (LinkedIn), Marek Majkowski (Cloudflare), Vikram Siwach (MobiledgeX)
Panel co-chairs: Sowmini Varadhan, Roopa Prabhu
Report by: Anjali Singhai
This session started with the three panelists presenting their views on how Linux networking is being used in the industry, followed by questions and discussion.
The first presenter was Vikram Siwach, a product manager from MobiledgeX. Vikram opened by noting that latency is becoming important, as are AI and machine learning, and then described the current state of networks. There are currently more than 3.7 billion devices. Network CAPEX over the last ten years is around 1.7 trillion USD (mainly used as a bit pipe). Cloud CAPEX over the last ten years is around 300 billion USD.
The major issue is that the client is mobile while the cloud is static and has no notion of the user's location. Vikram proposed a better way: CloudNet, an architecture to bring the devices closer to the cloud, which features Device Native and Zero Touch.
The goal is to build a better cloud for these applications: mobile data thinning, IoT, mobile gaming, new pervasive and immersive experiences, drone swarming, compliance and privacy.
See picture for Architecture: cloudlets
Distributed Matching Engine:
DME finds best Cloudlet
Cloudlets Software Stack:
Spawn a cluster for a particular client.
Validate user/client identity and location.
Bring cloud closer to device: Make backend Mobile
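As a rough illustration of what "DME finds best Cloudlet" could mean, the sketch below picks the cloudlet nearest to a device's reported location using great-circle distance. The talk did not describe the actual matching algorithm; the `Cloudlet` type, the `find_best_cloudlet` function, the cloudlet names and coordinates, and the distance-only criterion are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloudlet:
    """Hypothetical cloudlet record: a name and a geographic position."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def find_best_cloudlet(cloudlets, lat, lon):
    """Return the cloudlet closest to the device (assumed matching criterion)."""
    if not cloudlets:
        raise ValueError("no cloudlets available")
    return min(cloudlets, key=lambda c: haversine_km(lat, lon, c.lat, c.lon))


# Illustrative cloudlet sites; not taken from the talk.
CLOUDLETS = [
    Cloudlet("berlin", 52.52, 13.405),
    Cloudlet("madrid", 40.4168, -3.7038),
    Cloudlet("san-jose", 37.3382, -121.8863),
]
```

A real matching engine would presumably also weigh load, measured latency, and the validated client identity; distance alone is used here only to keep the sketch short.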
Why invest in Linux? A multitenancy model is emerging at the edge, with workloads served at the real edge.
Span multiple providers to provide consistent service and APIs to developers.
Infrastructure Needs:
Programmability: built in
Acceleration (don’t see smart NICs, no sophisticated HW; commodity HW)
Mobility: Anchorless
Slicing: e2e traffic SLAs
Security: embedded
The cloud is often attacked, so security is needed.
The Linux kernel is a mature common denominator for offering packet programming at the container, VM, host, or switch level, with any combination of OS and ASIC, based on application needs.
Programmable Pipeline:
SONiC and Self-Healing Networks
Artur Makutunowicz (LinkedIn)
~250k bare metal servers
~20 locations globally
4000 networks
Infrastructure Growth
34% every year
High bandwidth and compute demands due to organic growth.
East-west explosion, 1:1000
Call graph
Kafka
Hadoop
ML
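To put the 34% yearly growth figure in perspective, the sketch below computes how long compounding at that rate takes to reach a given multiple. It assumes the rate stays constant and compounds annually, which the talk did not state; `years_to_multiple` is a hypothetical helper name.

```python
import math


def years_to_multiple(annual_growth, multiple):
    """Years of constant annual compounding needed to grow by `multiple`.

    annual_growth is a fraction (0.34 means 34% per year).
    """
    if annual_growth <= 0 or multiple <= 0:
        raise ValueError("annual_growth and multiple must be positive")
    # Solve (1 + g)^n = multiple for n.
    return math.log(multiple) / math.log(1.0 + annual_growth)
```

Under these assumptions, `years_to_multiple(0.34, 2)` is about 2.4 years and `years_to_multiple(0.34, 10)` is a little under 8 years.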
Datacenter Journey
2010+ Operation Crisis, keep the site up.
Capacity Crisis: plan for 10x growth, scale on demand, active-active datac
Innovation for Hyperscale
Unlimited bandwidth, Compute on demand, Programmable datacenter, Scale cost effectively.
Own the code:
Enables us to control our own destiny
Higher velocity: more granular rollouts
Data Center Design
Single SKU data center, single chip architecture, 5-stage BGP Clos
Design Principles:
Simplicity works at scale
Open: use community tools when possible
Independent: vendor agnostic
Programmable
Building Block: HW
Merchant Silicon
Custom Design Switch
No big chassis
SONiC, a NOS based on Linux; minimum features: IPv4/IPv6
Self-healing: telemetry, Kafka pipeline (see picture)
Linux at Cloudflare (Marek Majkowski)
Global network, speed and security
Edge & Core ( two types of Data center)
Edge network in 100 locations around the world, using an anycast network
Uniform config, no virtualization, no containers, bare metal
Thousands of IPs
Multiple applications
HTTP, DNS and other
HW server: moving to ARM servers (less power consumption)
Edge Network - uniform stack
See image
XDP for classification and load balancing, protocols and workers (nginx)
XDP doesn't do rate limiting right now?
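The load-balancing role noted above can be modeled in a few lines. This is not XDP code and not Cloudflare's implementation; it is a plain Python model of one common approach, hashing a flow's 5-tuple so that every packet of the same flow is sent to the same worker. The `Flow` type, the `pick_worker` function, and the worker names are hypothetical.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def pick_worker(flow: Flow, workers: Sequence[str]) -> str:
    """Map a flow to a worker by hashing its 5-tuple (assumed policy)."""
    if not workers:
        raise ValueError("no workers configured")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer index.
    index = int.from_bytes(digest[:8], "big") % len(workers)
    return workers[index]
```

Because the choice depends only on the 5-tuple and the worker list, repeated packets of one flow always reach the same worker, while distinct flows spread across workers. Note that a plain modulo remaps many flows when the worker list changes size; consistent hashing is the usual remedy, and is omitted here for brevity.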
Socket Dispatch - DoS consideration
· Case of 30k UDP sockets
· Solution: eBPF token
· Socket dispatch - zero downtime restart for QUIC
· Socket dispatch: AnyIP
· See picture
ebpf_exporter:
BPF is everywhere
XDP for DDoS, XDP for load balancing, etc.
Sowmini: Kernel Bypass or Not?
MobiledgeX (Vikram): XDP sounds interesting… There are certain issues: we would have to develop a team of kernel developers. Getting ready for the next generation of the Internet… low latency…
LinkedIn: Looking for TCP analytics… more application focused.
Cloudflare: We were using kernel bypass… but our applications are CPU bound… manageability is the cost you pay for the latency gain with kernel bypass.
Tom H: Is kernel upgrade an issue?
Cloudflare: Since we use XDP, the kernel doesn’t have to be updated as often.
LinkedIn: If application and network boundaries are clear, applications can be migrated for an upgrade. There are still challenges.
MobiledgeX: It’s a huge problem; we isolate the machine right now… but Linux has to be the end-device kernel. People are slow to move to the latest kernel. It’s a real problem.
Rely on the distro vendor; they will take care of it. Roopa
Need for programmability: are you looking for programmable HW or just SW?
MobiledgeX: No, not for programmable HW… the cost is high, so we are not jumping on it. We still have to scale… no custom chips.
LinkedIn: SW programmability is most important. The biggest value is in the manageability plane, not in the dataplane. P4 is mostly dataplane.
Cloudflare: Before XDP, there was a need for bypass or HW offload. We don’t see the need after XDP. Good old TCP offload can work for many flows.
Simpler dataplanes: switch ASICs to offload the data plane in HW. Roopa?
Much more of a use case on the edge side. No use case.
Program the ToR switches: Mobile, need the packet to go as fast as possible.
Most of the SW stack is written by Cloudflare.
Roopa question on Network Analytics:
Do you have any specific requirements of, or uses for, the kernel stack today?
LinkedIn: The challenge is collecting the data at scale and in real time… to do the analytics. Where does the network start? Collect close to the application or socket.
Site: https://www.netdevconf.org/0x13/session.html?panel-industry-perspectives