Name: A Story of Managing Kubernetes Watch Events End-to-End Flow in Extremely Large Clusters - Bo Tang, Ant Group
Start: 2024-08-22T11:00:00+0800
End: 2024-08-22T11:35:00+0800

In-person
21-23 August, 2024

Thursday August 22, 2024 11:00am - 11:35am HKT

Level 1 | Hung Hom Room 2

The Kubernetes watch mechanism has long received less attention than it deserves. Yet it is critical to both the stability and the performance of a cluster, and watch latency is a perfect indicator of cluster health. This talk first introduces how watch event latency is measured and then defines watch SLI and SLO metrics. Guided by the watch SLO, it walks through the process of identifying watch bottlenecks. It then describes the watch-related optimizations made to apiserver, etcd, kubelet, controller-runtime, and clients such as controllers and schedulers, covering watch latency, pod provisioning time, bandwidth, and CPU/memory usage. With these optimizations, daily P99 watch latency has improved by over 90% in large clusters (~20K nodes), affecting billions of watch events. Pod provisioning time has improved by over 60%, and apiserver bandwidth has decreased by 50%. The overall stability of the clusters has improved greatly.
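
As a rough illustration of the SLI/SLO idea in the abstract, the sketch below computes a P99 watch latency from a batch of samples and compares it with a threshold. It is not taken from the talk: the `WatchSample` record, its field names, the nearest-rank percentile method, and the 1-second default threshold are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchSample:
    """One observed watch event (hypothetical record shape)."""
    committed_at: float  # seconds: when the change was persisted (assumed)
    received_at: float   # seconds: when the watching client got the event


def watch_latency(sample: WatchSample) -> float:
    """Latency of a single watch event, in seconds."""
    return sample.received_at - sample.committed_at


def percentile(values, q):
    """Nearest-rank percentile of `values`, with q in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples, q=99.0, threshold_s=1.0):
    """True if the q-th percentile latency is within the threshold.

    The default threshold is a placeholder, not a value from the talk.
    """
    latencies = [watch_latency(s) for s in samples]
    return percentile(latencies, q) <= threshold_s
```

In this framing, the SLI is the percentile latency over a window of samples (for example, one day), and the SLO is the threshold that the SLI is expected to stay under.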

Speakers

Bo Tang

Senior Engineer, Ant Group

Bo Tang is a senior engineer at Ant Group. He currently works on the scalability and performance optimization of Kubernetes clusters.

KubeCon + CloudNativeCon Sessions, Operations + Performance

Experience Level: Intermediate
Language: Chinese

KubeCon + CloudNativeCon + Open Source Summit + AI_dev China 2024
