Name: Unlocking LLM Performance with eBPF: Optimizing Training and Inference Pipelines - Yang Xiang, Yunshan Networks, Inc.
Start: 2024-08-23T16:05:00+0800
End: 2024-08-23T16:40:00+0800

In-person
21-23 August, 2024

Friday August 23, 2024 4:05pm - 4:40pm HKT

Level 1 | Hung Hom Room 2

Training and inference for Large Language Models (LLMs) involve handling vast amounts of model data and training data, and consume significant GPU compute resources. Without observability, however, improving GPU utilization is extremely difficult. This presentation will introduce how to use eBPF to achieve observability in LLM training and inference with zero disruption. This includes Memory Profiling to understand the loading performance of models and training data, Network Profiling to understand data exchange performance, and GPU Profiling to analyze the GPU's MFU (Model FLOPs Utilization) and performance bottlenecks. We will also share the practical results of using eBPF to implement observability in a PyTorch LLM application and the llm.c project, with the aim of improving training and inference performance.
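
As a rough illustration of the MFU metric mentioned above (not taken from the talk), the sketch below estimates MFU as achieved FLOPs per second divided by the hardware's peak FLOPs per second. It assumes the widely used approximation of about 6 FLOPs per parameter per training token (forward plus backward pass, ignoring attention terms). The function name, the throughput figure, and the peak figure are hypothetical placeholders chosen only for the example.

```python
def model_flops_utilization(n_params, tokens_per_second, peak_flops_per_second):
    """Estimate training MFU as achieved FLOPs/s divided by peak FLOPs/s.

    Assumption: each training token costs roughly 6 * n_params FLOPs
    (forward + backward), which ignores attention-specific terms.
    """
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    # Hypothetical inputs: a 124M-parameter model, 100k tokens/s measured
    # throughput, and a 312 TFLOP/s peak for the accelerator.
    mfu = model_flops_utilization(124e6, 1e5, 312e12)
    print(f"Estimated MFU: {mfu:.1%}")
```

In practice the throughput would come from measurement, for example from profiling data, and the peak value from the accelerator's published specification for the numeric precision in use.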

Speakers

Yang Xiang

VP of Engineering, Yunshan Networks, Inc.

Yang Xiang received a Ph.D. from Tsinghua University and currently serves as VP of Engineering at Yunshan Networks and head of the DeepFlow open-source community. He has presented academic papers on topics such as application observability and network measurement at top international academic...

KubeCon + CloudNativeCon Sessions, Observability

Experience Level: Intermediate
Language: Chinese

KubeCon + CloudNativeCon + Open Source Summit + AI_dev China 2024
