Name: Boundaryless Computing: Optimizing LLM Performance, Cost, and Efficiency in Multi-Cloud Architecture | 无边界计算：在多云架构中优化LLM性能、成本和效率 - Jian Zhu, Red Hat & Kai Zhang, Alibaba Cloud Intelligence
Start: 2024-08-21T13:50:00+0800
End: 2024-08-21T14:25:00+0800

In-person
21-23 August, 2024

Please note: All times are shown in Hong Kong Standard Time (UTC+8). The schedule is subject to change, and session seating is available on a first-come, first-served basis.

Wednesday August 21, 2024 1:50pm - 2:25pm HKT

Level 1 | Hung Hom Room 7

For large language model (LLM) inference, the GPU resources available in a single data center or cloud region often cannot meet all user demand. In addition, serving end users with low latency usually requires deploying across multiple geographic regions. However, managing model distribution, synchronization, and consistency across regions presents new challenges. To address this, the Open Cluster Management (OCM) and Fluid communities have collaborated to automate the multi-region distribution of inference applications, combining OCM's multi-cluster application deployment capabilities with Fluid's data orchestration capabilities. This automation handles the cross-region distribution and pre-warming of large models, making model deployment and upgrades more efficient.

Speakers

Kai Zhang

Senior Staff Engineer, Alibaba

Kai Zhang is a Senior Staff Engineer at Alibaba Cloud Intelligence, where he has spent more than six years on the team developing Alibaba Cloud Container Service for Kubernetes (ACK). He currently leads ACK’s cloud-native AI product and solution offerings. Before this, he spent...

Jian Zhu

Senior Software Engineer, Red Hat

Jian Zhu is a senior software engineer at Red Hat and a core contributor to the Open Cluster Management (OCM) project. Jian enjoys solving multi-cluster workload distribution problems and extending OCM with add-ons.

KubeCon + CloudNativeCon Sessions, AI + ML

Experience Level: Intermediate
Language: Chinese

KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024
