Loading…
Attending this event?
In-person
21-23 August, 2024
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

亲临现场
2024年8月21-23日
了解更多并注册参加

Sched应用程序允许您创建自己的日程安排,但不能替代您的活动注册。您必须注册参加KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024,才能参加会议。如果您尚未注册但希望加入我们,请访问活动注册页面购买注册。

请注意:本日程自动显示为香港标准时间(UTC +8)。要查看您偏好的时区的日程,请从右侧“按日期筛选”上方的下拉菜单中选择。日程可能会有变动,会议席位先到先得。
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
With AI's growing popularity, Kubernetes has become the de facto AI infrastructure. However, the increasing number of clusters with diverse AI devices (e.g., NVIDIA, Intel, Huawei Ascend) presents a major challenge. AI devices are expensive, how to better improve resource utilization? How to better integrate with K8s clusters? How to manage heterogeneous AI devices consistently, support flexible scheduling policies, and observability all bring many challenges The HAMi project was born for this purpose. This session including: * How K8s manages heterogeneous AI devices (unified scheduling, observability) * How to improve device usage by GPU share * How to ensure the QOS of high-priority tasks in GPU share stories * Support flexible scheduling strategies for GPU (NUMA affinity/anti-affinity, binpack/spread etc) * Integration with other projects (such as volcano, scheduler-plugin, etc.) * Real-world case studies from production-level users. * Some other challenges still faced and roadmap

随着人工智能的日益普及,Kubernetes已成为事实上的人工智能基础设施。然而,不断增加的具有多样化人工智能设备(如NVIDIA、Intel、华为Ascend)的集群数量带来了重大挑战。人工智能设备价格昂贵,如何更好地提高资源利用率?如何更好地与K8s集群集成?如何一致地管理异构人工智能设备,支持灵活的调度策略和可观察性都带来了许多挑战。HAMi项目应运而生。本场演讲包括: * K8s如何管理异构人工智能设备(统一调度、可观察性) * 如何通过GPU共享提高设备使用率 * 如何确保GPU共享故事中高优先级任务的QOS * 为GPU支持灵活的调度策略(NUMA亲和性/反亲和性、binpack/spread等) * 与其他项目的集成(如volcano、scheduler-plugin等) * 来自生产级用户的实际案例研究。 * 仍然面临的一些其他挑战和路线图
Speakers
avatar for xiaozhang

xiaozhang

Senior Technical Lead, DaoCloud
- Xiao Zhang is leader of the Container team(focus on infra,AI,Muti-Cluster,Cluster - LCM,OCI) - Kubernetes / Kubernetes-sigs active Contributor、member - Karmada maintainer,kubean maintainer,HAMi maintainer - Cloud-Native Developer - CNCF Open Source Enthusiast. - GithubID: waw... Read More →
avatar for Mengxuan Li

Mengxuan Li

senior developer, The 4th Paradigm Co., Ltd
Reviewer of volcano community Founder of CNCF Landscape project HAMi responsible for the development of gpu virtualization mechanism on volcano. It have been merged in the master branch of volcano, and will be released in v1.8. speaker, in OpenAtom Global Open Source Commit#2023 speaker... Read More →
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 3

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link