In-person
21-23 August, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Cloud-native infrastructure is now vital for AI/ML, and administrative complexity, together with growing demand for compute resources, is driving developers toward multi-cluster patterns. Batch scheduling projects such as Kueue are valuable for efficient AI/ML training within a single Kubernetes cluster, while multi-cluster management platforms such as OCM (Open Cluster Management) and Fleet simplify cluster management and provide advanced scheduling features. We aim to bring together the best of both worlds, simplifying user operations and reducing confusion between different systems. In this talk, we will show how SIG Multicluster's newly proposed ClusterProfile API, combined with OCM, Fleet, and Kueue, addresses these challenges. We will demonstrate that MultiKueue setup can be easily automated with the ClusterProfile API, and that, with a few tweaks, users can apply OCM's and Fleet's advanced scheduling features through MultiKueue to intelligently place AI/ML jobs across clusters, maximizing the utilization of resources such as GPUs and reducing costs.
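The core idea of the abstract, deriving MultiKueue configuration from ClusterProfile objects and steering jobs toward clusters with spare GPUs, can be sketched roughly as follows. This is a hypothetical illustration, not the speakers' implementation: ClusterProfile is reduced to a small dataclass, `free_gpus` is an assumed property, the `<cluster>-kubeconfig` Secret naming is an invented convention, and the simple GPU ranking stands in for OCM's or Fleet's real placement logic. The generated objects are modelled on Kueue's MultiKueueCluster, MultiKueueConfig, and AdmissionCheck resources; verify field names and API versions against the Kueue release you run.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a SIG Multicluster ClusterProfile.
# The real resource carries more fields; only what this sketch needs is modelled.
@dataclass
class ClusterProfile:
    name: str
    cluster_manager: str  # e.g. "open-cluster-management" or "fleet"
    healthy: bool         # stands in for the health conditions in status
    free_gpus: int = 0    # ASSUMED property reported by the cluster manager


def select_clusters(profiles, manager, min_gpus=1):
    """Keep healthy clusters from one manager with enough free GPUs, most GPUs first."""
    candidates = [p for p in profiles
                  if p.cluster_manager == manager and p.healthy and p.free_gpus >= min_gpus]
    # Rank by free GPUs (descending), then by name so the output is deterministic.
    return sorted(candidates, key=lambda p: (-p.free_gpus, p.name))


def multikueue_manifests(config_name, clusters):
    """Build MultiKueue objects (as plain dicts) for the selected worker clusters."""
    cluster_objs = [{
        "apiVersion": "kueue.x-k8s.io/v1alpha1",
        "kind": "MultiKueueCluster",
        "metadata": {"name": p.name},
        # HYPOTHETICAL convention: one kubeconfig Secret per cluster, named "<cluster>-kubeconfig".
        "spec": {"kubeConfig": {"locationType": "Secret", "location": f"{p.name}-kubeconfig"}},
    } for p in clusters]
    config = {
        "apiVersion": "kueue.x-k8s.io/v1alpha1",
        "kind": "MultiKueueConfig",
        "metadata": {"name": config_name},
        "spec": {"clusters": [p.name for p in clusters]},
    }
    # AdmissionCheck that hands admitted workloads to the MultiKueue controller.
    check = {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "AdmissionCheck",
        "metadata": {"name": config_name},
        "spec": {
            "controllerName": "kueue.x-k8s.io/multikueue",
            "parameters": {"apiGroup": "kueue.x-k8s.io", "kind": "MultiKueueConfig",
                           "name": config_name},
        },
    }
    return cluster_objs + [config, check]


# Usage: four example clusters; cluster-c is unhealthy and cluster-d belongs to another manager.
profiles = [
    ClusterProfile("cluster-a", "open-cluster-management", True, free_gpus=2),
    ClusterProfile("cluster-b", "open-cluster-management", True, free_gpus=8),
    ClusterProfile("cluster-c", "open-cluster-management", False, free_gpus=16),
    ClusterProfile("cluster-d", "fleet", True, free_gpus=4),
]
chosen = select_clusters(profiles, "open-cluster-management")
for obj in multikueue_manifests("multikueue-gpu", chosen):
    print(obj["kind"], obj["metadata"]["name"])
```

Keeping the selection step separate from manifest generation mirrors the split described above: the cluster manager decides where jobs may run, and MultiKueue only needs the resulting cluster list.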

Speakers

Qing Hao

Senior Software Engineer, Red Hat
Qing Hao is a senior software engineer at Red Hat, where she is a maintainer of Open Cluster Management. Qing is interested in solving complex problems in the multi-cluster area, e.g., application scheduling and rolling upgrades of management components. Prior to Red Hat, she worked...

Chen Yu

Senior Software Engineer, Microsoft
Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on multi-cluster Kubernetes and contributing to the Fleet project, open-sourced by Azure Kubernetes Service (AKS).
Level 1 | Hung Hom Room 7
KubeCon + CloudNativeCon Sessions, AI + ML
