In-person
21-23 August, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

Level 1 | Hung Hom Room 7
Wednesday, August 21
 

11:00 HKT

Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA - Yiming Li, Tianjin University & Chenglong Wang, Jinan Inspur Data Technology Co., Ltd.
Wednesday August 21, 2024 11:00 - 11:35 HKT
The deployment of AI large models on standard serverless inference platforms like KServe is gaining popularity due to its ability to improve resource utilization and reduce costs. However, existing large model inference faces significant scheduling and communication bottlenecks, making it challenging to meet low-latency and high-throughput demands. The centralized control plane of Kubernetes leads to low scheduling efficiency and cannot achieve second-level response to large-scale burst requests. Additionally, large model inference needs to transfer a GB-scale KV cache for each request, resulting in high communication overhead. To address this, we have developed a highly elastic functionalized scheduling framework that guarantees second-level scheduling for thousands of serverless AI large model inference task instances. Additionally, we leverage RDMA technology to achieve high-speed KV cache migration, avoiding the high overhead caused by traditional network protocol stacks.

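For orientation, the KServe platform the session targets exposes models as InferenceService custom resources, so deployments can be scripted with the Kubernetes Python client. A minimal, hedged sketch (the model name and storage URI are hypothetical; the speakers' scheduling framework and RDMA path are not shown):

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical LLM service using KServe's serving.kserve.io/v1beta1 InferenceService CRD.
isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llm-demo", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "huggingface"},
                "storageUri": "s3://models/llm-demo",  # hypothetical path
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io", version="v1beta1",
    namespace="default", plural="inferenceservices", body=isvc,
)
```
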
Speakers

Cookie

Senior Software Engineer, Jinan Inspur Data Technology Co., Ltd.
I'm employed at Inspur. I mainly do container-computing-related development and am familiar with container networks, especially Calico and Cilium. I'm also a contributor to the OpenYurt community and mainly participate in the development of the raven project.

Yiming Li

PhD candidate, Tianjin University
Yiming Li received the bachelor’s and master’s degrees from Tianjin University, China, in 2017 and 2019, respectively. He is currently pursuing the Ph.D. degree with the College of Intelligence and Computing, Tianjin University, China. His research interests include cloud com... Read More →
Wednesday August 21, 2024 11:00 - 11:35 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

11:50 HKT

AI Inference Performance Acceleration: Methods, Tools, and Deployment Workflows - Yifei Zhang & 钱磊, Bytedance
Wednesday August 21, 2024 11:50 - 12:25 HKT
As AI rapidly evolves and embraces cloud-native technologies, inference performance has become crucial for application value. GPU selection, serving framework configuration, and model/data loading significantly impact inference efficiency. We'll focus on cloud-native solutions to storage performance issues and tools for evaluating inference performance across configurations, offering optimal deployment setups integrated into cloud-native workflows. We'll discuss inference performance's impact on user experience and how optimization can reduce costs and improve efficiency. Using technologies like Fluid and model optimization, we'll share strategies to enhance inference performance. Based on performance and cost analysis of various GPUs, we'll guide AI engineers in hardware selection. Additionally, we'll introduce a performance testing tool to evaluate and recommend the best model, hardware, and acceleration scheme combinations, aligning with deployment workflows based on test results.

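For background, Fluid (mentioned above) declares cacheable datasets as custom resources. A hedged sketch of registering a model-weights dataset, assuming Fluid's data.fluid.io/v1alpha1 Dataset and AlluxioRuntime CRDs; the bucket path and sizes are hypothetical:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

dataset = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "Dataset",
    "metadata": {"name": "llm-weights", "namespace": "default"},
    # Hypothetical object-store bucket holding the model weights.
    "spec": {"mounts": [{"mountPoint": "s3://models/llm-weights", "name": "weights"}]},
}
runtime = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "AlluxioRuntime",
    "metadata": {"name": "llm-weights", "namespace": "default"},  # must match the Dataset name
    "spec": {
        "replicas": 2,
        "tieredstore": {"levels": [{"mediumtype": "MEM", "path": "/dev/shm", "quota": "8Gi"}]},
    },
}

for plural, body in (("datasets", dataset), ("alluxioruntimes", runtime)):
    api.create_namespaced_custom_object(
        group="data.fluid.io", version="v1alpha1",
        namespace="default", plural=plural, body=body,
    )
```
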
Speakers

Yifei Zhang

Software Engineer, Bytedance
Yifei Zhang, Software Engineer at Volcengine, focuses on technical research and product development in Kubernetes and AI. He has rich experience in public cloud and is now fully working on VKE (Volcengine Kubernetes Engine), the managed Kubernetes product in Volcengine... Read More →

钱磊

Software Engineer, Bytedance
A Kubernetes developer at Bytedance, focused on building a stable Kubernetes engine on the public cloud.
Wednesday August 21, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

13:50 HKT

Boundaryless Computing: Optimizing LLM Performance, Cost, and Efficiency in Multi-Cloud Architecture - Jian Zhu, Red Hat & Kai Zhang, Alibaba Cloud Intelligence
Wednesday August 21, 2024 13:50 - 14:25 HKT
For large language model (LLM) inference, GPU resources within a single data center or cloud region often cannot meet all user demands. Additionally, for the end-users, deploying across multiple geographic regions is necessary to provide an optimal user experience. However, managing model distribution, synchronization, and consistency across multiple regions presents new challenges. To address this, the OCM and Fluid communities have collaborated to automate the multi-region distribution of inference applications through OCM's multi-cluster application deployment capabilities, combined with Fluid's data orchestration capabilities. This automation facilitates the cross-regional distribution and pre-warming of large models, enhancing the efficiency of model deployment and upgrades.

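To make OCM's distribution mechanism concrete: a ManifestWork, created on the hub in a managed cluster's namespace, wraps the resources to apply in that cluster. A minimal hedged sketch (the cluster name, image, and inference Deployment are hypothetical; the Fluid pre-warming side is omitted):

```python
from kubernetes import client, config

config.load_kube_config()  # hub cluster context

manifest_work = {
    "apiVersion": "work.open-cluster-management.io/v1",
    "kind": "ManifestWork",
    # The namespace is the managed cluster's name on the hub (hypothetical here).
    "metadata": {"name": "llm-inference", "namespace": "cluster-east-1"},
    "spec": {
        "workload": {
            "manifests": [{
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "metadata": {"name": "llm-inference", "namespace": "default"},
                "spec": {
                    "replicas": 2,
                    "selector": {"matchLabels": {"app": "llm-inference"}},
                    "template": {
                        "metadata": {"labels": {"app": "llm-inference"}},
                        "spec": {"containers": [
                            {"name": "server", "image": "example.com/llm-server:v1"}
                        ]},
                    },
                },
            }]
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="work.open-cluster-management.io", version="v1",
    namespace="cluster-east-1", plural="manifestworks", body=manifest_work,
)
```
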
Speakers

Kai Zhang

Senior Staff Engineer, Alibaba
Kai Zhang is a Senior Staff Engineer at Alibaba Cloud Intelligence, where he has been part of the team developing the Alibaba Cloud container service for Kubernetes (ACK) for over 6 years. He currently leads ACK’s Cloud native AI product and solution offerings. Before this, he spent... Read More →

Jian Zhu

Senior Software Engineer, RedHat
Zhu Jian is a senior software engineer at RedHat and a core contributor to the Open Cluster Management project. Jian enjoys solving multi-cluster workload distribution problems and extending OCM with add-ons.
Wednesday August 21, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

14:40 HKT

Connecting the Dots: Towards a Unified Multi-Cluster AI/ML Experience - Qing Hao, RedHat & Chen Yu, Microsoft
Wednesday August 21, 2024 14:40 - 15:15 HKT
Today cloud-native infra is vital for AI/ML; administrative complexities and the growing demand for compute resources drive devs towards multi-cluster patterns. Batch scheduling projects like Kueue are valuable for efficient AI/ML training in a single Kubernetes cluster. Multi-cluster management platforms like OCM and Fleet simplify cluster management and provide advanced scheduling features. We hope to bridge the best of both worlds to simplify user operations and reduce confusion between different systems. In this talk, we will showcase how SIG Multi-Cluster's newly proposed API, ClusterProfile, combined with OCM, Fleet, and Kueue, addresses these challenges. We will demonstrate that MultiKueue setup can be easily automated with the help of the ClusterProfile API; with a few tweaks, users can use OCM and Fleet's advanced scheduling features through MultiKueue to smartly place AI/ML jobs across clusters, maximizing utilization of resources like GPUs and saving costs.

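For illustration only, here is roughly what registering a member cluster as a ClusterProfile object might look like. The API is a SIG Multicluster proposal, so the group, version, scope, and field names below are assumptions that may not match released versions:

```python
from kubernetes import client, config

config.load_kube_config()

# Assumption: ClusterProfile as proposed by SIG Multicluster; details may change.
profile = {
    "apiVersion": "multicluster.x-k8s.io/v1alpha1",
    "kind": "ClusterProfile",
    "metadata": {"name": "member-1", "namespace": "fleet-system"},  # hypothetical namespace
    "spec": {
        "displayName": "member-1",
        "clusterManager": {"name": "open-cluster-management"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="multicluster.x-k8s.io", version="v1alpha1",
    namespace="fleet-system", plural="clusterprofiles", body=profile,
)
```
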
Speakers

Qing Hao

Senior Software Engineer, RedHat
Qing Hao is a senior software engineer at RedHat, where she works as a maintainer of Open Cluster Management. Qing is interested in solving complex problems in the multi-cluster area, e.g., application scheduling and rolling upgrades of management components. Prior to RedHat, she worked... Read More →

Chen Yu

Senior Software Engineer, Microsoft
Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.
Wednesday August 21, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

15:35 HKT

How Fast Can Your Model Composition Run in Serverless Inference? - Fog Dong, BentoML & Wenbo Qi, Ant Group
Wednesday August 21, 2024 15:35 - 16:10 HKT
Are you struggling with slow deployment times, high operational costs, or scalability issues when serving your ML models? Now, imagine the added complexity when typical AI apps require not just one, but an interconnected suite of models. In this session, discover how the integration of BentoML with Dragonfly effectively addresses these challenges, transforming the landscape of multi-model composition and inference within serverless Kubernetes environments. Join the co-presentation by the BentoML and Dragonfly communities to explore a compelling case study: a RAG app that combines 3 models: LLM, embedding, and OCR. Learn how our framework not only packages these diverse models efficiently but also utilizes Dragonfly's innovative P2P network for swift distribution. We'll further delve into how other open-source technologies like JuiceFS and vLLM have enabled us to achieve remarkable deployment times of just 40 seconds and establish a scalable blueprint for multi-model composition deployments.

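As a flavor of what a multi-model composition can look like in code, here is a rough sketch in BentoML's 1.2-style Python service API; the three model callables are stand-ins for real runtimes, and Dragonfly/JuiceFS distribution happens outside this snippet:

```python
import bentoml


@bentoml.service(resources={"gpu": 1})
class RAGService:
    """Sketch: one service composing embedding, OCR, and LLM steps."""

    def __init__(self):
        # Stand-ins for real model runtimes (e.g. vLLM, an embedding model, an OCR model).
        self.embed = lambda text: [0.0] * 384
        self.ocr = lambda image_bytes: "text extracted from the scan"
        self.llm = lambda prompt: "answer generated from: " + prompt[:60]

    @bentoml.api
    def query(self, question: str) -> str:
        _vector = self.embed(question)               # 1) embed the query for retrieval
        context = "retrieved passages..."            # 2) retrieval stub
        return self.llm(context + "\n" + question)   # 3) generate the final answer
```
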
Speakers

Wenbo Qi

Senior Software Engineer, Ant Group
Wenbo Qi is a software engineer at Ant Group working on Dragonfly. He is a maintainer of Dragonfly. He hopes to make positive contributions to open source software and believes that fear springs from ignorance.

Fog Dong

Senior Software Engineer, BentoML
Fog Dong, a Senior Engineer at BentoML, KubeVela maintainer, CNCF Ambassador, and LFAPAC Evangelist, has a rich background in cloud native. Previously instrumental in developing Alibaba's large-scale Serverless Application Engine workflows and Bytedance's cloud-native CI/CD platform... Read More →
Wednesday August 21, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

16:25 HKT

Leverage Topology Modeling and Topology-Aware Scheduling to Accelerate LLM Training - Yang Wang, Huawei
Wednesday August 21, 2024 16:25 - 17:00 HKT
In the LLM training and inference era, the bottleneck has shifted from compute to the network. Many high-throughput, low-latency interconnect technologies are widely used, e.g. NVLink and NVSwitch, to build hyper computers such as the NVIDIA Super Pod, Google multi-slice, and AWS placement groups. However, Kubernetes has not yet addressed topology awareness efficiently, resulting in low performance when sub-optimal resources are provisioned. This talk will explore inter-node communication and intra-node resource interconnects, and analyze how these two topological factors impact the runtime performance of AI workloads, especially large language model training. The talk will cover:
- How to model the topology of underlying resources such as NUMA, racks, super pods, and hyper computers
- How to make the scheduler topology-aware and produce the best scheduling decisions
- How to coordinate topology-aware scheduling with DRA on the node

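Before scheduler-level topology modeling lands, a limited approximation is possible with stock Kubernetes affinity. A hedged sketch that pins a training job's pods to one rack via the Python client; the topology.kubernetes.io/rack node label is an assumption (operators must label nodes themselves), not a built-in:

```python
from kubernetes import client, config

config.load_kube_config()

# Keep all workers of one training job on nodes sharing the same (assumed) rack label.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="trainer-0", labels={"job": "llm-train"}),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="trainer", image="example.com/trainer:v1")],
        affinity=client.V1Affinity(
            pod_affinity=client.V1PodAffinity(
                required_during_scheduling_ignored_during_execution=[
                    client.V1PodAffinityTerm(
                        label_selector=client.V1LabelSelector(
                            match_labels={"job": "llm-train"}
                        ),
                        topology_key="topology.kubernetes.io/rack",  # assumed node label
                    )
                ]
            )
        ),
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```
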
Speakers

Yang Wang

Senior engineer and maintainer of Volcano, Huawei Cloud Technologies Co., LTD
Volcano maintainer and speaker at KCD and GOTC. Focused on cloud native scheduling and multi-cluster management.
Wednesday August 21, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

17:15 HKT

Leveraging Wasm for Portable AI Inference Across GPUs, CPUs, OS & Cloud-Native Environments - Miley Fu & Hung-Ying Tai, Second State
Wednesday August 21, 2024 17:15 - 17:50 HKT
This talk will focus on the advantages of using WebAssembly (Wasm) for running AI inference tasks in a cloud-native ecosystem. We will explore how Wasm empowers devs to develop on their own PCs and have their AI inference performed uniformly across different hardware (including GPUs and CPUs), operating systems, edge clouds, and more. We'll discuss how Wasm and Wasm runtimes facilitate seamless integration into cloud-native frameworks, enhancing the deployment and scalability of AI applications. This presentation will specifically highlight how Wasm provides a flexible, efficient solution suitable for diverse cloud-native architectures, including Kubernetes, allowing developers to fully tap the potential of LLMs, especially open source LLMs. The session offers insights into maximizing the potential of AI applications by leveraging the cross-platform capabilities of Wasm, ensuring consistency, low cost, and efficiency in AI inference across different computing environments.

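To illustrate the portability claim: a Wasm inference app compiled once can be executed by the WasmEdge CLI on any supported host. A hedged sketch invoking it from Python; inference-app.wasm and model.gguf are hypothetical, and the --nn-preload flag assumes a WasmEdge build with the WASI-NN GGML plugin:

```python
import subprocess

# Hypothetical artifacts: a Wasm inference app plus a GGUF model file.
cmd = [
    "wasmedge",
    "--dir", ".:.",                                  # preopen the working directory
    "--nn-preload", "default:GGML:AUTO:model.gguf",  # WASI-NN plugin loads the model
    "inference-app.wasm",
]
subprocess.run(cmd, check=True)
```
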
Speakers

Hung-Ying Tai

Software Engineer, Second State
Hung-Ying is a maintainer of the WasmEdge project and a pioneer in compiler optimization and virtual machine design. He is a prolific open-source contributor, participating in many open-source projects, including go-ethereum, solidity, SOLL, crun, and WasmEdge.

Miley Fu

CNCF Ambassador, Founding member at WasmEdge, Second State Inc
Miley is a Developer Advocate with a passion for empowering devs to build and contribute to open source. With over 5 years of experience working on WasmEdge runtime in CNCF sandbox as the founding member, she talked at KubeCon, KCD Shenzhen, CloudDay Italy, DevRelCon, Open Source... Read More →
Wednesday August 21, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML
 
Thursday, August 22
 

11:00 HKT

The Journey of Next-Gen FinTech IDP at China Merchants Bank - Jiahang Xu, China Merchants Bank
Thursday August 22, 2024 11:00 - 11:35 HKT
Explore the transformative journey of China Merchants Bank (CMB), one of China's largest retail banks, through cloud migration, cloud-native transformation, and platform engineering over the past three years. Despite challenges such as the increased complexity of cloud technology and management, and potential risks to developer productivity and the continuous assurance of financial services, CMB successfully leveraged KubeVela, OpenFeature, Envoy, Cilium, and OpenTelemetry to build its Next-Gen FinTech IDP. This led to 70% of applications being managed on the platform within a year and an improved developer experience covering thousands of R&D engineers. We'll discuss the strategic thinking, 'Golden Path' implementation, struggles, trade-offs, and key success metrics against a platform engineering maturity model. This session provides a blueprint and reference architecture for financial organizations undergoing similar transformations.

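One of the building blocks named above, OpenFeature, standardizes feature flags behind a vendor-neutral SDK. A minimal sketch with the OpenFeature Python SDK; the flag name is hypothetical, and with no provider registered the default no-op provider simply returns the fallback value:

```python
from openfeature import api

client = api.get_client("payments")
# Falls back to False until a real provider (e.g. a flagd provider) is registered.
if client.get_boolean_value("new-payment-flow", False):
    print("serving the new payment flow")
else:
    print("serving the legacy payment flow")
```
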
Speakers

Jiahang Xu

System Architect, China Merchants Bank
Jiahang Xu is a System Architect at China Merchants Bank. He has over 14 years of unique cross-domain experience working in telecom, automotive, financial industry, startup as a co-founder, and KubeVela maintainer. He's mainly focused on cloud-native application technology and platform... Read More →
Thursday August 22, 2024 11:00 - 11:35 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

11:50 HKT

Unlocking Scalability and Simplifying Multi-Cloud Management with Karmada and PipeCD - Khanh Tran, CyberAgent, Inc. & Hongcai Ren, Huawei
Thursday August 22, 2024 11:50 - 12:25 HKT
In the coming age of AI, it has become inevitable for organizations to embrace the multi-cloud approach. Managing applications across multiple clouds presents various challenges, including resilience, performance, security, cost, and deployment management. How well have you prepared yourself and your services for this new age? This presentation will introduce Karmada and PipeCD, two powerful tools designed to help organizations effectively address these challenges and achieve seamless multi-cloud management. Karmada is a multi-cloud container orchestrator, while PipeCD is a multi-cloud continuous delivery solution. Both tools are built on extensive experience managing applications at scale across multiple clouds. We will delve into the key features and benefits of Karmada and PipeCD and how they simplify multi-cloud management. Together, we can unlock the true potential of multi-cloud systems and empower organizations to thrive in the era of AI.

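To make Karmada's model concrete: a PropagationPolicy selects resources and declares which member clusters receive them. A minimal sketch (the Deployment and cluster names are hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()  # context pointing at the karmada-apiserver

policy = {
    "apiVersion": "policy.karmada.io/v1alpha1",
    "kind": "PropagationPolicy",
    "metadata": {"name": "web-propagation", "namespace": "default"},
    "spec": {
        "resourceSelectors": [
            {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"}
        ],
        "placement": {"clusterAffinity": {"clusterNames": ["member1", "member2"]}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="policy.karmada.io", version="v1alpha1",
    namespace="default", plural="propagationpolicies", body=policy,
)
```
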
Speakers

Hongcai Ren

Senior Software Engineer, Huawei
Hongcai Ren (@RainbowMango) is a CNCF Ambassador who has been working on Kubernetes and other CNCF projects since 2019; he is a maintainer of the Kubernetes and Karmada projects.

Khanh Tran

Software Engineer, CyberAgent, Inc.
Khanh is a maintainer of the PipeCD project. He is currently employed at CyberAgent Inc, and responsible for the CI/CD system across the organization. As a member of the developer productivity team, his primary focus is on automation and anything that enhances the development process... Read More →
Thursday August 22, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

13:50 HKT

Testing and Release Patterns for Crossplane - Yury Tsarev & Steven Borrelli, Upbound
Thursday August 22, 2024 13:50 - 14:25 HKT
Crossplane has become the foundation of many Internal Developer Platforms (IDPs). A requirement for any IDP in production is the ability to make changes and upgrades to the platform with confidence. This talk will cover testing and release patterns based on our experience building production-ready environments across a range of Crossplane users. We’ll cover the lifecycle of a Crossplane Composition upgrade, from local commit to pull request to target customer environment, end-to-end testing tools, handling API changes, and how to control updates to customer environments. For quite a while, testing Crossplane Compositions meant relying exclusively on costly end-to-end layers. In this talk, we're unveiling new unit testing capabilities that allow you to evaluate and test your Composition code in complete isolation.

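The isolated-testing idea can be sketched around the Crossplane CLI's render command, which evaluates a Composition locally without a cluster, wrapped in a pytest-style unit test. The input file names and the asserted output are hypothetical:

```python
import subprocess


def test_composition_renders_expected_resources():
    # `crossplane beta render` evaluates the Composition offline and
    # prints the composed resources as YAML on stdout.
    result = subprocess.run(
        ["crossplane", "beta", "render",
         "xr.yaml", "composition.yaml", "functions.yaml"],
        capture_output=True, text=True, check=True,
    )
    # Hypothetical expectation: the Composition should emit a Bucket resource.
    assert "kind: Bucket" in result.stdout
```
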
Speakers

Steven Borrelli

Principal Solutions Architect, Upbound
Steven is a Principal Solutions Architect for Upbound, where he helps customers adopt Crossplane.

Yury Tsarev

Principal Solutions Architect, Upbound
Yury is an experienced software engineer who strongly focuses on open-source, software quality and distributed systems. As the creator of k8gb (https://www.k8gb.io) and active contributor to the Crossplane ecosystem, he frequently speaks at conferences covering topics such as Control... Read More →
Thursday August 22, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

14:40 HKT

NanoVisor: Revolutionizing FaaS Cold Start Performance with Secure, Lightweight Container Runtime - Tianyu Zhou, Ant Group
Thursday August 22, 2024 14:40 - 15:15 HKT
Function as a Service (FaaS) is booming, but cold start time, the time it takes to create a new container for a function, remains a significant bottleneck. This not only impacts user experience with noticeable delays, but also incurs unnecessary costs due to wasted resources. NanoVisor, a groundbreaking container runtime built on gVisor, tackles the challenge of slow cold start time in FaaS. It achieves this through a series of optimizations specifically designed for FaaS: lightweight containerd interaction for faster setup, a read-only filesystem for enhanced efficiency, and a sandbox fork mechanism that replaces heavy container creation for significant performance gains. These empower NanoVisor to create secure, sandboxed containers ready for function execution within an astonishing 5 ms, with less than 1 MB of memory overhead per instance and 1.5K QPS per node. NanoVisor has been successfully applied across Ant Group's ecosystem, including Alipay Cloud Base and SOFA Function, as well as CI/CD acceleration.

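Runtimes of this kind typically plug into Kubernetes through a RuntimeClass, with pods opting in via runtimeClassName. A hedged sketch using the Python client; the nanovisor handler name is hypothetical and would need to be registered with the node's container runtime:

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical handler; it must match a runtime handler configured in containerd/CRI-O.
runtime_class = client.V1RuntimeClass(
    metadata=client.V1ObjectMeta(name="nanovisor"),
    handler="nanovisor",
)
client.NodeV1Api().create_runtime_class(runtime_class)

# A function pod opting into the sandboxed runtime.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="fn-sandbox"),
    spec=client.V1PodSpec(
        runtime_class_name="nanovisor",
        containers=[client.V1Container(name="fn", image="example.com/function:v1")],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```
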
Speakers

Tianyu Zhou

System Engineer, Ant Group
Tianyu Zhou is a system engineer at Ant Group. He graduated from Zhejiang University with a master's degree in cyberspace security. His research interests include kernels, system security, and container security.
Thursday August 22, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Emerging + Advanced

15:35 HKT

Optimize and Accelerate Cloud AI Infrastructure with Autoscaling - Yuan Mo, Alibaba Cloud
Thursday August 22, 2024 15:35 - 16:10 HKT
With the rise of generative AI technology, more and more applications are starting to integrate with the capabilities of generative AI. However, the high costs of training and inference can be daunting for developers. In this talk, we will discuss the issues and solutions that need additional consideration when using elastic scaling in generative AI scenarios, including:
● How to enhance the elastic startup efficiency of generative AI
● How to address inference efficiency when compute and storage are separated in generative AI
● How to reduce the costs of training and inference
● How to solve the interruption problem in AI training scenarios using Spot instances
● How to address capacity elasticity in LLM scenarios
Finally, we will introduce the practical experience of the world's leading generative AI service provider HaiYi (seaart.ai), allowing more developers to understand the architectural methods of elastic cloud AI infrastructure.

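As the baseline the session builds on, Kubernetes elasticity starts with the HorizontalPodAutoscaler. A minimal sketch via the Python client; the target Deployment and thresholds are hypothetical, and the GPU/LLM-specific tuning discussed above layers on top of this:

```python
from kubernetes import client, config

config.load_kube_config()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="inference-hpa", namespace="default"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="inference",  # hypothetical
        ),
        min_replicas=1,
        max_replicas=10,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(type="Utilization", average_utilization=70),
            ),
        )],
    ),
)
client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa,
)
```
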
Speakers

Yuan Mo

Staff Engineer, Alibaba Cloud
Senior technical expert at Alibaba Cloud, maintainer of the Kubernetes elastic component autoscaler, founder of the cloud-native gaming community and OpenKruiseGame; he has given several talks at KubeCon before. Focuses on the cloud-native transformation of the gaming industry... Read More →
Thursday August 22, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

16:25 HKT

The Two Sides of the Kubernetes Enhancement Proposals (KEPs) - Rayan Das, OneTrust LLC & Sreeram Venkitesh, BigBinary
Thursday August 22, 2024 16:25 - 17:00 HKT
Kubernetes Enhancement Proposals (KEPs) are pivotal in proposing, communicating, and coordinating new efforts within the Kubernetes project. As members of the Release Team (the team responsible for releasing the next version of Kubernetes), especially the Enhancements team under SIG Release, we play a vital role in maintaining the active status of enhancements and facilitating communication between stakeholders, be it a deprecation or a feature update. In this talk, we look at the KEP lifecycle from the perspective of the release team, exploring the process (enhancements freeze, code freeze, and the exception process), major themes, and more. Additionally, we will discuss the developer's viewpoint on KEPs, highlighting the process, deadlines, and best practices for proposing, reviewing, and implementing KEPs effectively. Join us to learn how KEPs drive innovation and collaboration within the Kubernetes community, empowering contributors to shape the future of Kubernetes development.

Speakers

Rayan Das

Senior Site Reliability Engineer, OneTrust LLC
As a Senior Site Reliability Engineer, I devote my expertise to the infrastructure of OneTrust Privacy Software. Within the Kubernetes community, I served as a SIG Release Enhancements Shadow for Kubernetes v1.29, and I applied to be a release shadow for v1.31 as well. Beyond... Read More →

Sreeram Venkitesh

Software Engineer, BigBinary
Sreeram Venkitesh is a Software Engineer at BigBinary and is an active contributor to Kubernetes. He is active in the Kubernetes release team, where he served as a shadow in the enhancements team from v1.29-v1.30 and is the enhancements sub-team lead for v1.31. He also helps write... Read More →
Thursday August 22, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Cloud Native Experience

17:15 HKT

Addressing the #1 Threat to the Web: Authorization - Jimmy Zelinskie, authzed
Thursday August 22, 2024 17:15 - 17:50 HKT
As more folks deploy cloud-native architectures and technologies, store ever larger amounts of data, and build ever more complex software suites, authorizing requests correctly and securely becomes exponentially more difficult. Broken authorization now tops OWASP's Top 10 Security Risks for Web Apps. Their recommendation? Adopt an ABAC or ReBAC authorization model. This talk establishes the problems with the status quo, explains the core concepts behind ReBAC, and introduces SpiceDB, a widely adopted open source system inspired by Zanzibar, the system that internally powers authorization at Google.

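To ground the ReBAC idea: in SpiceDB, permissions derive from relationships between objects, and clients ask whether a subject has a permission on a resource. A hedged sketch with the authzed Python client, against a hypothetical document/viewer schema and a local endpoint:

```python
from authzed.api.v1 import (
    CheckPermissionRequest,
    CheckPermissionResponse,
    Client,
    ObjectReference,
    SubjectReference,
)
from grpcutil import insecure_bearer_token_credentials

# Hypothetical local SpiceDB loaded with a schema like:
#   definition user {}
#   definition document {
#       relation viewer: user
#       permission view = viewer
#   }
client = Client("localhost:50051", insecure_bearer_token_credentials("sometoken"))

resp = client.CheckPermission(CheckPermissionRequest(
    resource=ObjectReference(object_type="document", object_id="readme"),
    permission="view",
    subject=SubjectReference(
        object=ObjectReference(object_type="user", object_id="alice")
    ),
))
allowed = resp.permissionship == CheckPermissionResponse.PERMISSIONSHIP_HAS_PERMISSION
print("alice can view readme:", allowed)
```
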
Speakers

Jimmy Zelinskie

cofounder, authzed
Jimmy Zelinskie is a software engineer and product leader with a goal of democratizing software via open source development. He's currently CPO of authzed where he's focused on bringing hyperscaler best-practices in authorization to the industry at large. At CoreOS, he helped pioneer... Read More →
Thursday August 22, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Security
 
Friday, August 23
 

10:35 HKT

Leveraging Multi-Cluster Architecture for Resilient and Elastic Hybrid Cloud at Xiaohongshu - Feng Xiong, Xiaohongshu & Hongcai Ren, Huawei
Friday August 23, 2024 10:35 - 11:10 HKT
At Xiaohongshu, the scale and number of K8S clusters have grown significantly, leading to increased complexity in cluster and resource management.

To address challenges such as slow resource turnover, limited automation, and inefficiency, Xiaohongshu has adopted Karmada as a unified platform. This approach enhances cross-cluster application distribution, elasticity, and efficient cross-cluster scheduling while effectively managing multi-cloud infrastructure.

This session focuses on the following key points:

  • Key challenges of hyperscale infrastructure
  • Evaluation of K8s-based multi-cluster solutions and considerations
  • Practice of efficiently distributing applications and securely migrating existing applications
  • Practice of enhancing elasticity in a multi-cluster environment
  • Achievements, problems met and resolved
Speakers

Hongcai Ren

Senior Software Engineer, Huawei
Hongcai Ren (@RainbowMango) is a CNCF Ambassador who has been working on Kubernetes and other CNCF projects since 2019; he is a maintainer of the Kubernetes and Karmada projects.

Feng Xiong

Senior Technical Expert, Xiaohongshu
Senior Technical Expert at Xiaohongshu, and Head of Cloud Native Serverless Infrastructure. He began engaging with Kubernetes in 2017 and possesses extensive experience in product development and industry implementation related to cloud computing, containers, Serverless, and edge... Read More →
Friday August 23, 2024 10:35 - 11:10 HKT
Level 1 | Hung Hom Room 7

11:25 HKT

Rollout Patterns: Smoothly Migrating and Rolling Out Your Microservices - Tim Xiao, DaoCloud & Wu Chenhui, AS.Watson TechLab
Friday August 23, 2024 11:25 - 12:00 HKT
At Watsons, most services are built on Dubbo. Now, they aim to use delivery tools like Argo CD and Argo Rollouts to deliver their services automatically and safely. However, they have encountered complexities beyond what Argo Rollouts assumes. We will summarize these patterns and demonstrate how to handle them, including:
- Pattern 1: One service at a time.
- Pattern 2: Multiple services, each forward-compatible.
- Pattern 3: Multiple services with version dependencies.

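For Pattern 1, the per-service building block is an Argo Rollouts canary strategy; the multi-service patterns coordinate several of these. A hedged sketch of a Rollout created through the Kubernetes Python client (the image, weights, and pause duration are hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()

rollout = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Rollout",
    "metadata": {"name": "orders", "namespace": "default"},
    "spec": {
        "replicas": 4,
        "selector": {"matchLabels": {"app": "orders"}},
        "template": {
            "metadata": {"labels": {"app": "orders"}},
            "spec": {"containers": [
                {"name": "orders", "image": "example.com/orders:v2"}
            ]},
        },
        "strategy": {"canary": {"steps": [
            {"setWeight": 20},
            {"pause": {"duration": "5m"}},  # observe metrics before promoting
            {"setWeight": 100},
        ]}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1",
    namespace="default", plural="rollouts", body=rollout,
)
```
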
Speakers

Tim Xiao

Developer, DaoCloud
Serves as a DevOps platform Principal Engineer at DaoCloud; has participated in community projects including argo-cd, argo-rollouts, and kubevela, and has more than 5 years of Kubernetes platform development experience.

Wu Chenhui

Architect, AS.Watson TechLab
I have nearly 30 years of experience in software development and architecture design, and 5 years of experience with Kubernetes, responsible for Kubernetes-related architecture design for Watsons Group.
Friday August 23, 2024 11:25 - 12:00 HKT
Level 1 | Hung Hom Room 7

13:20 HKT

What if Your System Experiences an Outage? Let's Build Resilient Systems with Chaos Engineering - NamKyu Park, LitmusChaos
Friday August 23, 2024 13:20 - 13:55 HKT
This session explores how LitmusChaos improves the resilience of cloud-native applications by injecting chaos. It also showcases the streamlined management of chaos engineering software through Backstage. Cloud-native applications can be complex to navigate and secure. Our session will present strategies to identify vulnerabilities using GitOps and monitoring, integrated seamlessly into your system. Learn how Backstage and LitmusChaos can enhance your application's resilience with ease! The session starts with chaos orchestration and analysis using LitmusChaos, followed by a live demo highlighting the utilization of LitmusChaos' Backstage plugin and others like Prometheus and ArgoCD. Learn how these plugins, when integrated with Backstage, effectively manage all components necessary for executing chaos engineering.

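The orchestration step can be pictured as creating a ChaosEngine resource that targets an app and lists the experiments to run. A hedged sketch (the target labels and service account are hypothetical), following LitmusChaos's litmuschaos.io/v1alpha1 API:

```python
from kubernetes import client, config

config.load_kube_config()

engine = {
    "apiVersion": "litmuschaos.io/v1alpha1",
    "kind": "ChaosEngine",
    "metadata": {"name": "nginx-chaos", "namespace": "default"},
    "spec": {
        "appinfo": {"appns": "default", "applabel": "app=nginx", "appkind": "deployment"},
        "engineState": "active",
        "chaosServiceAccount": "pod-delete-sa",  # hypothetical, pre-created RBAC
        "experiments": [{
            "name": "pod-delete",
            "spec": {"components": {"env": [
                {"name": "TOTAL_CHAOS_DURATION", "value": "30"},
                {"name": "CHAOS_INTERVAL", "value": "10"},
            ]}},
        }],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="litmuschaos.io", version="v1alpha1",
    namespace="default", plural="chaosengines", body=engine,
)
```
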
Speakers

Namkyu Park

Maintainer, LitmusChaos
Namkyu Park is a CNCF Ambassador and a Software Developer. He worked at several startups in South Korea. He completed the Linux Foundation Mentorship Program (LitmusChaos) as a mentee and is currently a mentor and maintainer of LitmusChaos. He has previously spoken at GopherCon Korea... Read More →
Friday August 23, 2024 13:20 - 13:55 HKT
Level 1 | Hung Hom Room 7

14:10 HKT

Opportunities and Challenges of Cloud Native Technology in US Healthtech - Katerina Arzhayev, SUSE
Friday August 23, 2024 14:10 - 14:45 HKT
In this session I will share the strategic roadmap for Cloud Native Technology companies eyeing expansion into the intricate US healthcare market. Delving into the multifaceted landscape of American healthcare, the session navigates through its complexities, from the dichotomy of public and private sectors to the nuanced regulatory framework dominated by HIPAA and FDA regulations. By illuminating Cloud Native Technology's transformative potential, particularly in fostering interoperability, enhancing telehealth capabilities, and empowering data analytics, the session showcases how innovation can meet the industry's pressing needs. Moreover, it sheds light on the indispensable considerations for market entry, emphasizing regulatory compliance, trust-building with healthcare stakeholders, and the imperative of market localization. Attendees will be equipped with a strategic playbook to navigate the intricate terrain of US healthtech.

Speakers

Katerina Arzhayev

Director of Product Management, Healthcare Edge, SUSE
Katerina Arzhayev is experienced in cross-cultural collaboration and technology strategy. She has a proven track record of driving business results through effective communication and strategic planning. Katerina's expertise lies in making highly complicated topics accessible to non-technical... Read More →
Friday August 23, 2024 14:10 - 14:45 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Cloud Native Experience

15:15 HKT

The Experience of ChillyRoom Developing & Managing Session-Based Games on K8s with OpenKruiseGame - Qiuyang Liu, Alibaba Cloud & Xinhao Liu, ChillyRoom
Friday August 23, 2024 15:15 - 15:50 HKT
In the era of traditional game operation and maintenance, session-based games face huge challenges in delivery efficiency and resource costs. Cloud native technology brings exactly the flexibility and highly automated capabilities that session-based games need. However, because game servers are strongly stateful, there are also various difficulties in running games on Kubernetes. This talk will focus on the characteristics of session-based games and describe how ChillyRoom uses OpenKruiseGame, a subproject of the CNCF incubating project OpenKruise, to develop and manage session-based games on Kubernetes, providing developers in the game industry with cloud native implementation experience in automatic network access, elastic scaling of game servers, matching logic development, room status management, and more.

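OpenKruiseGame's core abstraction is the GameServerSet, which manages game server replicas. A hedged sketch (the image and sizing are hypothetical; field names follow the game.kruise.io/v1alpha1 API as documented and may vary by version):

```python
from kubernetes import client, config

config.load_kube_config()

gss = {
    "apiVersion": "game.kruise.io/v1alpha1",
    "kind": "GameServerSet",
    "metadata": {"name": "room-servers", "namespace": "default"},
    "spec": {
        "replicas": 3,
        "gameServerTemplate": {
            "spec": {
                "containers": [{"name": "room", "image": "example.com/room-server:v1"}],
            }
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="game.kruise.io", version="v1alpha1",
    namespace="default", plural="gameserversets", body=gss,
)
```
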
Speakers

Qiuyang Liu

Senior R&D Engineer, Alibaba Cloud
Qiuyang Liu, head of cloud native game at Alibaba Cloud Container Service and maintainer of the kruise-game project. He has long been engaged in the research and development of cloud native in the gaming field and is committed to promoting the implementation of cloud native in the... Read More →

Xinhao Liu

Engineer, ChillyRoom
Xinhao Liu is an engineer with one year of experience in game server development at ChillyRoom and three years of industry experience in Linux OS and cloud core network software development. He has a passion for creating flexible, high-performance, highly available, and easy-to-maintain game... Read More →
Friday August 23, 2024 15:15 - 15:50 HKT
Level 1 | Hung Hom Room 7

16:05 HKT

dora-rs: Dataflow Oriented Robotic AI Framework - Philipp Oppermann, Freelancer & Xavier Tao, 1ms.ai
Friday August 23, 2024 16:05 - 16:40 HKT
The dora-rs project (https://github.com/dora-rs) is a Dataflow Oriented Robotic AI framework that puts ML-powered robots within the reach of anyone. It also leverages the existing legacy robotic ecosystem through its ROS bridge extension.

dora-rs is developed in Rust, surpassing C/C++ in terms of development speed, quality, memory safety, and security. dora-rs offers APIs in multiple programming languages, with Python treated as a first-class citizen. Developers can fully utilize the latest ML models from open-source communities to quickly build robot prototypes for research, education, and rapid prototyping. C/C++/Rust APIs are also available for high-performance, low-latency production environments.

We will showcase the dora-rs robotic framework for the open-source AI revolution and discuss the design decisions behind it. We will bring physical robots for live demos:

· a robot understanding its surroundings, powered by a vision-language model (VLM)
· a robot programming itself in real time, powered by a large language model (LLM)
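To give a flavor of the dataflow model: a dora node is a small program that consumes input events and emits outputs, wired to peers by a dataflow description. A heavily hedged sketch of a Python node; the input/output names are hypothetical and API details vary across dora-rs versions:

```python
from dora import Node

node = Node()

# Event loop: react to declared inputs, forward results downstream.
for event in node:
    if event["type"] == "INPUT" and event["id"] == "image":
        # A real node would run e.g. a VLM here; this sketch just echoes the payload.
        node.send_output("description", event["value"])
    elif event["type"] == "STOP":
        break
```
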
Speakers

Philipp Oppermann

Freelancer, Open-source software engineer
I'm a freelance software engineer from Germany. I focus on Rust, open-source software, operating systems, and other system-level software. I currently work on the dora-rs project, a modern robotic framework. I'm also part of the `rust-osdev` organization on GitHub, which maintains... Read More →

Xavier Tao

Founder, 1ms.ai
Xavier Tao is a French software engineer developing practical solutions for ML/AI users and engineers through open-source projects. One such project is dora-rs, which aims to make building AI applications fast and easy.
Friday August 23, 2024 16:05 - 16:40 HKT
Level 1 | Hung Hom Room 7
 
