Attending this event?
In-person
21-23 August, 2024
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

亲临现场
2024年8月21-23日
了解更多并注册参加

Sched应用程序允许您创建自己的日程安排,但不能替代您的活动注册。您必须注册参加KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024,才能参加会议。如果您尚未注册但希望加入我们,请访问活动注册页面购买注册。

请注意:本日程自动显示为香港标准时间(UTC +8)。要查看您偏好的时区的日程,请从右侧“按日期筛选”上方的下拉菜单中选择。日程可能会有变动,会议席位先到先得。
Wednesday, August 21
 

9:20am HKT

Keynote: Accelerating Electric Vehicle Innovation with Cloud Native Technologies | 主论坛演讲: 使用云原生技术加速电动汽车创新 - Kevin Wang, Huawei & Saint Jiang, NIO
Wednesday August 21, 2024 9:20am - 9:35am HKT
The electric vehicle (EV) industry is rapidly advancing towards a future where intelligence and connectivity are paramount. As we embrace this new era, the challenges in automotive software development escalate, including software consistency, testing efficiency, and data utilization between simulated environments and real-world vehicle runtime environments. In this session, discover how NIO, an innovator in the global EV sphere, harnesses the power of cloud native technologies such as containerd, Kubernetes, KubeEdge, and AI cloud-edge collaboration. Learn about NIO's journey to augment the development efficiency and quality of EV software, propelling us towards the zenith of vehicular intelligence. Delve into the transformative impact and future prospects of cloud native solutions in revolutionizing the EV landscape.

电动汽车(EV)行业正迅速向着智能和连接至关重要的未来发展。随着我们迎接这个新时代,汽车软件开发中的挑战不断升级,例如在模拟环境和真实车辆运行环境之间的软件一致性、测试效率、数据利用等等。 在这场演讲中,探索全球EV领域的创新者NIO如何利用云原生技术,如Containerd、Kubernetes、KubeEdge和AI云边协作。了解NIO如何提高EV软件开发效率和质量,推动我们走向车辆智能的巅峰。深入探讨云原生解决方案在革新EV领域中的转变影响和未来前景。
Speakers
Kevin Wang

Lead of Cloud Native Open Source Team, Huawei
Kevin Wang has been an outstanding contributor in the CNCF community since its beginning and is the leader of the cloud native open source team at Huawei. Kevin has contributed critical enhancements to Kubernetes, led the incubation of the KubeEdge, Volcano, Karmada projects in CNCF...
Saint Jiang

NIO
Saint Jiang has over 10 years of experience in automotive software development. He is currently responsible for the software platform development in the intelligent cockpit domain at NIO, a global leader in electric vehicles. Prior to that, he was the system manager of the software...
Level 2 | Grand Ballroom 1-2

10:00am HKT

Keynote: China & Hong Kong's Leading Role in Open Source and AI | 主论坛演讲:中国和香港在开源和人工智能中的领先角色 - Stormy Peters, VP of Communities, GitHub
Wednesday August 21, 2024 10:00am - 10:15am HKT
Hong Kong and China have an active open source software community, home to 11 million software developers who are at the forefront of AI innovation. Hong Kong and China are ranked in the top 10 largest communities globally for generative AI projects on GitHub, and their developers are making a significant impact on the world of open source software. Join us to explore and celebrate China and Hong Kong's contributions to the open source ecosystem and discover how this community is shaping the future of AI and technology.

香港和中国拥有活跃的开源软件社区,拥有1100万软件开发人员,处于人工智能创新的前沿。香港和中国在GitHub上生成式AI项目的社区规模位列全球前十名,他们的开发人员对开源软件领域产生了重大影响。加入我们,一起探索和庆祝中国和香港对开源生态系统的贡献,并了解这个社区如何塑造AI和技术的未来。
Speakers
Stormy Peters

VP, Communities, GitHub
Stormy Peters is VP of Communities at GitHub. She leads the teams responsible for enabling the online creators and open source communities on GitHub, including GitHub’s community product efforts, developer relations, education, and other strategic programs. Throughout her career...
Level 2 | Grand Ballroom 1-2

10:15am HKT

Keynote: Closing Remarks | 主论坛演讲: 闭幕词
Wednesday August 21, 2024 10:15am - 10:30am HKT
Level 2 | Grand Ballroom 1-2

10:30am HKT

Coffee Break ☕ | 茶歇
Wednesday August 21, 2024 10:30am - 11:00am HKT
Level 2 | Grand Ballroom 3-4

10:30am HKT

Solutions Showcase | 解决方案展示
Wednesday August 21, 2024 10:30am - 8:00pm HKT
Visit our sponsors in the Solutions Showcase to try the latest demos, watch live presentations, talk to experts, check out job opportunities, and score some swag.

请访问我们在解决方案展示区的赞助商,尝试最新的演示,观看现场演示,与专家交谈,了解工作机会,并获得一些赠品。

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or to access sponsored content. You are never required to visit third-party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

为了促进活动中的网络和业务关系,您可以选择访问第三方的展位或者获取赞助内容。我们不会强制要求您参观第三方展位或获取赞助内容。当您访问展位或参与赞助活动时,第三方将收到您的一些注册数据。这些数据包括您的名字、姓氏、职位、公司、地址、电子邮件、标准人口统计问题(例如工作职能、行业)以及您与赞助内容或资源互动的详细信息。如果您选择与展位互动或获取赞助内容,您明确同意第三方接收和使用此类数据,这将受到他们自己的隐私政策的约束
Level 2 | Grand Ballroom 3-4

10:40am HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Wednesday August 21, 2024 10:40am - 11:00am HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Level 2 | Grand Ballroom 3-4 | Project Pavilion

11:00am HKT

CNCF Project Lightning Talks Welcome & Opening - Jorge Castro, CNCF
Wednesday August 21, 2024 11:00am - 11:05am HKT
Join us for a rapid-fire journey through the CNCF ecosystem, where experts, including project maintainers and community members, share insights, innovations, and real-world applications of Cloud Native Computing Foundation projects. Each project has just five minutes to present, promising to enlighten and inspire with cutting-edge tools and practices that shape the future of cloud-native development. Whether you're a seasoned pro or just getting started, there's something for everyone in the world of Cloud Native Computing!

FAQ:
  • Do I need an all-access pass to attend the project lightning talks? No, you only need a KubeCon + CloudNativeCon Only pass for access.
  • When will the schedule of Project Lightning Talks be available? Friday, 21 June.
Level 2 | Grand Ballroom 1-2

11:00am HKT

Addressing Challenges of Cross-Architecture Dynamic Migration Over Heterogeneous Acceleration System | 解决异构加速系统上跨架构动态迁移的挑战 - Yanjun Chen, China Mobile
Wednesday August 21, 2024 11:00am - 11:35am HKT
With the surge in application computing demand, the industry has begun running AI applications on diverse acceleration hardware (GPU, FPGA, NPU...) to gain more processing capability. One key problem in using diverse accelerators is tool chain and vendor lock-in across the application dev-to-run process. Cross-system (multi-arch chips + multi-vendor tool chain) application development and migration is hard to achieve. In this presentation, China Mobile will introduce practices that address these challenges, allowing AI applications to migrate smoothly among different accelerators. They include a unified abstraction for diverse accelerators, a middle compiler that uses existing compilers (CUDA, ROCm, oneAPI...) to achieve cross-architecture compilation in the same execution, and a runtime supporting dynamic and replaceable linking. We want to enable applications to migrate freely between diverse accelerators without changing development habits, and we will show the architecture design, open source plans, and a demo.

随着应用计算需求的激增,行业开始在各种加速硬件(GPU、FPGA、NPU等)上运行AI应用程序,以获得更多的处理能力。在使用各种加速器时,一个关键问题是在应用程序开发到运行过程中的工具链和供应商锁定。跨系统(多架构芯片+多供应商工具链)应用程序开发和迁移很难实现。在这个演示中,中国移动将介绍解决上述挑战的实践,使AI应用程序能够在不同的加速器之间平稳迁移。这包括对各种加速器的统一抽象,使用现有编译器(CUDA、ROCm、oneAPI等)的中间编译器实现跨架构编译在同一执行中,以及支持动态和可替换链接的运行时。我们希望能够使应用程序在不改变开发习惯的情况下自由迁移至各种加速器,并展示架构设计、开源计划和演示。
Speakers
Yanjun Chen

Open Source Expert, China Mobile
Yanjun Chen is an open source expert and CNCF delegate at China Mobile. She has been active in many open source projects and is now a TSC member of LF Edge Akraino.
Level 1 | Hung Hom Room 3

11:00am HKT

SIG-Multicluster Intro and Deep Dive | SIG-Multicluster介绍和深入探讨 - Jeremy Olmsted-Thompson, Google; Hongcai Ren, Huawei; Jian Qiu, Red Hat
Wednesday August 21, 2024 11:00am - 11:35am HKT
SIG-Multicluster is focused on solving common challenges related to the management of many Kubernetes clusters, and applications deployed across many clusters, or even across cloud providers. In this session, we'll give attendees an overview of the current status of the multi-cluster problem space in Kubernetes and of the SIG. We’ll discuss current thinking around best practices for multi-cluster deployments and what it means to be part of a ClusterSet. Then we’ll highlight current SIG projects, focused use cases, and ideas for what’s next. Most importantly, we’ll provide information on how you can get involved either as a contributor or as a user who wants to provide feedback about the SIG's current efforts and future direction. Bring your questions, problems, and ideas - help us expand the multi-cluster Kubernetes landscape!

SIG-Multicluster专注于解决与管理许多Kubernetes集群和部署在许多集群甚至跨云提供商的应用程序相关的常见挑战。在本场演讲中,我们将向与会者概述Kubernetes中多集群问题空间的当前状态和SIG。我们将讨论关于多集群部署最佳实践的当前思考以及成为ClusterSet的一部分意味着什么。然后,我们将重点介绍当前的SIG项目、关注的用例和下一步的想法。最重要的是,我们将提供有关如何参与其中的信息,无论是作为贡献者还是作为希望就SIG当前工作和未来方向提供反馈的用户。带上你的问题、问题和想法 - 帮助我们扩展多集群Kubernetes领域!
Speakers
Jeremy Olmsted-Thompson

Principal Engineer, Google
Jeremy is a software engineer who works on Google Kubernetes Engine. His main focus is on simplifying the Kubernetes experience, and making it as easy as possible to deploy applications both within a cluster with things like GKE Autopilot, and across clusters with multi-cluster solutions...
Hongcai Ren

Senior Software Engineer, Huawei
Hongcai Ren (@RainbowMango) is a CNCF Ambassador who has been working on Kubernetes and other CNCF projects since 2019 and is a maintainer of the Kubernetes and Karmada projects.
Jian Qiu

Senior Principal Software Engineer, Red Hat
Jian Qiu is a developer at Red Hat, focusing mainly on multi-cluster management.
Level 1 | Hung Hom Room 6

11:00am HKT

Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA | 通过功能化调度和RDMA加速无服务器AI大模型推理 - Yiming Li, Tianjin University & Chenglong Wang, Jinan Inspur Data Technology Co., Ltd.
Wednesday August 21, 2024 11:00am - 11:35am HKT
The deployment of large AI models on standard serverless inference platforms like KServe is gaining popularity due to its ability to improve resource utilization and reduce costs. However, existing large model inference faces significant scheduling and communication bottlenecks, making it challenging to meet low-latency and high-throughput demands. The centralized control plane of Kubernetes leads to low scheduling efficiency and cannot respond within seconds to large-scale burst requests. Additionally, large model inference needs to transfer a GB-level KV cache for each request, resulting in high communication overhead. To address this, we have developed a highly elastic functionalized scheduling framework that guarantees scheduling within seconds for thousands of serverless AI large model inference task instances. We also leverage RDMA technology to achieve high-speed KV cache migration, avoiding the high overhead caused by traditional network protocol stacks.

AI大模型在像KServe这样的标准无服务器推理平台上的部署越来越受欢迎,因为它能够提高资源利用率并降低成本。然而,现有的大模型推理面临着重要的调度和通信瓶颈,使得满足低延迟和高吞吐量需求变得具有挑战性。Kubernetes的集中式控制平面导致低调度效率,无法实现对大规模突发请求的秒级响应。此外,大模型推理需要为每个请求传输GB级别的KV缓存,导致高通信开销。因此,我们开发了一个高度弹性的功能化调度框架,以确保对数千个无服务器AI大模型推理任务实例进行秒级调度。此外,我们利用RDMA技术实现高速KV缓存迁移,避免传统网络协议栈引起的高开销。
Speakers
Cookie

Senior Software Engineer, Jinan Inspur Data Technology Co., Ltd.
I work at Inspur, mainly on container computing development, and am familiar with container networks, especially Calico and Cilium. I'm also a contributor to the OpenYurt community and mainly participate in the development of the Raven project.
Yiming Li

PhD candidate, Tianjin University
Yiming Li received the bachelor’s and master’s degrees from Tianjin University, China, in 2017 and 2019, respectively. He is currently pursuing the Ph.D. degree with the College of Intelligence and Computing, Tianjin University, China. His research interests include cloud com...
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

11:00am HKT

How to Increase the Throughput of Kubernetes Scheduler by Tens of Times | 如何将Kubernetes调度器的吞吐量提高数十倍 - Yuquan Ren & Bing Li, ByteDance
Wednesday August 21, 2024 11:00am - 11:35am HKT
Currently, the various Kubernetes-based task schedulers popular in the community have limited performance, which restricts the cluster scale they can handle. Because cluster scale is limited, it is difficult to improve resource utilization through large-scale colocation, and running more clusters also brings greater operational burdens: 1. Due to bottlenecks in the scheduler and related components, the maximum cluster scale cannot exceed 5k nodes; 2. In clusters with more than 5k nodes, scheduling throughput cannot exceed 100 Pods/s. Godel Scheduler is a distributed high-performance scheduler based on Kubernetes, and it is now open source. In this talk, we will go deep into the performance optimization methods of Godel Scheduler: 1. Optimize scheduling algorithms and refactor data structures; 2. Implement optimistic concurrency under a multi-shard architecture to achieve parallel computation; 3. Abstract "batch" scheduling to fully reuse scheduling computation results.

目前,社区中流行的基于Kubernetes的各种任务调度器在性能方面存在一定限制,这限制了它们能处理的集群规模。由于集群规模的限制,通过大规模的共存难以提高资源利用率,而且更多的集群也会带来更大的运维负担。1. 由于调度器及相关组件的瓶颈,最大集群规模无法超过5k个节点;2. 在超过5k个节点的集群中,调度吞吐量无法超过100个Pod/s。 Godel Scheduler是一个基于Kubernetes的分布式高性能调度器,现已开源。在本次演讲中,我们将深入探讨godel调度器的性能优化方法:1. 优化调度算法并进行数据结构重构;2. 在多分片架构下实现乐观并发以实现并行计算;3. 抽象“批量”调度以充分重用调度计算结果。
Speakers
Yuquan Ren

Cloud Native Architect, ByteDance
Yuquan Ren has 10+ years of working experience in the cloud-native field, contributing extensively to open-source projects such as Kubernetes. Currently, he is a tech leader at ByteDance, primarily focusing on the field of orchestration and scheduling.
Bing Li

Senior Software Engineer, ByteDance
Bing Li has participated in the open source community for nearly 3 years. Currently, he is a senior software engineer at ByteDance, focusing on scheduling system performance optimization and system evolution.
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Operations + Performance

11:00am HKT

Securing the Supply Chain: A Practical Guide to SLSA Compliance from Build to Runtime | 保障供应链安全:从构建到运行的SLSA合规实用指南 - Enguerrand Allamel, Ledger
Wednesday August 21, 2024 11:00am - 11:35am HKT
Navigating the complexities of supply chain security might seem intimidating, especially with evolving frameworks like SLSA (Supply-chain Levels for Software Artifacts). This talk introduces beginners to the foundational practices required to secure software from build to runtime using CNCF tools. We'll explore how GitHub Actions can automate build processes, integrate with Cosign for keyless artifact signing, and use Kyverno for runtime policy enforcement. Additionally, we'll discuss how tools like in-toto and Kubescape help manage and verify artifact integrity, providing a holistic view of SLSA compliance in the Kubernetes ecosystem. To enhance security further, we will also briefly discuss the potential integration of Hardware Security Modules (HSMs) into the supply chain. HSMs can offer an added layer of security for key management operations critical to signing processes, ensuring that cryptographic keys are managed securely and are resilient against attack.

供应链安全的复杂性可能看起来令人望而却步,尤其是随着像SLSA(软件构件供应链级别)这样不断发展的框架。 本次演讲将向初学者介绍使用CNCF工具来确保软件从构建到运行时的基本实践。 我们将探讨GitHub Actions如何自动化构建流程,与Cosign集成进行无密钥构件签名,以及使用Kyverno进行运行时策略执行。此外,我们还将讨论像in-toto和Kubescape这样的工具如何帮助管理和验证构件完整性,为Kubernetes生态系统中的SLSA合规性提供全面视角。 为了进一步增强安全性,我们还将简要讨论将硬件安全模块(HSMs)集成到供应链中的潜在可能性。HSMs可以为关键管理操作提供额外的安全层,这对签名过程至关重要,确保加密密钥得到安全管理,并且具有抵御攻击的弹性。
Speakers
Enguerrand Allamel

Senior Cloud Security Engineer, Ledger
Enguerrand is a Senior Cloud Security Engineer with experience in Site Reliability Engineering at Ledger since 2022. His work focuses on the security of scalable and reliable cloud systems, leveraging his knowledge of hybrid computing technologies and container orchestration with...
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Security

11:00am HKT

A New Choice for Istio Data Plane: Architectural Innovation for a Brand-New Performance Experience | Istio数据平面的新选择:全新性能体验的架构创新 - Zhonghu Xu, Huawei
Wednesday August 21, 2024 11:00am - 11:35am HKT
With the deployment of service mesh technologies like Istio, reducing the latency overhead caused by the data plane proxy architecture has become a critical concern for mesh providers. In this session, Zhong Hu and Song Yang will propose a fresh solution for the service mesh data plane from an operating system perspective. By leveraging eBPF and kernel enhancements, they enable native traffic governance capabilities in the OS. Unlike other solutions, this approach significantly simplifies the forwarding path of the mesh data plane, resulting in a 60%+ reduction in data plane forwarding latency. In addition, it features low resource overhead and secure isolation. The project redefines the mesh data plane, with Istiod as the control plane, and Huawei is currently conducting internal verification. Furthermore, they will discuss the future evolution of service mesh and explore the potential of sidecarless architecture in diverse deployment scenarios.

随着像Istio这样的服务网格技术的部署,减少由数据平面代理架构引起的延迟开销已成为网格提供商的一个关键关注点。在本场演讲中,钟虎和宋洋将从操作系统的角度提出一种全新的服务网格数据平面解决方案。通过利用eBPF +内核增强功能,他们在操作系统中实现了原生流量治理能力。与其他解决方案不同,这种方法显著简化了网格数据平面的转发路径,导致数据平面转发延迟降低了60%以上。此外,它具有低资源开销和安全隔离的特点。该项目重新定义了网格数据平面,以Istiod作为控制平面,华为目前正在进行内部验证。 此外,他们将讨论服务网格的未来演变,并探索在不同部署场景中无边车架构的潜力。
Speakers
Zhonghu Xu

Principal Engineer, Huawei
Zhonghu is an open-source enthusiast who has focused on OSS since 2017. In 2023, Zhonghu was awarded a Google Open Source Peer Bonus. He has worked on Istio for more than 6 years and is a core Istio maintainer and one of its top 3 contributors. He has been continuously serving as Istio...
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Networking + Edge Computing

11:07am HKT

Project Lightning Talk: KCL: Simplifying Kubernetes Manifests Management | KCL:简化 Kubernetes 清单管理
Wednesday August 21, 2024 11:07am - 11:12am HKT
As software scale continues to grow, the complexity of Kubernetes manifest and resource management is also increasing. KCL aims to reduce the complexity of configuration management and reduce problems such as configuration drift. This Lightning Talk intends to demonstrate how to use the KCL language to manage Kubernetes manifests and resources more simply. This includes using KCL to abstract and simplify complex Kubernetes manifests to reduce configuration scale; verifying and checking Kubernetes manifests to enhance stability and security; and mutating Kubernetes resources to support dynamic configuration management. During this talk, the audience will gain experience in managing existing Kubernetes manifests and resources using KCL without rewriting their infrastructure manifests with KCL.


随着软件规模的不断增长,Kubernetes 清单和资源管理的复杂性也在增加。KCL 旨在降低配置管理的复杂性,减少配置漂移等问题。本次闪电演讲旨在展示如何使用 KCL 语言更简单地管理 Kubernetes 清单和资源。这包括使用 KCL 抽象和简化复杂的 Kubernetes 清单以减少配置规模;验证和检查 Kubernetes 清单以增强稳定性和安全性;以及变更 Kubernetes 资源以支持动态配置管理。在本次演讲中,观众将学习如何使用 KCL 管理现有的 Kubernetes 清单和资源,而无需用 KCL 重写他们的基础设施清单。
Level 2 | Grand Ballroom 1-2

11:14am HKT

Project Lightning Talk: KubeEdge user cases show in multiple industries and scenarios | KubeEdge 在多个行业和场景中的用户案例展示
Wednesday August 21, 2024 11:14am - 11:19am HKT
Since KubeEdge officially entered CNCF in March 2019, it has been widely used in intelligent transportation, smart city, smart park, smart energy, smart factory, smart bank, smart site, CDN and other industries to provide users with integrated edge-cloud collaborative solutions.
This talk will share 10+ KubeEdge user cases from various industries to help users understand the practical experience of cloud-native edge computing and edge AI.


自 2019 年 3 月 KubeEdge 正式进入 CNCF 以来,已广泛应用于智能交通、智慧城市、智慧园区、智能能源、智能工厂、智能银行、智能工地、CDN 等行业,为用户提供集成的边缘云协同解决方案。本主题将分享 10 多个不同行业中的 KubeEdge 用户案例,帮助用户了解云原生边缘计算和边缘 AI 的实践经验。
Level 2 | Grand Ballroom 1-2

11:21am HKT

Project Lightning Talk: A Deep Dive into Cilium Gateway API: The Future of Ingress Traffic Routing | 深入探讨 Cilium Gateway API:Ingress 流量路由的未来
Wednesday August 21, 2024 11:21am - 11:26am HKT
In the cloud-native era, the traffic routing and secure access of microservices architecture have gone beyond the traditional Kubernetes Ingress API. Cloud-native solutions provide more flexible, scalable, and secure ways to manage traffic both inside and outside the cluster.
For example, Service Mesh technologies like Istio and Linkerd provide rich traffic management features, including dynamic routing, circuit breaking, retries, timeouts, and more. They also have built-in secure service-to-service authentication and encrypted communication, significantly improving the overall system security.
Additionally, modern API gateways like Cilium can seamlessly integrate with Kubernetes, providing more fine-grained routing rules, load balancing, monitoring, and other functionalities. They can serve as the unified entry point for the cluster, simplifying the management of external access.


在云原生时代,微服务架构的流量路由和安全访问已经超越了传统的 Kubernetes Ingress API。云原生解决方案提供了更灵活、可扩展和安全的方式来管理集群内外的流量。

例如,像 Istio 和 Linkerd 这样的服务网格技术提供了丰富的流量管理功能,包括动态路由、熔断、重试、超时等。它们还内置了安全的服务间身份验证和加密通信,大大提高了系统的整体安全性。

此外,像 Cilium 这样的现代 API 网关可以无缝集成 Kubernetes,提供更细粒度的路由规则、负载均衡、监控等功能。它们可以作为集群的统一入口点,简化外部访问的管理。
Level 2 | Grand Ballroom 1-2

11:28am HKT

Project Lightning Talk: Explore Secure Artifacts Storage and Management with Harbor | 探索使用 Harbor 进行安全的制品存储和管理
Wednesday August 21, 2024 11:28am - 11:33am HKT
Harbor is well known as a trusted open source registry that offers a rich set of functionality and aligns with secure supply chain practices for artifact storage and management.
In this session, we will talk about the most exciting security-related features in our recent releases, especially SBOM generation and management, which adopt OCI spec 1.1.
Future expectations for secure artifacts in Harbor will also be discussed, such as support for scanning OCI-compatible Helm charts, landing encrypted images, and enhancing the security hub.
Please join us and explore more possibilities for securing your artifacts in a cloud-native registry.


Harbor作为一款备受信任的开源注册表,提供丰富的功能,并符合制品存储和管理的安全供应链。在本次会议中,我们将讨论我们最新发布的与安全相关的最激动人心的功能,特别是采用OCI规范1.1的SBOM生成和管理。此外,还将讨论Harbor在安全制品未来的期望,如支持扫描OCI兼容的Helm图表、引入加密镜像、增强安全中心等。请加入我们,探索云原生注册表中安全制品的更多可能性。
Level 2 | Grand Ballroom 1-2

11:35am HKT

Project Lightning Talk: nerdctl: Docker-compatible CLI for containerd | nerdctl:基于 containerd 的兼容 Docker CLI
Wednesday August 21, 2024 11:35am - 11:40am HKT
During this session, participants will learn about nerdctl’s compatibility compared to Docker and Podman, along with features that Docker has not yet implemented. These include:
* Lazy-pulling with Stargz/Nydus/OverlayBD
* Peer-to-peer image distribution with IPFS
* Image encryption with OCIcrypt
* Image signing with Cosign
* Slirp-less rootless containers with bypass4netns
* Interactive Dockerfile debugging with buildg

Furthermore, the session will delve into nerdctl’s features, related projects (such as Lima, AWS Finch, Colima, Rancher Desktop, Kind...), and the envisioned roadmap for its future development. Lastly, we aim to dig deeper into community engagement and how to contribute to the project.


在本次会议中,参与者将了解 nerdctl 与 Docker 和 Podman 的兼容性,以及 Docker 尚未实现的功能。这些功能包括:

* 使用 Stargz/Nydus/OverlayBD 进行延迟拉取
* 使用 IPFS 进行点对点镜像分发
* 使用 OCIcrypt 进行镜像加密
* 使用 Cosign 进行镜像签名
* 使用 bypass4netns 实现无 Slirp 的无根容器
* 使用 buildg 进行交互式 Dockerfile 调试

此外,本次会议还将深入探讨 nerdctl 的功能、相关项目(如 Lima、AWS Finch、Colima、Rancher Desktop、Kind 等),以及其未来开发的愿景路线图。最后,我们还将深入讨论社区参与,为项目的贡献做出贡献。
Level 2 | Grand Ballroom 1-2

11:42am HKT

Project Lightning Talk: What's new in Kuasar 1.0? | Kuasar 1.0 有什么新特性?
Wednesday August 21, 2024 11:42am - 11:47am HKT
Open-sourced in April 2023 and joining the CNCF in December, Kuasar is already a year and a half old. As the Sandbox API stabilizes in the upcoming containerd 2.0 release in 2024, Kuasar has been the first to complete adaptation and updates, with native support for containerd 2.0. In the forthcoming 1.0 release, Kuasar will undergo significant updates, including the latest Sandbox API, new adaptations to microVM, WebAssembly, appKernel, and runc containers. We're excited to share the progress of the Kuasar project. Come join us and ask your questions to the on-site Kuasar maintainers.


Kuasar 于 2023 年 4 月开源,并于同年 12 月加入 CNCF,至今已有一年半的历史。随着 Sandbox API 在即将发布的 containerd 2.0 中稳定下来,Kuasar 成为首个完成适配和更新,并原生支持 containerd 2.0 的项目。在即将发布的 1.0 版本中,Kuasar 将经历重大更新,包括最新的 Sandbox API,以及对微虚拟机(microVM)、WebAssembly、应用内核(appKernel)和 runc 容器的新适配。我们非常期待分享 Kuasar 项目的进展。欢迎加入我们,并向现场的 Kuasar 维护者提问。
Level 2 | Grand Ballroom 1-2

11:49am HKT

Project Lightning Talk: WasmEdge 0.14.0 release highlight | 项目闪电讲:WasmEdge 0.14.0 发布亮点
Wednesday August 21, 2024 11:49am - 11:54am HKT
The WasmEdge project has released 0.14.0. In this version, we introduced many key features from the Wasm proposals, including WasmGC, Typed Function Reference, Exception Handling, and more. We also fully integrated llama.cpp as our plugin for running LLMs. In this talk, I will give a quick update on the 0.14.0 highlights and the future roadmap of WasmEdge.

WasmEdge 项目发布了 0.14.0 版本。在这个版本中,我们引入了许多 Wasm 提案的关键特性,包括 WasmGC、Typed Function Reference、异常处理等等。我们还完全集成了 llama.cpp 作为我们执行 LLM 的插件。在这次讲话中,我将快速更新一下 0.14.0 版本的亮点以及 WasmEdge 的未来路线图。
Level 2 | Grand Ballroom 1-2

11:50am HKT

Ethics in the Cloud: Safeguarding Responsible AI Development in Asia | 云计算中的伦理:在亚洲保障负责任的人工智能发展 - Quiana Berry, Red Hat
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Ethics serve as the compass guiding responsible innovation and societal progress. This presentation blends ethics, cloud computing, and AI advancement, spotlighting the imperative of upholding responsible AI practices, particularly within the Asian market. From safeguarding data privacy and fortifying cybersecurity to navigating regulatory compliance and governance, this comprehensive discourse delves into multifaceted dimensions essential for ethical AI development. As Asia, including China, propels the frontier of AI innovation, the imperative of embedding ethics and responsible practices becomes increasingly pronounced. This session is tailored to provide actionable strategies and regulatory insights for Asian leaders. Together, we'll empower attendees to become champions of responsible AI practices, fostering a culture of integrity and innovation in the vibrant and diverse tech landscape of Asia.

伦理道德是引导负责任创新和社会进步的指南。本次演讲融合了伦理、云计算和人工智能的进步,重点关注在亚洲市场内坚持负责任人工智能实践的必要性。从保护数据隐私和加强网络安全到遵守监管合规和治理,这场全面的讨论深入探讨了对伦理人工智能发展至关重要的多方面维度。 随着亚洲,包括中国,推动人工智能创新的前沿,嵌入伦理和负责任实践的必要性变得日益突出。本场演讲旨在为亚洲领导者提供可行的策略和监管洞察。 让我们共同助力与会者成为负责任人工智能实践的倡导者,在亚洲充满活力和多样化的科技领域中培育诚信和创新的文化。
Speakers
Quiana Berry

Product Lead, Red Hat
Quiana is a dynamic cloud Product Lead at Red Hat/IBM, dedicated to crafting innovative developer tools and reshaping the future of technology. With an academic foundation encompassing Anthropology, Biology, and Chemistry and a specialty in the fusion of (DEI) and Ethical AI, Quiana...
Level 1 | Hung Hom Room 3

11:50am HKT

Safeguarding Cloud Native Supply Chain | 保护云原生供应链-公证项目介绍,新功能和即将推出的内容 - Yi Zha, Microsoft & Mostafa Radwan, CloudRoads
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Ensuring a secure supply chain for container images is vital in the cloud-native ecosystem. But how can you be certain that container images originate from trusted sources? And how can you verify that they haven’t been altered since their creation? Join this session to delve into how the Notary Project bolsters cloud native supply chains by leveraging authentic container images and other OCI artifacts. Our maintainers will present an overview of the project for newcomers. Discover the latest features that enhance software supply chains, gain insights into the roadmap, and explore upcoming developments, including attestations. Observe a user demonstrating how the Notary Project guarantees the integrity and authenticity of container images and arbitrary files. Our maintainers will be available to answer any questions you may have. Whether you’re new or experienced in container security, or someone interested in contributing to the project, this session is not to be missed!

确保容器镜像的安全供应链对于云原生生态系统至关重要。但是,您如何确保容器镜像来自可信的来源?您如何验证它们自创建以来没有被篡改?加入本场演讲,深入了解Notary项目如何通过利用真实的容器镜像和其他OCI工件来增强云原生供应链。我们的维护人员将为新手提供项目概述。发现增强软件供应链的最新功能,了解路线图,探索即将推出的开发,包括证明。观察用户演示Notary项目如何保证容器镜像和任意文件的完整性和真实性。我们的维护人员将随时回答您可能有的任何问题。无论您是容器安全的新手还是经验丰富的人,或者是有兴趣为项目做出贡献的人,都不要错过本场演讲!
Speakers
Mostafa Radwan

Principal Consultant, CloudRoads
Mostafa is a technologist and consultant specialized in cloud native computing. He started his career as a software engineer before getting in the trenches of application and production support. He enjoys helping enterprise companies successfully adopt DevOps and cloud native technologies...
Yi Zha

Senior Product Manager, Microsoft
Yi is a senior product manager on the Azure Container Upstream team at Microsoft and is responsible for container supply chain security for Azure services and customers. He is also a maintainer of the CNCF project Notary and a contributor to CNCF ORAS and the OSS project Ratify.
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 6

11:50am HKT

AI Inference Performance Acceleration: Methods, Tools, and Deployment Workflows | AI推理性能加速:方法、工具和部署工作流程 - Yifei Zhang & 磊 钱, Bytedance
Wednesday August 21, 2024 11:50am - 12:25pm HKT
As AI rapidly evolves and embraces cloud-native technologies, inference performance has become crucial for application value. GPU selection, serving framework configuration, and model/data loading significantly impact inference efficiency. We'll focus on cloud-native solutions to storage performance issues and tools for evaluating inference performance across configurations, offering optimal deployment setups integrated into cloud-native workflows. We'll discuss inference performance's impact on user experience and how optimization can reduce costs and improve efficiency. Using technologies like Fluid and model optimization, we'll share strategies to enhance inference performance. Based on performance and cost analysis of various GPUs, we'll guide AI engineers in hardware selection. Additionally, we'll introduce a performance testing tool to evaluate and recommend the best model, hardware, and acceleration scheme combinations, aligning with deployment workflows based on test results.

随着人工智能的快速发展和对云原生技术的采用,推理性能对应用价值变得至关重要。 GPU选择、服务框架配置以及模型/数据加载对推理效率有着重大影响。我们将专注于云原生解决方案,解决存储性能问题,并提供评估不同配置下推理性能的工具,为云原生工作流程提供最佳部署设置。 我们将讨论推理性能对用户体验的影响,以及优化如何降低成本并提高效率。利用Fluid和模型优化等技术,我们将分享增强推理性能的策略。基于各种GPU的性能和成本分析,我们将指导人工智能工程师进行硬件选择。此外,我们将介绍一种性能测试工具,评估并推荐最佳模型、硬件和加速方案组合,根据测试结果与部署工作流程相匹配。
Speakers
Yifei Zhang

Software Engineer, Bytedance
Yifei Zhang, a Software Engineer at Volcengine, focuses on technical research and product development in Kubernetes and AI and has extensive experience in public cloud. Yifei now works full-time on VKE (Volcengine Kubernetes Engine), which is the managed Kubernetes product in Volcengine... Read More →
钱磊

Software Engineer, Bytedance
A Kubernetes developer at ByteDance, focused on building a stable Kubernetes engine on public cloud.
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

11:50am HKT

Extend Kubernetes to Edge Using Event-Based Transport | 使用基于事件的传输将Kubernetes扩展到边缘 - Longlong Cao & Meng Yan, Red Hat
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Struggling with extensive edge cluster management? Kubernetes adoption brings new challenges, especially in sectors like telecom, retail, and manufacturing. The surge in clusters highlights Kubernetes' limitations, worsened by unreliable networks between data centers and edge clusters. Without scalable control, organizations resort to sending engineers to maintain thousands or even millions of edge clusters, slowing progress. But, we have a solution: connecting Kubernetes and edge clusters via event-based transport, utilizing standard open-source protocols like Kafka, MQTT, and NATS. This enhances Kubernetes-style events, making them resilient to network delays or disconnects. With these capabilities, we can effortlessly construct a central control plane scalable to millions of edge clusters. Join us for an intuitive control plane, handling a million edge clusters across regions. Learn an approach that can be adapted to your edge management infrastructure today.

您正在为庞大的边缘集群管理而苦恼吗?Kubernetes的采用带来了新的挑战,尤其是在电信、零售和制造等行业。集群数量的激增凸显了Kubernetes的局限性,加剧了数据中心和边缘集群之间不稳定网络的问题。在缺乏可扩展控制的情况下,组织不得不派遣工程师去维护成千上万甚至数百万个边缘集群,从而拖慢了进展。但是,我们有解决方案:通过基于事件的传输将Kubernetes和边缘集群连接起来,利用标准的开源协议如Kafka、MQTT和NATS。这样可以增强Kubernetes风格的事件,使其能够抵御网络延迟或断开连接。有了这些功能,我们可以轻松构建一个可扩展到数百万个边缘集群的中央控制平台。加入我们,体验一个直观的控制平台,可以跨区域管理数百万个边缘集群。学习一种可以立即应用于您的边缘管理基础设施的方法。
Speakers
Longlong Cao

Senior Software Engineer, Red Hat
Longlong Cao currently works as a cloud engineer at Red Hat. He is also a maintainer of the Istio project and a member of the Kubernetes SIGs. He is passionate about open source projects and has extensive experience in Docker, Kubernetes and Service Mesh. He writes blogs/articles and... Read More →
Meng Yan

Software Engineer, Red Hat
Meng Yan currently works as a software engineer at Red Hat, where he focuses on managing large-scale clusters. He mainly contributes to open source projects such as multicluster-global-hub and multicluster-controlplane, and also participates in improving CloudEvents.
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

11:50am HKT

Implementing Fine-Grained and Pluggable Container Resource Management Leveraging NRI | 基于 NRI 实现精细化且可插拔的容器资源管理 - Qiang Ren, Intel & He Cao, ByteDance
Wednesday August 21, 2024 11:50am - 12:25pm HKT
To overcome Kubernetes' limitations in resource management, ByteDance developed Katalyst, a resource management system. Katalyst employs a range of methodologies, including colocation, node over-commitment, specification recommendation, and tidal colocation, aimed at optimizing cluster resource utilization.

Initially, Katalyst introduced a QoS Resource Manager (QRM) framework within kubelet, facilitating versatile container resource allocation through a plugin architecture. Presently, the Node Resource Interface (NRI) presents a refined alternative.

This session elucidates how Katalyst leverages NRI for fine-grained and adaptable container resource management, ensuring efficiency without intrusive modifications of upstream components. This novel architecture allows Katalyst to seamlessly integrate with native Kubernetes, offering a user-friendly and easily maintainable solution.

为了克服 Kubernetes 在资源管理方面的局限性,字节跳动构建了一个资源管理系统 Katalyst,通过在离线业务常态混部、资源超分、规格推荐、潮汐混部等方式,提升集群的资源利用率。最初,Katalyst 在 kubelet 中引入了一个 QoS Resource Manager(QRM)框架,通过插件化的方式来扩展容器的资源分配策略;当前,Node Resource Interface(NRI)提供了一个原生的替代方案。

本次演讲将介绍 Katalyst 如何通过 NRI 实现精细化且可插拔的容器资源管理,在不对上游组件进行侵入性修改的情况下,提升资源利用率并保证业务的 SLO 不受影响。这种全新的架构使 Katalyst 能够与原生 Kubernetes 无缝集成,提供了一种易于使用和维护的解决方案。
Speakers
Qiang Ren

Software Engineer, Intel
Ren Qiang works as a Cloud Orchestration Software Engineer in SATG, Intel. He mainly focuses on Cloud Native technologies in the runtime. At the same time, he actively participates in open-source projects and is committed to promoting the development of runtime and resource isola... Read More →
He Cao

Senior Software Engineer, ByteDance
He Cao is a senior software engineer on the Cloud Native team at ByteDance, a maintainer of Katalyst and KubeZoo, and a member of Istio. He has 5+ years of experience in the cloud native area. Since joining ByteDance, he has designed and implemented several critical systems for VKE... Read More →
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 2

11:50am HKT

Community Charter and Cookbook: The Recipe of Building Communities in the Open | 社区章程与手册:在开放中建立社区的秘诀 - Prithvi Raj, Harness
Wednesday August 21, 2024 11:50am - 12:25pm HKT
An open source project with immense value and massive potential fails to build an exciting community. A community that is interactive and growing at the start becomes stagnant after a point. A project has good GitHub traction but lacks community traction. These are some of the many problems that arise while building an open source project community, and they remain prevalent. Drawing on Prithvi's experience, this talk summarises the ingredients essential to nurturing a project community and sustaining its growth over the years to come. He will highlight steps and best practices around the right metrics, content curation, social portrayal, and building the right culture among stakeholders and the broader audience. He will also share tips and tricks on creating the right special interest groups, ensuring constant contributions, and incentivising the community.

一个拥有巨大价值和巨大潜力的开源项目,却未能建立一个令人兴奋的社区。一个在开始时互动并扩展的社区在某个时刻变得停滞不前。一个项目在GitHub上有良好的关注度,但却缺乏良好的社区关注度。 这些是在建立开源项目社区时出现的许多问题,而且仍然非常普遍。这次演讲总结了在培育项目社区并确保其未来增长方面至关重要的正确要素,通过Prithvi的经验。 他将重点介绍在正确的指标、内容策划、社交表现和在利益相关者和更广泛的观众中建立正确文化方面应确保的步骤和最佳实践。 他还将分享有关创建正确的特殊兴趣小组、确保持续贡献和激励社区的技巧和窍门。
Speakers
Prithvi Raj

Technical Community Manager, Harness
Prithvi Raj is a Technical Community Manager at Harness and a CNCF Ambassador. He is currently leading the community for the LitmusChaos CNCF incubating project. He has 4 years of experience in the industry and has helped scale the broader Chaos Engineering community. He has worked... Read More →
Wednesday August 21, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 5

11:56am HKT

Project Lightning Talk: Xline: Achieving Fast Consensus and High Performance in Wide-Area Networks | 项目闪电讲:Xline:在广域网络中实现快速共识和高性能
Wednesday August 21, 2024 11:56am - 12:01pm HKT
With the development of technology and the improvement of infrastructure, cloud computing is gradually entering the era of multi-cloud. In multi-cloud scenarios, the high latency of networks between clouds poses new infrastructure challenges. As one of the infrastructures in multi-cloud scenarios, Xline has proposed its solution to this challenge.

In this session, we'll introduce Xline, a distributed KV store designed to provide metadata management in WAN environments.

This presentation includes three parts:
1. What’s Xline and why do we need it
2. How does Xline achieve fast consensus within 1 RTT, while etcd needs 2
3. The benchmark report of Xline in LAN & WAN
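
As a rough illustration of the second point, the back-of-the-envelope calculation below shows why the number of round trips matters far more in a WAN than in a LAN. It is a sketch that counts network time only; the RTT figures and the function names are illustrative assumptions, not measurements or code from the talk.

```python
def commit_latency_ms(rtt_ms: float, round_trips: int) -> float:
    """Network-only commit latency: number of round trips times the RTT."""
    return rtt_ms * round_trips


def compare(rtt_ms: float):
    """Return (latency with 1 RTT, latency with 2 RTTs, time saved) in ms."""
    one = commit_latency_ms(rtt_ms, 1)
    two = commit_latency_ms(rtt_ms, 2)
    return one, two, two - one


# Hypothetical round-trip times, chosen only for illustration.
ASSUMED_RTT_MS = {"LAN": 0.5, "WAN": 100.0}

for name, rtt in ASSUMED_RTT_MS.items():
    one, two, saved = compare(rtt)
    print(f"{name}: 1 RTT = {one} ms, 2 RTTs = {two} ms, saved = {saved} ms")
```

Under these assumed numbers, saving one round trip is negligible inside a data center but removes a full cross-cloud round trip from every commit in a WAN.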

随着技术的发展和基础设施的改善,云计算正逐渐进入多云时代。在多云场景中,云之间高延迟的网络成为新的基础设施挑战。作为多云场景中的基础设施之一,Xline 提出了其应对这一挑战的解决方案。

在本次会议中,我们将介绍 Xline,这是一个设计用于在广域网络环境中提供元数据管理的分布式 KV 存储系统。

本次演示包括三个部分:
1. Xline 是什么以及为什么我们需要它
2. Xline 如何在一个 RTT 内实现快速共识,而 etcd 需要 2 个 RTT
3. Xline 在局域网和广域网环境中的基准测试报告
Wednesday August 21, 2024 11:56am - 12:01pm HKT
Level 2 | Grand Ballroom 1-2

12:03pm HKT

Project Lightning Talk: Adaptive Tracing Propagation with OpenTelemetry: Navigating Protocol Diversity in the Cloud | 项目闪电讲:使用OpenTelemetry进行自适应跟踪传播:在云中导航协议多样性
Wednesday August 21, 2024 12:03pm - 12:08pm HKT
The cloud's vast landscape is characterized by a diversity of applications that employ different tracing protocols, each tailored to specific telemetry collection tools and requirements. This diversity results in an ecosystem where maintaining consistent and reliable traces across disparate systems is a significant challenge.
To address this, our contribution enhances OpenTelemetry with an adaptive approach to propagate multi-protocol trace signals. Building on this implementation, we have developed an adaptable and extendable trace propagation framework, facilitating a more seamless trace propagation process and ensuring instant compatibility with a diverse range of cloud services.
During this talk, we'll dive deep into the design and implementation of this feature and introduce how this mechanism achieves out-of-the-box functionality within our APM services.
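
To make the idea of multi-protocol propagation concrete, here is a minimal sketch of an extractor that tries several trace header formats in turn. It does not use the OpenTelemetry SDK and is not the implementation presented in the talk; the function names and the extractor list are hypothetical. It handles only the W3C `traceparent` header and the B3 multi-header format, and skips finer validation such as rejecting all-zero IDs.

```python
import re
from typing import Callable, List, Mapping, Optional, Tuple

TraceContext = Tuple[str, str]  # (trace_id, span_id) as lowercase hex

_W3C = re.compile(r"[0-9a-f]{2}-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}")
_HEX16_OR_32 = re.compile(r"[0-9a-f]{16}|[0-9a-f]{32}")
_HEX16 = re.compile(r"[0-9a-f]{16}")


def extract_w3c(headers: Mapping[str, str]) -> Optional[TraceContext]:
    """Parse a W3C Trace Context `traceparent` header."""
    match = _W3C.fullmatch(headers.get("traceparent", ""))
    return (match.group(1), match.group(2)) if match else None


def extract_b3_multi(headers: Mapping[str, str]) -> Optional[TraceContext]:
    """Parse B3 multi-header propagation (X-B3-TraceId / X-B3-SpanId)."""
    trace_id = headers.get("x-b3-traceid", "")
    span_id = headers.get("x-b3-spanid", "")
    if _HEX16_OR_32.fullmatch(trace_id) and _HEX16.fullmatch(span_id):
        # Left-pad 64-bit trace IDs so every protocol yields a 128-bit ID.
        return (trace_id.rjust(32, "0"), span_id)
    return None


# Hypothetical registry: supporting another protocol means appending here.
EXTRACTORS: List[Callable[[Mapping[str, str]], Optional[TraceContext]]] = [
    extract_w3c,
    extract_b3_multi,
]


def extract_any(headers: Mapping[str, str]) -> Optional[TraceContext]:
    """Return the first trace context any registered extractor recognises."""
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    for extractor in EXTRACTORS:
        context = extractor(normalized)
        if context is not None:
            return context
    return None
```

Calling `extract_any` on incoming request headers yields a single normalized trace context regardless of which of the two protocols the upstream service used, or `None` when no known header is present.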

云计算的广阔领域以各种应用为特征,这些应用采用不同的跟踪协议,每种协议都针对特定的遥测收集工具和需求进行了定制。这种多样性导致了一个生态系统,在这个生态系统中,跨不同系统保持一致且可靠的追踪是一个重大挑战。

为了解决这一问题,我们的贡献在OpenTelemetry基础上增强了自适应的跟踪信号传播方法。在这一实现的基础上,我们开发了一个灵活且可扩展的跟踪传播框架,促进了更无缝的追踪传播过程,并确保与各种云服务的即时兼容性。

在这次演讲中,我们将深入探讨这一特性的设计和实现,并介绍这一机制如何在我们的应用性能管理(APM)服务中实现即插即用的功能。
Wednesday August 21, 2024 12:03pm - 12:08pm HKT
Level 2 | Grand Ballroom 1-2

12:10pm HKT

Project Lightning Talk: Karmada: Project introduction and updates | 项目闪电讲:Karmada:项目介绍与更新
Wednesday August 21, 2024 12:10pm - 12:15pm HKT
Karmada, a CNCF incubating project, aims to offer a unified control plane for seamless deployment and management across diverse cloud environments.

In this lightning talk, the following topics will be covered:

- Brief introduction to Karmada
- Core Capabilities
- Key Use Cases
- Community updates

Karmada,一个正在CNCF孵化中的项目,旨在提供一个统一的控制平台,实现在多样化的云环境中无缝部署和管理。

在这次闪电讲中,将涵盖以下主题:

- Karmada简介
- 核心能力
- 关键使用案例
- 社区更新
Wednesday August 21, 2024 12:10pm - 12:15pm HKT
Level 2 | Grand Ballroom 1-2

12:17pm HKT

Project Lightning Talk: Kyverno Kickoff: Getting Your Team Onboard in a Flash | 项目闪电讲:Kyverno启动:快速让您的团队上车
Wednesday August 21, 2024 12:17pm - 12:22pm HKT
Convincing your team to adopt new technologies can be challenging, but with the right approach, you can successfully advocate for Kyverno's benefits. We'll discuss the key advantages of Kyverno, such as simplifying policy management, enhancing security, and streamlining compliance efforts within Kubernetes environments. In addition, we'll cover practical tips for addressing common concerns and showcasing Kyverno's value proposition in just 5 minutes.

Whether you're a developer, operator, or team lead, this talk will equip you with persuasive techniques to effectively communicate the benefits of Kyverno and inspire your team to embrace this powerful tool. Join us to discover how to navigate the path to successful Kyverno adoption and drive positive change within your organization.

说服团队采纳新技术可能具有挑战性,但采用正确的方法,您可以成功地倡导Kyverno的好处。我们将讨论Kyverno的主要优势,如简化策略管理、增强安全性以及在Kubernetes环境中优化合规工作的能力。此外,我们还将提供实用的建议,帮助解决常见顾虑,并在短短5分钟内展示Kyverno的价值主张。

无论您是开发人员、运维人员还是团队负责人,本次讲话将为您提供有说服力的技巧,有效传达Kyverno的好处,并激励团队接受这个强大的工具。加入我们,探索如何顺利推动Kyverno的采用,为您的组织带来积极变革。
Wednesday August 21, 2024 12:17pm - 12:22pm HKT
Level 2 | Grand Ballroom 1-2

12:24pm HKT

Project Lightning Talk: Telemetry API and OpenTelemetry: The Answer to Istio Monitoring? | 项目闪电讲:Telemetry API 和 OpenTelemetry:Istio 监控的答案?
Wednesday August 21, 2024 12:24pm - 12:29pm HKT
The Telemetry API provides fine-grained telemetry configuration (e.g. access logs, metrics, tracing) for sidecars, and OpenTelemetry helps you export telemetry data in a standard protocol. Maybe this is the Istio monitoring solution you're looking for.

Telemetry API 提供了在 sidecar 上进行精细化遥测配置的能力(例如访问日志、指标、追踪),OpenTelemetry 则帮助您以标准协议导出遥测数据,也许这就是您寻找的 Istio 监控解决方案。
Wednesday August 21, 2024 12:24pm - 12:29pm HKT
Level 2 | Grand Ballroom 1-2

12:25pm HKT

Lunch 🍜 | 午餐
Wednesday August 21, 2024 12:25pm - 1:50pm HKT
Level 2 | Grand Ballroom 3-4

12:55pm HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Wednesday August 21, 2024 12:55pm - 1:15pm HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Wednesday August 21, 2024 12:55pm - 1:15pm HKT
Level 2 | Grand Ballroom 3-4 | Project Pavilion

1:50pm HKT

⚡ Lightning Talk: Continuously Profile Your Applications in Kubernetes with Pyroscope | ⚡ 闪电演讲: 使用Pyroscope在Kubernetes中持续对应用程序进行性能分析 - Kerrigan Lin, Amazon Web Services
Wednesday August 21, 2024 1:50pm - 1:55pm HKT
Explore performance optimization in Kubernetes using Pyroscope. This Lightning Talk will cover advanced strategies to uncover and resolve performance bottlenecks, enhancing application efficiency and reliability. Tailored for developers and SRE engineers, the session will highlight case studies and demonstrate practical applications of these technologies in real-world scenarios. Attendees will leave with actionable insights for effective performance tuning of containerized applications in Kubernetes.

探索使用Pyroscope进行Kubernetes性能优化。这场闪电演讲将涵盖发现和解决性能瓶颈的高级策略,提升应用程序的效率和可靠性。针对开发人员和SRE工程师定制,本场演讲将重点介绍案例研究,并演示这些技术在实际场景中的实际应用。与会者将获得有关在Kubernetes中对容器化应用程序进行有效性能调优的可操作见解。
Speakers
Kerrigan Lin

Solutions Architect, Amazon Web Services
Kerrigan Lin brings over 14 years of experience in the information technology industry, with a background in software development and architecture. Currently, he serves as a Solutions Architect at AWS, where he helps clients build cloud-native systems.
Wednesday August 21, 2024 1:50pm - 1:55pm HKT
Level 1 | Hung Hom Room 1
  ⚡ Lightning Talks | ⚡ 闪电演讲, Observability

1:50pm HKT

Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage | 您的GPU在数据中心真的高效工作吗?提高GPU使用率的N种方法 - Xiao Zhang, DaoCloud
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
AI has penetrated various industries, and companies have purchased many expensive AI GPU devices for training and inference. So what is the reality of how these devices are used? Is the usage rate really high? Is the GPU card being monopolized by a large number of applications that do not use it heavily? Do these AI devices work efficiently 24/7? This session will draw on our mass production practices to summarize N ways to improve the utilization rate of AI equipment, such as:
* How to avoid monopoly and improve GPU usage through GPU sharing technology
* How to improve GPU device usage through co-location in scenarios with obvious tidal patterns
* How to better perform GPU mark group matching for training and inference applications to improve GPU usage
This session will combine the practical experience of the two open source projects HAMi and Volcano in production, hoping to give everyone a clearer understanding of how to improve GPU usage.

人工智能已经渗透到各个行业,公司购买了许多昂贵的人工智能GPU设备,并将它们用于训练和推理。那么这些设备的使用情况如何呢?使用率真的很高吗?GPU卡是否被大量不常用的应用程序垄断?这些人工智能设备是否能够24/7高效工作? 本场演讲将结合我们的大规模生产实践,总结提高人工智能设备利用率的N种方法,例如: * 如何通过GPU共享技术避免垄断并提高GPU使用率 * 如何通过与明显潮汐场景共同使用GPU设备来提高GPU使用率 * 如何更好地执行GPU标记组匹配训练和推理应用程序,以提高GPU使用率 本场演讲将结合两个开源项目HAMi和Volcano在生产中的实际经验,希望能让大家更清楚地了解如何提高GPU使用率。
Speakers
xiaozhang

Senior Technical Lead, DaoCloud
- Xiao Zhang is the leader of the Container team (focus on infra, AI, Multi-Cluster, Cluster LCM, OCI)
- Active Kubernetes / Kubernetes-sigs contributor and member
- Karmada maintainer, kubean maintainer, HAMi maintainer
- Cloud-Native Developer
- CNCF Open Source Enthusiast
- GithubID: waw... Read More →
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 3

1:50pm HKT

Power TiKV with in-Memory Engine | 使用内存引擎强化 TiKV - Chenjie Tang, PingCAP
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
As a distributed KV database, TiKV supports both non-transactional and transactional operations. In transactional mode, a large number of UPDATE operations produces many versions of each KV pair. When a scan covers a range containing a large number of KV versions, the read latency is unpredictable. To achieve stable low latency for TiKV, we introduced an "In-Memory Engine" to reduce the read latency in such cases. The "In-Memory Engine" can also improve overall performance when some hot ranges carry a heavy read workload.
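
The effect of many versions on scan cost can be seen in a toy model. The sketch below is not TiKV code and does not show how the In-Memory Engine works; it only assumes a flat list of entries sorted by key and then by commit timestamp (newest first), and counts how many entries a naive linear scan touches to return one visible value per key. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int, str]  # (key, commit_ts, value)


def build_entries(keys: List[str], versions_per_key: int) -> List[Entry]:
    """Create versions 1..N for each key, sorted by key, newest version first."""
    entries: List[Entry] = []
    for key in sorted(keys):
        for ts in range(versions_per_key, 0, -1):
            entries.append((key, ts, f"{key}@{ts}"))
    return entries


def scan(entries: List[Entry], read_ts: int) -> Tuple[Dict[str, str], int]:
    """Return the newest value visible at read_ts per key, plus entries examined."""
    visible: Dict[str, str] = {}
    examined = 0
    for key, commit_ts, value in entries:
        examined += 1
        # The first version at or below the snapshot timestamp is the visible one.
        if key not in visible and commit_ts <= read_ts:
            visible[key] = value
    return visible, examined


if __name__ == "__main__":
    for versions in (1, 10, 1000):
        rows, examined = scan(build_entries(["a", "b", "c"], versions), read_ts=versions)
        print(f"{versions} versions/key: {len(rows)} rows returned, {examined} entries examined")
```

The number of rows returned stays constant while the work grows with the number of versions. A real storage engine can seek past stale versions rather than stepping through them, so this overstates the cost, but it shows why read latency depends on update history rather than on result size.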

作为一个分布式kv数据库,TiKV支持非事务操作和事务操作。在事务模式下,如果有很多UPDATE操作,就会有很多kv版本。扫描包含大量kv版本的范围时,读取延迟是不可预测的。为了实现TiKV的稳定低延迟,我们引入了一个“内存引擎”来减少这种情况下的读取延迟。同时,“内存引擎”还可以在存在一些热门范围和大量读取工作负载时提高整体性能。
Speakers
Chenjie Tang

Engineer, PingCAP
TiKV committer and Rust programmer, focused on large-scale distributed systems.
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 6

1:50pm HKT

Boundaryless Computing: Optimizing LLM Performance, Cost, and Efficiency in Multi-Cloud Architecture | 无边界计算:在多云架构中优化LLM性能、成本和效率 - Jian Zhu, Red Hat & Kai Zhang, Alibaba Cloud Intelligence
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
For large language model (LLM) inference, GPU resources within a single data center or cloud region often cannot meet all user demands. Additionally, for the end-users, deploying across multiple geographic regions is necessary to provide an optimal user experience. However, managing model distribution, synchronization, and consistency across multiple regions presents new challenges. To address this, the OCM and Fluid communities have collaborated to automate the multi-region distribution of inference applications through OCM's multi-cluster application deployment capabilities, combined with Fluid's data orchestration capabilities. This automation facilitates the cross-regional distribution and pre-warming of large models, enhancing the efficiency of model deployment and upgrades.

对于大型语言模型(LLM)推理,单个数据中心或云区域内的GPU资源通常无法满足所有用户需求。此外,对于最终用户来说,跨多个地理区域部署是为了提供最佳用户体验。然而,在多个地区管理模型分发、同步和一致性会带来新的挑战。为了解决这个问题,OCM和Fluid社区合作,通过OCM的多集群应用部署能力和Fluid的数据编排能力自动化实现推理应用的多地区分发。这种自动化促进了大型模型的跨地区分发和预热,提高了模型部署和升级的效率。
Speakers
Kai Zhang

Senior Staff Engineer, Alibaba
Kai Zhang is a Senior Staff Engineer at Alibaba Cloud Intelligence, where he has been part of the team developing the Alibaba Cloud container service for Kubernetes (ACK) for over 6 years. He currently leads ACK’s Cloud native AI product and solution offerings. Before this, he spent... Read More →
Jian Zhu

Senior Software Engineer, Red Hat
Zhu Jian is a senior software engineer at Red Hat and a core contributor to the Open Cluster Management project. Jian enjoys solving multi-cluster workload distribution problems and extending OCM with add-ons.
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

1:50pm HKT

Enhancing Cyber Resilience Through Zero Trust Chaos Experiments in Cloud Native Environments | 通过在云原生环境中进行零信任混沌实验来增强网络安全弹性 - Sayan Mondal, Harness & Rafik Harabi, Sysdig
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Cyber-attacks against cloud-native infrastructure are increasing in frequency and sophistication. The complexity of modern cloud-native systems and the speed at which technology is developing have outpaced cloud security solutions. On the flip side, cyber-criminals are taking advantage of these developments to launch successful cloud attacks. This session delves into the paradigm of Zero Trust Chaos Experiments, exploring how intentional disruptions and simulated cyber threats can uncover vulnerabilities and enhance cyber resilience. Through practical insights, we will illustrate the transformative impact of Zero Trust Chaos Experiments on organizations' ability to detect and mitigate cyber incidents. By the end of the session, participants will be equipped with actionable strategies and a better understanding of how Zero Trust Chaos Experiments can elevate cyber resilience in cloud-native environments.

针对云原生基础设施的网络攻击频率和复杂性正在增加。现代云原生系统的复杂性和技术发展速度已经超过了云安全解决方案。与此同时,网络犯罪分子正在利用这些发展来发动成功的云攻击。 本场演讲将深入探讨零信任混沌实验的范式,探讨有意的干扰和模拟网络威胁如何揭示漏洞并增强网络安全弹性。通过实用的见解,我们将阐明零信任混沌实验对组织检测和缓解网络事件能力的转变影响。在会议结束时,参与者将掌握可操作的策略,并更好地了解零信任混沌实验如何提升云原生环境中的网络安全弹性。
Speakers
Rafik Harabi

Senior Solutions Architect, Sysdig
Rafik has more than 15 years of tech and internet industry experience. Currently, he is a Senior Solution Architect devoted to helping customers secure their cloud native platforms and applications. Before joining Sysdig, he was responsible for executing go-to cloud programmes in... Read More →
Sayan Mondal

Senior Software Engineer 2, Harness
Sayan Mondal is a Senior Software Engineer II at Harness, building their Chaos Engineering platform and helping them shape the customer experience market. He's the maintainer of a few open-source libraries and is also a maintainer of LitmusChaos (the Incubating CNCF project). Sayan's... Read More →
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Security

1:50pm HKT

Kubespray Unleashed: Navigating Bare Metal Services in Kubernetes for LLM and RAG | Kubespray大放异彩:在Kubernetes中为LLM和RAG部署裸金属服务 - Kay Yan, DaoCloud & Alan Leung, Equinix
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Kubespray, popular within the SIG-Cluster-Lifecycle of Kubernetes, is celebrated for deploying production-ready Kubernetes clusters, particularly on bare metal, which boosts performance for AI workloads like LLM and RAG. This session will explore using Kubespray in bare metal settings, addressing challenges, and sharing best practices. The first part of the talk will show Kubespray's key features and provide practical tips. The latter half will focus on swiftly deploying AI using Retrieval-Augmented Generation (RAG), demonstrating how Kubespray facilitates setting up Kubernetes clusters on bare metal. This setup enhances AI applications by integrating continuous knowledge updates and domain-specific information via RAG, improving the accuracy and credibility of the AI systems. The session will conclude with discussions on community engagement and future advancements, followed by a Q&A period to address participant queries.

Kubespray在Kubernetes的SIG-Cluster-Lifecycle中备受推崇,以在裸金属上部署可用于生产的Kubernetes集群而闻名,特别是对于像LLM和RAG这样的AI工作负载,可以提高性能。本场演讲将探讨在裸金属环境中使用Kubespray,解决挑战,并分享最佳实践。 演讲的第一部分将展示Kubespray的关键特性并提供实用技巧。后半部分将重点介绍如何使用检索增强生成(RAG)快速部署AI,演示Kubespray如何在裸金属上设置Kubernetes集群。通过RAG集成持续的知识更新和领域特定信息,这种设置可以提升AI应用程序的性能,提高AI系统的准确性和可信度。 本场演讲将以社区参与和未来发展的讨论结束,随后进行问答环节以解答参与者的疑问。
Speakers
Kay Yan

Principal Software Engineer, DaoCloud
Kay Yan is a Kubespray maintainer and a containerd/nerdctl maintainer. He is a Principal Software Engineer at DaoCloud and has been developing the DaoCloud Enterprise Kubernetes Platform since 2016.
Alan Leung

Digital Technical Specialist, Equinix
Alan is the Digital Technical Specialist at Equinix, with a focus on enabling customers, prospects and partners to develop innovative solutions to solve business challenges at the digital edge.
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 2

1:50pm HKT

Zen and the Art of OSPO Maintenance - Group Reflection of OSPO Summits | 禅与OSPO维护的艺术-OSPO峰会的团体反思 - Nadia Jiang, SegmentFault; Richard Sikang Bian, Ant Group; Li Jiansheng, Open Source Way; Zhiqiang Yu, Linux Foundation APAC; Jie Liu, Huawei Technologie
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Building OSPOs can be easy, but evolving and maintaining them so that they continue delivering sustainable and widely recognized value is challenging. Our goal is not only to assist companies in establishing their first OSPOs but also to ensure that they continually generate value through community-led approaches. With this mission, the LFAPAC OSPO SIG, collaborating with OSPO Group and SegmentFault, successfully held OSPO Summits in 2023 and 2024. These summits convened OSPO practitioners, corporate project leads, and community leaders, facilitated collaboration, and earned high praise. Nonetheless, we also encountered numerous inquiries and discussions about the difficulties of sustainably developing OSPOs. In this panel discussion, we gather the co-chairs of the OSPO Summits. They will explore these challenges and share their insights and strategies to make overall "OSPO maintenance" easier with support from OSS Zen and methodologies.

构建OSPO可能很容易,但是让它们不断发展和维持以持续提供可持续和广泛认可的价值是具有挑战性的。我们的目标不仅是帮助公司建立他们的第一个OSPO,还要确保他们通过社区主导的方法持续产生价值。 LFAPAC OSPO SIG与OSPO Group和SegmentFault合作,成功举办了2023年和2024年的OSPO峰会。这些峰会汇集了OSPO从业者、企业项目负责人和社区领导者,促进了合作,并获得了高度赞誉。然而,我们也遇到了许多关于可持续发展OSPO的困难的询问和讨论。 在这个小组讨论中,我们邀请了OSPO峰会的联合主席。他们将探讨这些挑战,分享他们的见解和策略,以便在OSS Zen和方法论的支持下使整体的“OSPO维护”更加容易。
Speakers
Jie Liu

Open Source Evangelist, Huawei Technologies Co. Ltd.
Co-Chair of the 2nd OSPO Summit. As an open-source evangelist and OSPOer at Huawei, Jie Liu is dedicated to promoting open source development, fostering collaboration within the open-source communities, and advocating for open source culture. She has been working in the ICT industry... Read More →
Zhiqiang Yu

Open Source Evangelist, Linux Foundation APAC
Zhiqiang Yu is the Chief Open Source Liaison Officer at China Mobile Research. He has been a member of the LF APAC Open Source Evangelist team since 2022 and currently serves as the co-chair of the LF APAC OSPO SIG. Alongside Nadia Jiang and Jiangsheng Li, he launched the first OSPO... Read More →
Li Jiansheng

Creator, Open Source Way
Open Source advocate.
Nadia Jiang

COO, SegmentFault
Nadia Jiang currently serves as the COO of SegmentFault and is a co-founder of Apache Answer. She is an active contributor to several open source organizations, including KAIYUANSHE (China Open Source Alliance), Chance Foundation, China Computer Federation (CCF), and China Institute... Read More →
Richard Sikang Bian

Head of Open Source Growth and Strategy, Ant Group
An engineer by training and father to a toddler, Richard is ex-Square and ex-Microsoft and currently works on the Technical Strategy Initiatives team of Ant Group. Richard is also in charge of Ant Group's Open Source Program Office (OSPO) and enjoys being the evangelist of Open Source... Read More →
Wednesday August 21, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 5

1:55pm HKT

⚡ Lightning Talk: Discussion on CNAI Widely Used in Education | ⚡ 闪电演讲: 教育中广泛使用的CNAI讨论 - Chen Lin, VMware by Broadcom
Wednesday August 21, 2024 1:55pm - 2:00pm HKT
This lightning talk will discuss Cloud Native Artificial Intelligence (CNAI) in education from three aspects. First, it introduces the current state of CNAI applied to children's education. Second, it demos a kid-friendly prototype of an AI training process on cloud native infrastructure. Third, it explores more possibilities for CNAI in pre-school and in-school education (children's enlightenment, correcting student assignments, AI teaching...) and raises the foreseeable problem of malicious abuse of CNAI.

这个闪电演讲将从三个方面讨论在教育领域中使用的云原生人工智能(CNAI)。首先,介绍目前在儿童教育中应用CNAI的现状。其次,演示一个儿童友好的AI培训过程原型在云原生基础设施上的应用。第三,讨论CNAI在学前和学校教育中的更多可能性(儿童启蒙,学生作业批改,AI教学...),同时提出CNAI可能存在的恶意滥用问题。
Speakers
Chen Lin

Software Engineer, VMware by Broadcom
Chen Lin joined VMware in 2019 and has 5 years of cloud native experience. Chen has worked on the PKS, Tanzu and TKGs products, focusing on networking and production CI/CD. Chen is also a member of the Kubernetes community and the maintainer of cloud-provider-vsphere.
Wednesday August 21, 2024 1:55pm - 2:00pm HKT
Level 1 | Hung Hom Room 1

2:00pm HKT

⚡ Lightning Talk: WASM on Embedded Systems (RTOS) | ⚡ 闪电演讲: 嵌入式系统(RTOS)上的WASM - Han Wu, University of Exeter
Wednesday August 21, 2024 2:00pm - 2:05pm HKT
WebAssembly (WASM) has seen significant success in web applications and is now making inroads into other areas like cloud services and even embedded systems that run Real-time Operating Systems (RTOS), such as Zephyr, RT-Thread, NuttX, and ESP-IDF. This lightning talk will present different approaches to using WASM on embedded systems:
- wasmtime (Arm Linux)
- wasm-micro-runtime (RTOS)
- wasm3 (Baremetal)
The above WASM runtimes offer full support for the WASM core specifications. Additionally, their limited support for the WebAssembly System Interface (WASI) enables access to components such as threads, file systems, and network sockets. Although the WASI specifications that provide access to hardware peripherals such as wasi-i2c, wasi-spi, and wasi-digital-io are still in the early stages of development, the potential advantages in portability, security, and deployment simplicity make WASM a promising choice for embedded systems.

Web Assembly(WASM)在Web应用程序中取得了显著的成功,现在正在进入其他领域,如云服务甚至运行实时操作系统(RTOS)的嵌入式系统,例如Zephyr、RT-Thread、Nuttx和ESP-IDF。 这个Lightning Talk将介绍在嵌入式系统中使用WASM的不同方法。 - wasmtime(Arm Linux) - wasm-micro-runtime(RTOS) - wasm3(裸机) 上述WASM运行时完全支持WASM核心规范。此外,它们对WebAssembly系统接口(WASI)的有限支持使得可以访问诸如线程、文件系统和网络套接字等组件。 尽管提供访问硬件外设的WASI规范,如wasi-i2c、wasi-spi和wasi-digital-io仍处于早期开发阶段,但WASM在可移植性、安全性和部署简易性方面的潜在优势使其成为嵌入式系统的一个有前途的选择。
Speakers
Han

Ph.D. Student, University of Exeter
Ph.D. student at the University of Exeter in the U.K., researching deep learning security in autonomous systems. Prior research experience at RT-Thread, LAIX, and Xilinx.
Wednesday August 21, 2024 2:00pm - 2:05pm HKT
Level 1 | Hung Hom Room 1
  ⚡ Lightning Talks | ⚡ 闪电演讲, Cloud Native Novice

2:05pm HKT

⚡ Lightning Talk: How Prometheus AI Agent Helps Build Interactive Monitoring? | ⚡ 闪电演讲: Prometheus AI代理如何帮助构建交互式监控? - Zhihao Liu, Quwan
Wednesday August 21, 2024 2:05pm - 2:10pm HKT
In day-to-day work, both SREs and developers often struggle with observability tools like Prometheus, mainly because of the complex PromQL syntax and disorganized metrics. This talk will showcase how to build an agent that can think, act, and analyze like a human and that solves user issues through conversation. The talk presents two main ideas:
1. Leveraging RAG technology, the agent performs multi-path retrieval from local metric knowledge, the Prometheus API, request logs, and public domain knowledge to produce a consolidated answer.
2. Using the ReAct method, it engages in multi-round dialogues to refine and generate the correct PromQL, call the API, and render the resulting dashboard.
From this talk, we hope the audience will learn:
1. How to integrate LLMs effectively within the observability space.
2. The steps to create an easy-to-use and practical Prometheus AI Agent.
3. Experience and insights from practical examples of the Prometheus AI Agent.

在日常工作中,SRE和开发人员在使用像Prometheus这样的可观察性工具时经常遇到困难,主要是由于复杂的PromQL语法和混乱的指标。本次演讲将展示如何构建Agent。它将具有像人类一样思考、行动和分析的能力,并通过对话解决用户问题。 本次演讲提出了两个主要的突出想法: 1. 利用RAG技术,从本地度量知识、Prometheus API、请求日志和公共领域知识中进行多路径检索,以生成一个整合的答案。 2. 使用ReAct方法,进行多轮对话以完善和生成正确的PromQL,调用api,并呈现仪表板返回。 通过本次演讲,我们希望观众能学到: 1. 如何在可观察性领域有效地整合LLM。 2. 创建一个易于使用和实用的Prometheus人工智能Agent的步骤。 3. 从Prometheus人工智能Agent的实际示例中获得经验和见解。
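The multi-round refinement described in the abstract can be pictured with a small sketch. This is not the speaker's implementation: `run_react`, `fake_propose`, and `fake_validate` are hypothetical names, and the "LLM" and the "Prometheus check" are deterministic stubs so that the loop runs without any external service.

```python
def run_react(question, propose, validate, max_rounds=3):
    """Think -> act -> observe loop: stop at the first query that validates."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        query = propose(question, feedback)   # "think": draft a PromQL candidate
        ok, feedback = validate(query)        # "act": try it and collect an observation
        if ok:
            return query, round_no
    raise RuntimeError(f"no valid query after {max_rounds} rounds")


# Hypothetical stand-in for an LLM call: it corrects itself once it sees feedback.
def fake_propose(question, feedback):
    if "unknown metric" in feedback:
        return "rate(http_requests_total[5m])"
    return "rate(http_request_total[5m])"


# Hypothetical stand-in for a metadata lookup against a metrics backend.
KNOWN_METRICS = {"http_requests_total"}


def fake_validate(query):
    metric = query.split("(", 1)[1].split("[", 1)[0]
    if metric in KNOWN_METRICS:
        return True, ""
    return False, f"unknown metric: {metric}"


result = run_react("What is the request rate?", fake_propose, fake_validate)
print(result)
```

In a real agent the observation would come from the monitoring backend and the proposal from a model; the point of the sketch is only that each failed attempt feeds its error back into the next round.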
Speakers
Zhihao Liu

Senior DevOps Engineer, Quwan
Three years of experience in the observability field. I have been involved in developing the company's observability platform.
Wednesday August 21, 2024 2:05pm - 2:10pm HKT
Level 1 | Hung Hom Room 1
  ⚡ Lightning Talks | ⚡ 闪电演讲, Observability

2:10pm HKT

⚡ Lightning Talk: K8SUG: Unleashing the Power of Community | ⚡ 闪电演讲: K8SUG:释放社区的力量 - Yongkang He, K8SUG.com
Wednesday August 21, 2024 2:10pm - 2:15pm HKT
Unveiling the Powerhouse of Knowledge: K8SUG - the Most Active Kubernetes User Group! Step into the world of K8SUG, where passion meets innovation, and connections spark like wildfire. As the brainchild of its founder, the K8SUG Singapore meetup blossomed into a global phenomenon, stretching its reach from Australia to Canada and the UK, with the USA next on the horizon. In just 1.5 electrifying years, our community has swelled to over 14,000 members worldwide, all fueled by the dedication of our volunteers. Join us and be part of the dynamic exchange shaping the future of Kubernetes!

揭开知识强大的力量:K8SUG - 最活跃的Kubernetes用户组! 走进K8SUG的世界,激情与创新相遇,连接如野火般迸发。作为其创始人的心血结晶,K8SUG新加坡聚会已经发展成为一个全球现象,其影响力从澳大利亚延伸至加拿大和英国,美国也在未来的计划之中。 在短短1.5年的时间里,我们的社区已经发展到全球超过14,000名成员,所有这一切都得益于我们志愿者的奉献。加入我们,成为塑造Kubernetes未来的动态交流的一部分!
Speakers
Yongkang He

Founder / Principal Containers Specialist, K8SUG.com
Yongkang He is a Kubestronaut, CNCF Ambassador, AWS Builder, Microsoft MVP, Google Champion, and Alibaba MVP based in Singapore. He has over 20 years of experience in IT. In recent years, he has shifted his focus to Kubernetes and multi-cloud. He is one of the most certified including... Read More →
Wednesday August 21, 2024 2:10pm - 2:15pm HKT
Level 1 | Hung Hom Room 1

2:40pm HKT

⚡ Lightning Talk: Kubernetes Raises Questions. Can a PaaS Answer Them? | ⚡ 闪电演讲: Kubernetes引发了问题。 PaaS能解答吗? - Ram Iyengar, Cloud Foundry Foundation
Wednesday August 21, 2024 2:40pm - 2:45pm HKT
The enormous success of the CNCF Landscape has produced an overwhelming number of options, and organizations struggle to establish their platforms quickly. This talk will guide the community through the thought process of building these platforms, explore some examples of what a healthy source-driven platform ecosystem looks like, and showcase the power that a good cloud native platform delivers to an organization. Though platforms come in several variations (e.g., data, application, machine learning), many run into the same problems. These include artifact management, secrets management, TLS certificates, cloud permissions, and the list goes on. Providing turnkey platform solutions that can be ready in minutes adds considerable velocity to engineering teams in organizations that adopt the platform engineering model.

CNCF景观的巨大成功在该领域产生了大量的选择,组织往往难以快速建立自己的平台。本次演讲将帮助指导社区通过构建这些平台的思考过程,探讨健康的源驱动平台生态系统的一些示例,并展示一个优秀的云原生平台将为组织带来的力量。 尽管平台有各种变化(如数据、应用程序、机器学习等),许多开始出现相同的问题。这些问题包括工件管理、密钥管理、TLS证书、云权限等等。为平台提供即插即用的解决方案,可以在几分钟内准备就绪,为采用平台工程模型的组织的工程团队带来更大的速度。
Speakers
Ram Iyengar

Chief Evangelist, Cloud Foundry Foundation
Ram Iyengar is an engineer by practice and an educator at heart. He was (cf) pushed into technology evangelism along his journey as a developer and hasn’t looked back since! He enjoys helping engineering teams around the world discover new and creative ways to work. He is a proponent... Read More →
Wednesday August 21, 2024 2:40pm - 2:45pm HKT
Level 1 | Hung Hom Room 1

2:40pm HKT

Self-Hosted LLM Agent on Your Own Laptop or Edge Device | 在自己的笔记本电脑或边缘设备上自托管LLM Agent - Michael Yuan, Second State
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
As LLM applications evolve from chatbots to copilots to AI agents, there are increasing needs for privacy, customization, cost control, and value alignment. Running open-source LLMs and agents on personal or private devices is a great way to achieve those goals. With the release of a new generation of open-source LLMs, such as Llama 3, the gap between open-source and proprietary LLMs is narrowing fast. In many cases, open source LLMs are already outperforming SaaS-based proprietary LLMs. For AI agents, open-source LLMs are not just cheaper and more private. They allow customization through finetuning and RAG prompt engineering using private data. This talk shows you how to build a complete AI agent service using an open-source LLM and a personal knowledge base. We will use the open-source WasmEdge + Rust stack for LLM inference, which is fast and lightweight without complex Python dependencies. It is cross-platform and achieves native performance on any OSes, CPUs, and GPUs.

随着LLM应用程序从聊天机器人发展到副驾驶员再到AI代理,对隐私、定制、成本控制和价值对齐的需求越来越大。在个人或私人设备上运行开源LLMs和代理是实现这些目标的好方法。 随着新一代开源LLMs(如Llama 3)的发布,开源和专有LLMs之间的差距迅速缩小。在许多情况下,开源LLMs已经超越了基于SaaS的专有LLMs。对于AI代理来说,开源LLMs不仅更便宜、更私密,还允许通过微调和使用私人数据进行RAG提示工程来进行定制。 本次演讲将向您展示如何使用开源LLM和个人知识库构建完整的AI代理服务。我们将使用开源的WasmEdge + Rust堆栈进行LLM推理,这种方法快速轻便,不需要复杂的Python依赖。它是跨平台的,在任何操作系统、CPU和GPU上都能实现原生性能。
Speakers
Michael Yuan

Product Manager, Second State
Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 3

2:40pm HKT

What Is the Future of Service Mesh? Sidecar or Sidecarless | 服务网格的未来是什么?边车还是无边车 - Zhonghu Xu, Huawei
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Istio has been the best-known service mesh since 2017. For most of its history it has evolved around the sidecar model, which is widely used. Because some users have raised concerns about the side effects of sidecars, Istio began designing a new sidecarless mode, `Ambient`, in 2022. Recently, after more than a year of development, we promoted it to beta in 1.22. In this presentation, we will also talk about what the Istio community has done recently, such as Gateway API support and the status of delta xDS. Finally, what is the future of Istio? Will we run everything in Ambient mode or keep driving on both wheels? Join us for the presentation, where we will discuss the plans for the future.

Istio自2017年以来是最知名的服务网格。从长期来看,它一直围绕着sidecar进行演进,这是被广泛使用的。虽然有些人声称sidecar会带来负面影响,但自2022年以来,istio已经开始设计一种新的无sidecar模式`Ambient`。最近,在经过一年多的开发后,我们在1.22版本中将其推出为beta版。 在这个演讲中,我们还将谈论istio社区最近所做的工作,比如Gateway API支持,delta xDS状态。 最后,那么istio的未来会是什么样子呢?我们会全部在Ambient模式下运行,还是保持双轮驱动?加入我们的演讲,我们将讨论未来的计划。
Speakers
Zhonghu Xu

Principal Engineer, Huawei
Zhonghu is an open-source enthusiast and has focused on OSS since 2017. In 2023, Zhonghu was awarded the `Google Open Source Peer Bonus`. He has worked on Istio for more than 6 years and has been a core Istio maintainer and one of the top 3 contributors. He has been continuously serving as Istio... Read More →
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 6

2:40pm HKT

Best Practice: Karmada & Istio Improve Workload & Traffic Resilience of Production Distributed Cloud | 最佳实践:Karmada和Istio提高生产分布式云的工作负载和流量弹性 - Chaomeng Zhang, Huawei
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Distributed cloud offers better resilience by providing redundancy, scalability, and flexibility, especially for cloud native applications. However, the complexity of multi-cluster workload and traffic management in hybrid or multi-cloud environments brings huge challenges in practice. For example, when unhealthy instances are isolated after a failure, the overall number of multi-cluster workload instances serving customer requests decreases. In this talk, Chaomeng introduces a production practice in which Karmada and Istio work together to improve the resilience of multi-cluster applications. It covers how Karmada and Istio policies configured in a centralized control plane automatically control both replica and traffic distribution across clusters. It also covers, in case of failures, how Istio's failover removes unhealthy endpoints from the global load-balancing pool and how Karmada rebuilds the corresponding number of instances in other healthy clusters, ensuring that multi-cluster instances always meet the designed capacity.

分布式云通过提供冗余、可伸缩性和灵活性,特别是对于云原生应用程序,提供了更好的弹性。然而,在混合或多云环境中的多集群工作负载和流量管理的复杂性在实践中带来了巨大挑战,例如当一些不健康的实例在故障情况下被隔离时,为客户请求提供服务的整体多集群工作负载实例数量减少。 在这次演讲中,Chaomeng介绍了Karmada和Istio共同推动多集群应用程序弹性的生产实践。Karmada和Istio策略如何在集中控制平面中配置,自动控制跨集群的副本和流量分发。在发生故障时,Istio的故障转移如何从全局负载均衡池中移除不健康的端点,以及Karmada如何在其他健康集群中重新构建相应数量的实例,确保多集群实例始终满足容量设计。
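The capacity argument in this abstract (a failed cluster should not reduce the total number of serving instances) can be illustrated with a little arithmetic. The sketch below is an assumption-laden illustration, not Karmada's scheduler or API: `distribute` and `rebalance` are hypothetical helpers, and the cluster names and weights are invented.

```python
def distribute(total, weights):
    """Split `total` replicas across clusters in proportion to integer weights."""
    weight_sum = sum(weights.values())
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    # Hand out any remainder one replica at a time, largest weight first.
    leftover = total - sum(shares.values())
    for name in sorted(weights, key=lambda n: (-weights[n], n))[:leftover]:
        shares[name] += 1
    return shares


def rebalance(total, weights, healthy):
    """Re-split the same total over healthy clusters only, so capacity is preserved."""
    healthy_weights = {n: w for n, w in weights.items() if n in healthy}
    if not healthy_weights:
        raise ValueError("no healthy clusters left")
    return distribute(total, healthy_weights)


# Hypothetical clusters and weights, chosen only for illustration.
weights = {"cluster-a": 2, "cluster-b": 1, "cluster-c": 1}
before = distribute(12, weights)
after = rebalance(12, weights, healthy={"cluster-a", "cluster-b"})
print(before, after)
```

With cluster-c out of rotation, the 12 replicas are re-spread over the two remaining clusters instead of dropping to 9, which is the "always meet the capacity design" property the talk describes.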
Speakers
Chaomeng Zhang

Architect of UCS (HUAWEI Distributed Cloud Native), Huawei
Zhang Chaomeng is the architect of UCS (HUAWEI Distributed Cloud Native) and has 9 years of cloud computing design and development experience at HUAWEI Cloud, including service mesh, Kubernetes, microservices, cloud service catalog, big data, APM, cloud computing reliability and... Read More →
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Connectivity

2:40pm HKT

Connecting the Dots: Towards a Unified Multi-Cluster AI/ML Experience | 连接点:走向统一的多集群AI/ML体验 - Qing Hao, RedHat & Chen Yu, Microsoft
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Today, cloud-native infrastructure is vital for AI/ML, and administrative complexity and the growing demand for compute resources drive developers towards multi-cluster patterns. Batch scheduling projects, like Kueue, are valuable for efficient AI/ML training in a single Kubernetes cluster. Multi-cluster management platforms like OCM and Fleet simplify cluster management and provide advanced scheduling features. We hope to bridge the best of both worlds to simplify user operations and reduce confusion between different systems. In this talk, we will show how SIG Multi-Cluster's newly proposed ClusterProfile API, combined with OCM, Fleet, and Kueue, addresses these challenges. We will demonstrate that MultiKueue setup can be easily automated with the help of the ClusterProfile API. With a few tweaks, users can use OCM and Fleet's advanced scheduling features through MultiKueue to place AI/ML jobs intelligently across clusters, maximizing utilization of resources such as GPUs and saving costs.

今天,云原生基础设施对于人工智能/机器学习、管理复杂性以及对计算资源需求不断增长至关重要,这推动开发人员转向多集群模式。像Kueue这样的批处理调度项目对于在单个Kubernetes集群中高效进行人工智能/机器学习训练非常有价值。OCM和Fleet等多集群管理平台简化了集群管理,并提供了高级调度功能。我们希望将两者的优势结合起来,简化用户操作,减少不同系统之间的混乱。 在本次演讲中,我们将展示如何借助Sig Multi-Cluster最新提出的API - ClusterProfile,结合OCM、Fleet和Kueue来解决这些挑战。我们将演示如何通过ClusterProfile API轻松自动化MultiKueue设置;通过一些调整,用户可以利用OCM和Fleet的高级调度功能,通过MultiKueue智能地在集群之间放置人工智能/机器学习作业,以最大化资源利用率,如GPU,以节省成本。
Speakers
Qing Hao

Senior Software Engineer, RedHat
Qing Hao is a senior software engineer at RedHat, where she works as the maintainer of Open Cluster Management. Qing is interested in solving complex problems in the multi-cluster area, e.g., application scheduling and rolling upgrades of management components. Prior to RedHat, she worked... Read More →

Chen Yu

Senior Software Engineer, Microsoft
Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

2:40pm HKT

Scaling Kubernetes: Best Practices for Managing Large-Scale Batch Jobs with Spark and Argo Workflow | 扩展Kubernetes:管理大规模批处理作业的最佳实践与Spark和Argo工作流 - Yu Zhuang & Liu Jiaxu, Alibaba Cloud
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Are you managing large-scale batch jobs on Kubernetes, like data processing with Spark applications or genomics computing with Argo workflows? To complete these jobs promptly, a significant number of pods have to be scaled out and in quickly for parallel computation. This puts heavy pressure on the Kubernetes control plane. In this talk, we will use Spark and Argo workflows as examples to show you how to build a Kubernetes cluster that supports frequently creating and deleting 20,000 pods. Our focus will be on tuning the Kubernetes control plane, including optimizing the list-watch mechanism, service broadcasting, environment variable attachments, and API server configurations. Additionally, we'll share some best practices for configuring the Spark operator and the Argo workflows controller.

您是否正在Kubernetes上管理大规模的批处理作业,比如使用Spark应用程序进行数据处理或使用Argo工作流进行基因组计算?为了及时完成这些作业,需要快速地扩展/缩减大量的Pod以进行并行计算,这给Kubernetes控制平面带来了巨大压力。 在本次演讲中,我们将以Spark和Argo工作流为例,指导您如何构建一个支持频繁创建/删除20000个Pod的Kubernetes集群。我们将重点放在调优Kubernetes控制平面上,包括优化列表-观察机制、服务广播、环境变量附加、API服务器配置等。此外,我们还将分享一些配置Spark操作员和Argo工作流控制器的最佳实践。
Speakers
Liu Jiaxu

Senior Engineer, Alibaba Cloud
Jiaxu Liu is a Senior Engineer on the Container Service Team at Alibaba Cloud. He specializes in observability enhancement and large-scale cluster management and optimization for Alibaba Cloud's container service offerings. Before joining Alibaba Cloud, he worked at Nokia as a Senior... Read More →
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 2

2:40pm HKT

The Zen and Learning from Project Open Governance to Corporate OSS Governance | 从项目开放治理到企业开源治理的禅意与学习 - Xu Wang, Ant Group
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
As an open source veteran who has been working on secure container technology (Kata Containers), the speaker has been crafting open source governance and strategies for projects for years. The team joined Ant Group 5 years ago and has continuously focused on cloud native and trust technologies. In 2023, the speaker was appointed Vice President of the Open Source Technical Oversight Committee for Ant Group. The TOC job requires setting up open source strategy and growth tactics, but now for a company with 25K employees and 13K engineers. It turned out that the experience of leading a top-level project was immensely valuable for the new position. In this session, we'll share first-hand experiences of a tech leader wearing the multiple hats of tech director, open source leader, and go-to person for OSS strategy at a large corporation, along with the learnings and reflections that came from the new challenges.

作为一位开源资深人士,演讲者一直致力于安全容器技术(Kata Containers),并多年来一直在为项目制定开源治理和战略。团队于5年前加入蚂蚁集团,一直专注于云原生和信任技术。在2023年,演讲者被任命为蚂蚁集团开源技术监督委员会副主席。TOC的工作需要制定开源战略和增长策略,但现在是为一个拥有25,000名员工和13,000名工程师的公司。事实证明,领导一个顶级项目的经验对新职位非常有价值。在这场演讲上,我们将分享一个技术领导者如何在大公司中扮演技术总监、开源领导者和开源战略的权威人士等多重角色的第一手经验,以及从新挑战中获得的经验和反思。
Speakers
Xu Wang

Vice President of Ant Group Open Source Technical Committee, Ant Group
Xu joined Ant Group in 2019 and is in charge of container-based Cloud-Native infrastructure and the open-source related strategies of Ant Group. Xu is also a director of the Open Infrastructure Foundation (OIF) Board. Before joining Ant Group, Xu was the CTO and co-founder of hyper.sh... Read More →
Wednesday August 21, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 5

2:45pm HKT

⚡ Lightning Talk: Rocket Power Your Kubernetes Career with Kubestronaut Program | ⚡ 闪电演讲: 用Kubestronaut计划提升您的Kubernetes职业生涯火力 - Giorgi Keratishvili, EPAM Systems
Wednesday August 21, 2024 2:45pm - 2:50pm HKT
Are you someone who wants to fly high and conquer the mountains of Kubernetes certifications? Then this talk is for you. Giorgi will share the details of the Kubestronaut program, the benefits it offers, and his own certification journey. He holds all 5 certifications and more from CNCF, and he has also been a beta tester and exam developer for some of them...

您是想要飞得更高的人吗?征服 Kubernetes 认证的高山?那么这个讲座适合您。Giorgi 将分享 kubestronaut 计划的所有细节,以及它对个人和他的认证之旅带来的好处。他拥有 CNCF 颁发的所有 5 个甚至更多证书,并且还曾担任其中一些证书的测试人员和考试开发人员...
Speakers
Giorgi Keratishvili

Lead System Engineer (DevOps), EPAM Systems
Giorgi has been in the IT field for a decade. During this period he has been exposed to most areas of development and operations, from bare-metal infrastructure to higher levels of automation. Outside working hours, Giorgi actively participates in the community. He plays role... Read More →
Wednesday August 21, 2024 2:45pm - 2:50pm HKT
Level 1 | Hung Hom Room 1

2:50pm HKT

⚡ Lightning Talk: Running Native WebAssembly AI Applications Everywhere | ⚡ 闪电演讲: 在任何地方运行原生WebAssembly人工智能应用程序 - Tiejun Chen, VMware
Wednesday August 21, 2024 2:50pm - 2:55pm HKT
In recent years WASM has been one of the hottest topics in the world of computing due to its portability, small size, fast loading, and compatibility. Given these advantages, WebAssembly is an ideal sandbox-based technology for modern applications, including ML/AI. Beyond the browser, however, WebAssembly can currently use mostly only the CPU to accelerate ML/AI. Here we offer a flexible way to run ML/AI on WebAssembly across a variety of AI accelerators by empowering WASM with a transparent backend interposer. With this, your native ML/AI WebAssembly workloads can seamlessly use the underlying AI accelerators, such as CPU, GPU, and FPGA, with the best performance. During this presentation we would also like to show our latest implementation, with demos that give users direct insight into running ML/AI with WebAssembly on AI accelerators.

近年来,由于其可移植性、体积小、加载速度快和兼容性等优势,WASM已成为计算领域最热门的话题之一。鉴于这些优势,WebAssembly是基于沙箱方案的现代应用程序,包括ML/AI的理想技术。但除了浏览器之外,目前WebAssembly只能利用CPU来加速大部分ML/AI。在这里,我们提供了一种灵活的方式,通过为WASM赋予一个透明的后端插入器,使其能够在各种AI加速器上运行ML/AI。借助这一技术,您的本地ML/AI WebAssembly工作负载可以无缝地享受CPU、GPU、FPGA等底层AI加速器的最佳性能。在本次演示中,我们还将展示我们最新的实现,并通过演示帮助用户直观了解在AI加速器上运行ML/AI的WebAssembly。
Speakers
Tiejun Chen

Sr. Technical Lead, VMware
Tiejun Chen was a Sr. technical leader. He has worked at several tech companies, such as VMware, Intel, and Wind River Systems, on cloud native, edge computing, ML/AI, RISC-V, WebAssembly, and more. He has given many presentations, including at AI.Dev NA 2023, kubecon China 2021, Kube... Read More →
Wednesday August 21, 2024 2:50pm - 2:55pm HKT
Level 1 | Hung Hom Room 1

2:55pm HKT

⚡ Lightning Talk: Tips and Tricks to (Right) Size Your Kubernetes Cluster for Efficiency and Cost Saving | ⚡ 闪电演讲: 为了提高效率和节约成本,调整Kubernetes集群大小的技巧和窍门 - Daniele Polencic, Learnk8s
Wednesday August 21, 2024 2:55pm - 3:00pm HKT
In this session, you will learn how Kubernetes allocates resources in worker nodes and how you can get the most out of them by choosing the right kind of limits and requests for your workloads. We will cover some practical tips for allocating the right number of nodes and resources to your cluster:
- Should you have larger or smaller nodes?
- How does reservation affect efficiency and cost savings?
- How can you "defrag" your cluster to optimize allocations?
And more.

在这场演讲中,您将学习Kubernetes如何在工作节点中分配资源,以及如何通过为工作负载选择正确的限制和请求来充分利用它们。 您将学习一些实用的技巧,来为您的集群分配正确数量的节点和资源: - 您应该选择更大还是更小的节点? - 预留资源如何影响效率和节约成本? - 如何“整理”您的集群以优化分配 等等。
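The "larger or smaller nodes" question comes down to simple arithmetic once you account for per-node reservations. The following is a minimal sketch under invented numbers: the node sizes, reserved amounts, and pod requests are assumptions made for illustration, and real reservations depend on the kubelet configuration and the cloud provider.

```python
def allocatable(capacity, reserved):
    """Resources left for pods after system reservations are subtracted."""
    return capacity - reserved


def pods_per_node(alloc_cpu_m, alloc_mem_mib, req_cpu_m, req_mem_mib):
    """Pods that fit on one node: the scarcer of CPU and memory decides."""
    return min(alloc_cpu_m // req_cpu_m, alloc_mem_mib // req_mem_mib)


def cluster_fit(nodes, cpu_m, mem_mib, reserved_cpu_m, reserved_mem_mib,
                req_cpu_m, req_mem_mib):
    """Total pods schedulable on `nodes` identical nodes."""
    per_node = pods_per_node(
        allocatable(cpu_m, reserved_cpu_m),
        allocatable(mem_mib, reserved_mem_mib),
        req_cpu_m,
        req_mem_mib,
    )
    return nodes * per_node


# Hypothetical workload: each pod requests 500m CPU and 1024 MiB of memory.
REQ_CPU_M, REQ_MEM_MIB = 500, 1024

# Same raw capacity (8000m CPU, 32 GiB) split two ways; reservations are invented.
small_total = cluster_fit(4, 2000, 8192, 200, 1024, REQ_CPU_M, REQ_MEM_MIB)
large_total = cluster_fit(1, 8000, 32768, 400, 2048, REQ_CPU_M, REQ_MEM_MIB)
print(small_total, large_total)
```

Under these assumed numbers the single large node fits more pods because the reservation is paid once rather than four times. The trade-off is that losing that one node takes the whole workload with it, so efficiency is only one side of the sizing decision.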
Speakers
Daniele Polencic

Instructor, Learnk8s
Daniele teaches containers and Kubernetes at Learnk8s. Daniele is a certified Kubernetes administrator by the Linux Foundation. In the last decade, Daniele trained developers for companies in the e-commerce, finance and public sector.
Wednesday August 21, 2024 2:55pm - 3:00pm HKT
Level 1 | Hung Hom Room 1

3:00pm HKT

⚡ Lightning Talk: Use Keycloak to Build an Authentication System for Cloud-Native Application | ⚡ 闪电演讲: 使用Keycloak为云原生应用构建身份验证系统 - Yiting Jiang, DaoCloud
Wednesday August 21, 2024 3:00pm - 3:05pm HKT
Identity authentication is one of the most basic functions of an application, especially for enterprise-level management systems. Such systems usually need to implement functions such as identity management, single sign-on, and security policy settings. Keycloak is an open source identity and access management (IAM) solution. It can be easily deployed on Kubernetes and provides applications with features such as centralized authentication. This talk will explain how our cloud native management system makes full use of the powerful and comprehensive features of Keycloak to implement enterprise-level identity and security access management. To meet our own requirements, we also created some Keycloak plugins to extend its IDP and Event functions, which can serve as a good example when customization is needed.

身份认证机制是应用程序最基本的功能,尤其对于企业级管理系统而言。它们通常需要实现身份管理、单点登录和安全策略设置等功能。Keycloak 是一个开源的身份和访问管理(IAM)解决方案,可以轻松部署在 Kubernetes 上,为应用程序提供集中认证等功能。本次演讲将解释我们的云原生管理系统如何充分利用 Keycloak 强大而全面的功能来实现企业级身份和安全访问管理功能。为了满足我们的需求,我们还创建了一些 Keycloak 插件来扩展其身份提供者(IDP)和事件功能,当需要定制化时,这些插件是很好的学习例子。
Speakers
Yiting Jiang

Dev Manager, DaoCloud
Graduated from Tongji University with a Master's degree in Computer Software and Theory. Previously worked at EMC, VMware, and Dell EMC.
Wednesday August 21, 2024 3:00pm - 3:05pm HKT
Level 1 | Hung Hom Room 1
  ⚡ Lightning Talks | ⚡ 闪电演讲, Security

3:00pm HKT

EmpowerUs | 女性赋能交流会
Wednesday August 21, 2024 3:00pm - 4:00pm HKT
Attendees who identify as women, non-binary individuals, or allies at KubeCon + CloudNativeCon + Open Source Summit + AI_Dev are invited to join this networking break to have open discussions with fellow attendees about challenges, leadership innovation, and empowerment in our fast-growing ecosystem.

我们很高兴邀请在KubeCon + CloudNativeCon + Open Source Summit + AI_dev的参会者中认同为女性、非二元性别个体或盟友的人士参加这个交流活动,与其他参会者一起开放讨论我们快速发展的生态系统中的挑战、领导创新和赋权。

Wednesday August 21, 2024 3:00pm - 4:00pm HKT
Level 1 | Hung Hom Room 4

3:15pm HKT

Coffee Break ☕ | 茶歇
Wednesday August 21, 2024 3:15pm - 3:35pm HKT
Level 2 | Grand Ballroom 3-4

3:35pm HKT

Sit Back and Relax with Fault Awareness and Robust Instant Recovery for Large Scale AI Workloads | 坐和放宽,了解大规模 AI 负载场景下的故障感知和健壮的快速故障恢复 - Fanshi Zhang & Kebe Liu, DaoCloud
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Fault tolerance during training, fine-tuning, and even inference is crucial to modern AI workloads that run at large scale on large GPU clusters. For training and fine-tuning tasks, failures of GPUs, storage, or other hardware often extend training time significantly, to weeks or even months. For inference, when massive numbers of requests arrive and one of the inference servers becomes faulty, we need a policy and a scheduler to mitigate the failure by transferring workloads quickly and efficiently. In this talk, we will introduce a series of mechanisms we have designed to help Kubernetes clusters and the workloads themselves locate faults, diagnose root causes, and schedule and perform mitigation for hardware or CUDA API call failures, reducing overall operational challenges. The possibilities do not stop there: the fault awareness and mitigation scheduler can help any workload mitigate failures.

在大规模GPU集群上进行训练、微调甚至推理时的容错性对现代人工智能工作负载至关重要。 对于训练和微调任务,GPU、存储等硬件故障经常会导致训练时间延长至数周甚至数月。对于推理任务,当大量请求涌入时,如果其中一个推理服务器出现故障,我们需要一种策略和调度程序来快速高效地转移工作负载。 在本次演讲中,我们将介绍一系列我们设计的机制,帮助Kubernetes集群和工作负载本身定位、诊断根本原因,并在硬件或CUDA API调用失败时进行调度和执行缓解,以减少整体运营挑战。但可能性不会止步于此,故障感知和缓解调度程序将帮助任何工作负载在故障期间进行缓解。
Speakers
Kebe Liu

Senior software engineer, DaoCloud
Member of Istio Steering Committee, focused on cloud-native and Istio, eBPF and other areas in recent years. Founder of Merbridge project.

Neko Ayaka

Software Engineer, DaoCloud
Cloud native developer, AI researcher, Gopher with 5 years of experience in loads of development fields across AI, data science, backend, frontend. Co-founder of https://github.com/nolebase
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 3

3:35pm HKT

OpenTelemetry Community Update | OpenTelemetry社区更新 - Zihao Rao & Huxing Zhang, Alibaba Cloud
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
OpenTelemetry has emerged as the de facto standard for observability, gaining significant industry adoption. This talk delves into two key aspects:
1. Latest OpenTelemetry community updates: We'll explore the latest developments within the OpenTelemetry community, presented by a community contributor.
2. Alibaba Cloud's journey with OpenTelemetry adoption: We'll share Alibaba Cloud's experience adopting OpenTelemetry over the past several years. By actively engaging with the community, we've leveraged its power to build full-stack observability capabilities based on OpenTelemetry. This includes:
- Language-specific instrumentation for Java, Go, and Python
- OpenTelemetry collectors
- Continuous profiling
- Observability for Large Language Model (LLM) based applications

OpenTelemetry已成为可观察性的事实标准,获得了行业的广泛采用。本次讨论涉及两个关键方面: 1. 最新的OpenTelemetry社区更新:我们将探讨OpenTelemetry社区的最新动态,由社区贡献者介绍。 2. 阿里云与OpenTelemetry采用的历程:我们将分享阿里云在过去几年中采用OpenTelemetry的经验。通过积极参与社区,我们利用社区力量构建了基于OpenTelemetry的全栈观测能力。这包括: - 针对Java、Go和Python的特定语言工具 - OpenTelemetry收集器 - 持续性能分析 - 针对基于大型语言模型(LLM)的应用的可观测性。
Speakers
Huxing Zhang

Staff Engineer, Alibaba Cloud
Huxing Zhang is a Staff Engineer at Alibaba Cloud working on observability. He is also a member of the Apache Software Foundation and a PMC member of Apache Tomcat and Apache Dubbo. He speaks at ApacheCon, OTel Community Days, etc.

Zihao Rao

Software Engineer, Alibaba Cloud
Zihao is a software engineer at Alibaba Cloud. Over the past few years, he has participated in several well-known open source projects. He is a steering committee member of the Spring Cloud Alibaba project and is now a triager for OpenTelemetry Java Instrumentation.
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 6

3:35pm HKT

How Fast Can Your Model Composition Run in Serverless Inference? | 您的模型组合在无服务器推理中可以运行多快? - Fog Dong, BentoML & Wenbo Qi, Ant Group
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Are you struggling with slow deployment times, high operational costs, or scalability issues when serving your ML models? Now, imagine the added complexity when typical AI apps require not just one, but an interconnected suite of models. In this session, discover how the integration of BentoML with Dragonfly effectively addresses these challenges, transforming the landscape of multi-model composition and inference within serverless Kubernetes environments. Join the co-presentation by the BentoML and Dragonfly communities to explore a compelling case study: a RAG app that combines 3 models (LLM, embedding, and OCR). Learn how our framework not only packages these diverse models efficiently but also utilizes Dragonfly's innovative P2P network for swift distribution. We'll further delve into how other open-source technologies like JuiceFS and vLLM have enabled us to achieve remarkable deployment times of just 40 seconds and establish a scalable blueprint for multi-model composition deployments.

您是否在为机器学习模型的部署时间慢、运营成本高或可扩展性问题而苦恼?现在,想象一下当典型的人工智能应用程序不仅需要一个模型,而是一个相互连接的模型套件时所增加的复杂性。在本场演讲中,了解BentoML与Dragonfly的集成如何有效解决这些挑战,改变了无服务器Kubernetes环境中多模型组合和推理的格局。 加入BentoML和Dragonfly社区的联合演示,探索一个引人注目的案例研究:一个结合了LLM、嵌入和OCR三个模型的RAG应用程序。了解我们的框架不仅高效打包这些多样化的模型,还利用Dragonfly创新的P2P网络进行快速分发。我们还将深入探讨其他开源技术,如JuiceFS和VLLM,如何帮助我们实现仅需40秒的部署时间,并为多模型组合部署建立可扩展的蓝图。
Speakers
Wenbo Qi

Senior Software Engineer, Ant Group
Wenbo Qi is a software engineer at Ant Group working on Dragonfly, where he is a maintainer. He hopes to make positive contributions to open source software and believes that fear springs from ignorance.

Fog Dong

Senior Software Engineer, BentoML
Fog Dong, a Senior Engineer at BentoML, KubeVela maintainer, CNCF Ambassador, and LFAPAC Evangelist, has a rich background in cloud native. Previously instrumental in developing Alibaba's large-scale Serverless Application Engine workflows and Bytedance's cloud-native CI/CD platform... Read More →
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

3:35pm HKT

Implementing Seamless Connectivity and Service Governance in Multi Kubernetes Cluster with ZTM | 在多个Kubernetes集群中使用ZTM实现无缝连接和服务治理 - Xiaohui Zhang, Flomesh
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
In the evolving cloud-native ecosystem, Kubernetes is vital for microservices. As enterprises adopt multi-cluster Kubernetes setups, securely managing cross-cluster communications becomes challenging due to the limitations of traditional gateways and Ingress solutions. This session explores how ZTM (Zero Trusted Mesh) acts as a bridge across K8s clusters, bypassing traditional gateways and network constraints, thus ensuring zero exposure and boosting security. ZTM uses an HTTP/2-based tunneling mechanism with end-to-end encryption, minimizing public exposure and securing data during transmission. Its design enables quick deployment of cross-cluster communications without altering existing networks or applications, easing management. Furthermore, ZTM integrates with service mesh technologies to provide a secure framework for microservices, supporting service discovery, load balancing, and advanced routing policies, allowing flexible and secure cross-cluster service management.

在不断发展的云原生生态系统中,Kubernetes 对于微服务至关重要。随着企业采用多集群 Kubernetes 设置,由于传统网关和入口解决方案的限制,安全地管理跨集群通信变得具有挑战性。 本场演讲探讨了 ZTM(Zero Trusted Mesh)如何作为跨 K8s 集群的桥梁,绕过传统网关和网络限制,从而确保零暴露并提升安全性。 ZTM 使用基于 HTTP/2 的隧道机制进行端到端加密,最大程度减少公开暴露并在传输过程中保护数据安全。其设计能够快速部署跨集群通信,而无需改变现有网络或应用程序,简化管理。 此外,ZTM 还与服务网格技术集成,为微服务提供安全框架,支持服务发现、负载均衡和高级路由策略,实现灵活且安全的跨集群服务管理。
Speakers
AddoZhang

Cloud Native Architect, Flomesh
Senior programmer, LFAPAC open source evangelist, CNCF Ambassador, Microsoft MVP, author of the WeChat public account "云原生指北". Years of practical experience in microservices and cloud-native, the main work involves microservices, containers, Kubernetes, DevOps, etc.
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

3:35pm HKT

Strengthening Container Security: A Collaborative Journey | 加强容器安全性:共同的旅程 - Yi Zha, Microsoft & Beltran Rueda Borrego, VMware (part of Broadcom)
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Ensuring the integrity and authenticity of container images is critical in securing the container supply chain. As developers increasingly use images from external sources, questions arise: How can we verify that these images originate from trusted vendors? How do we guarantee that they have not been altered since their creation? In this session, you will learn from the real-world experience of VMware Bitnami, who partnered with the Notary Project community to implement image signing and verification. Bitnami will show you how they use Notary Project signatures to ensure the integrity and authenticity of images from Docker Hub. Don't miss this opportunity to gain practical insights into container security with Notary Project within your CI/CD pipelines and during Kubernetes deployments! Additionally, we’ll explore future enhancements, including attestation support, empowering users to verify images from various perspectives such as provenance, vulnerability assessment, and software compliance.

确保容器镜像的完整性和真实性对于保护容器供应链至关重要。随着开发人员越来越多地使用来自外部来源的镜像,一些问题浮出水面:我们如何验证这些镜像来自可信赖的供应商?我们如何确保它们自创建以来没有被篡改?在这场演讲中,您将从VMware Bitnami的实际经验中学习,他们与Notary项目社区合作实施了镜像签名和验证。Bitnami将向您展示他们如何使用Notary项目签名来确保来自Docker Hub的镜像的完整性和真实性。不要错过这个机会,在您的CI/CD流水线和Kubernetes部署中通过Notary项目获得容器安全的实用见解!此外,我们将探讨未来的增强功能,包括证明支持,使用户能够从各种角度验证镜像,如来源、漏洞评估和软件合规性。
Speakers
Yi Zha

Senior Product Manager, Microsoft
Yi is a senior product manager on the Azure Container Upstream team at Microsoft and is responsible for container supply chain security for Azure services and customers. He is also a maintainer of the CNCF Notary Project and a contributor to CNCF ORAS and the OSS project Ratify.
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Security

3:35pm HKT

Tackling Operational Time-to-Market Decelerators in AI/ML Projects | 应对人工智能/机器学习项目中的运营时间市场减速器 - Adrian Matei & Andreea Munteanu, Canonical
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
In the competitive AI market, Time To Market (TTM) is crucial for success. Ensuring secure, scalable, and compliant ML infrastructures often slows TTM due to the complexities of updates, patches, monitoring, and security enforcement. This leads to decreases in ROI, profitability, reproducibility, and competitive edge. To address this, companies can engage Managed Service Providers (MSPs) to offload operational burdens and focus on innovation, yet selecting the right MSP requires consideration of expertise, automation capabilities, and compliance adherence. This presentation explores the AI operational landscape, highlighting indicators and challenges in MSP collaboration. We will focus on the management of open source tools like Kubeflow and MLflow across hybrid and multicloud environments. By understanding operational excellence in AI and available options to achieve it, attendees will gain insights into choosing an approach that aligns with their greater objectives.

在竞争激烈的人工智能市场中,上市时间对于成功至关重要。确保安全、可扩展和合规的机器学习基础设施通常会因更新、补丁、监控和安全执行的复杂性而减慢上市时间,导致投资回报率、盈利能力、可复制性和竞争优势下降。为了解决这个问题,公司可以与托管服务提供商(MSPs)合作,减轻运营负担,专注于创新,但选择合适的MSP需要考虑专业知识、自动化能力和合规性。 本次演讲探讨了人工智能运营领域,重点介绍了MSP合作中的指标和挑战。我们将重点关注在混合和多云环境中管理开源工具如Kubeflow和MLflow。通过了解人工智能运营卓越性以及实现卓越性的可用选项,与会者将获得选择与其更大目标一致的方法的见解。
Speakers
Andreea Munteanu

AI Product Manager, Canonical
Andreea Munteanu is a Product Manager at Canonical, leading the MLOps area. With a background in Data Science in various industries, she used AI techniques to enable enterprises to benefit from their initiatives and make data-driven decisions. Nowadays, Andreea is looking to help...
Adrian Matei

Product Manager, Canonical
With a degree in Information Management for Business, Adrian is now guiding Canonical’s open-source operational management toolset as Product Manager. He has been working in open source operations for the past two years, having previously accumulated experience in technology consulting...
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 2

3:35pm HKT

Session to be Announced | 会议将很快公布 - Greg Kroah-Hartman, Kernel Maintainer & Linux Fellow
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Speakers
Greg Kroah-Hartman

Fellow, Linux Foundation
Greg Kroah-Hartman is among a distinguished group of software developers who maintain Linux at the kernel level. In his role as a Linux Foundation Fellow, he continues his work as the maintainer for the Linux stable kernel branch and a variety of subsystems while working in a fully...
Wednesday August 21, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 5

4:25pm HKT

Simplify AI Infrastructure with Kubernetes Operators | 使用Kubernetes Operators简化AI基础设施 - Ganeshkumar Ashokavardhanan, Microsoft & Tariq Ibrahim US, NVIDIA
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
ML applications often require specialized hardware and additional configuration to run efficiently and reliably on Kubernetes. However, managing the cluster lifecycle and the diversity and complexity of hardware configuration across nodes can be challenging. How can we simplify and automate this process to ensure a smooth experience for Kubernetes users? Kubernetes Operators offer a great solution. In this session, we will go over operators and demonstrate how they can help automate the installation, configuration, and lifecycle management of AI-ready infrastructure end to end, from cluster provisioning and K8s node configuration to deep learning model deployments. We will demo an LLM fine-tuning workload to showcase how existing operators in the ecosystem, such as Cluster API Operator, GPU Operator, Network Operator, and the Kubernetes AI Toolchain Operator, can be used to simplify the infrastructure. Finally, we will discuss challenges and best practices of using operators in production.

ML 应用通常需要专门的硬件和额外的配置才能在 Kubernetes 上高效可靠地运行。然而,管理集群生命周期、节点间硬件配置的多样性和复杂性可能具有挑战性。我们如何简化和自动化这个过程,以确保 Kubernetes 用户的顺畅体验? Kubernetes 运算符提供了一个很好的解决方案。在本场演讲中,我们将介绍运算符,并演示它们如何帮助自动化 AI-ready 基础架构的安装、配置和生命周期管理,从集群提供和 k8s 节点配置到深度学习模型部署。我们将演示一个微调 LLM 工作负载,展示生态系统中现有运算符(如 Cluster API Operator、GPU Operator、Network Operator 和 Kubernetes AI Toolchain Operator)如何简化基础架构。最后,我们将讨论在生产环境中使用运算符的挑战和最佳实践。
Speakers
Ganeshkumar Ashokavardhanan

Software Engineer, Microsoft
Ganesh is a Software Engineer on the Azure Kubernetes Service team at Microsoft, working on node lifecycle, and is the lead for the GPU workload experience on this Kubernetes platform. He collaborates with partners in the ecosystem like NVIDIA to support operator models for machine...
Tariq Ibrahim US

Senior Cloud Platform Engineer, NVIDIA
Tariq Ibrahim is a Senior Cloud Platform Engineer on the Cloud Native team at NVIDIA where he works on enabling GPUs in containers and Kubernetes. He is a maintainer of the NVIDIA GPU Operator. He has also contributed to several cloud native OSS projects like kube-state-metrics, Istio...
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 3

4:25pm HKT

xRegistry - Looking Beyond CloudEvents | xRegistry - 超越CloudEvents - Leo Li, Red Hat
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
CloudEvents helps in the delivery of events by standardizing where common event metadata can be found in the messages carrying those events, without the need to understand the schema of each event. But discovering which endpoints support those events, how to communicate with them, and finding the schema of the messages carrying those events can be challenging. This is where xRegistry can be used. xRegistry defines a core set of interoperable APIs for a generic "registry" that can be used to persist and query its contents to help discover resources and their metadata. On top of this extensible base registry model we are developing three domain-specific registries (Endpoint, Message, and Schema), specifically aimed at enabling the automation, tooling, and code generation often needed in distributed systems development. In this session you will learn about CloudEvents, xRegistry, and how we're trying to help users be more productive in an event-driven world.

CloudEvents通过标准化事件元数据在携带这些事件的消息中的位置,帮助传递事件,而无需了解每个事件的架构。但是,发现哪些端点支持这些事件,如何与它们通信,以及找到携带这些事件的消息的架构可能具有挑战性。这就是xRegistry可以使用的地方。xRegistry定义了一组用于通用“注册表”的可互操作API,可用于持久化和查询其内容,以帮助发现资源及其元数据。在这个可扩展的基础注册表模型之上,我们正在开发3个特定领域的注册表:端点、消息和架构注册表 - 专门旨在实现分布式系统开发中经常需要的自动化、工具和代码生成。在本场演讲中,您将了解CloudEvents、xRegistry以及我们如何努力帮助用户在事件驱动的世界中更加高效。
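
To make the abstract's point about standardized metadata concrete, here is a minimal sketch (not taken from the session) of a structured-mode CloudEvent built as a plain Python dictionary. The four required attribute names come from the CloudEvents 1.0 specification; the source path, event type, and handler names are hypothetical, chosen only for illustration. The router decides where to send the event from `type` alone and never inspects `data`.

```python
import json
import uuid
from datetime import datetime, timezone

# The four context attributes that CloudEvents 1.0 marks as REQUIRED.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_event(source, event_type, data):
    """Build a structured-mode CloudEvent as a plain dict."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),  # optional attribute
        "datacontenttype": "application/json",           # optional attribute
        "data": data,
    }


def route(raw_event, routes):
    """Pick a handler name from the event type alone; `data` is never inspected."""
    event = json.loads(raw_event)
    missing = [name for name in REQUIRED_ATTRIBUTES if name not in event]
    if missing:
        raise ValueError(f"not a valid CloudEvent, missing: {missing}")
    return routes.get(event["type"], "dead-letter")


# Hypothetical source, type, and handler names, for illustration only.
raw = json.dumps(make_event("/vehicles/demo", "com.example.battery.low", {"level": 12}))
handler = route(raw, {"com.example.battery.low": "alerting-service"})
print(handler)
```

What this sketch cannot answer is which endpoints emit `com.example.battery.low` or what schema its `data` follows; that discovery gap is the one the abstract says xRegistry targets.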
Speakers
Leo Li

Software Engineer Intern, Red Hat
Leo is a passionate Knative Eventing Maintainer and the technical lead of Knative UX Working Group. He developed a comprehensive Knative sample app, co-created the “Intro to Open Source” learning path on KubeByExample, and implemented key features like HTTPS support for the Kafka...
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 6

4:25pm HKT

Istio and Modern API Gateways: Navigating the Future of Service Meshes | Istio和现代API网关:引领服务网格的未来 - Jimmy Song & Jianpeng He, Tetrate; Jiaqi Zhang, Alibaba Cloud; Jintao Zhang, Kong Inc.; Xunzhuo Liu, Tencent
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Join our esteemed panel of experts as they delve into the latest advancements and integrations in the world of Istio and API gateways. This discussion, led by Jimmy Song from Tetrate and founder of the China Cloud Native Community, will feature insights from core contributors and thought leaders including Jianpeng He (Tetrate), Jintao Zhang (Kong), Xunzhuo Liu (Tencent) and Zhang Jiaqi (Alibaba Cloud). The panel will explore Istio's recent developments such as Ambient Mesh, sidecar-less architectures, and the application of eBPF, along with the evolving role of Envoy Gateway. Participants will gain an in-depth understanding of how API gateways are blending with service meshes to create more dynamic, efficient, and secure cloud-native environments.

加入我们尊贵的专家小组,他们将深入探讨 Istio 和 API 网关领域的最新进展和集成。这次讨论由 Tetrate 的 Jimmy Song 主持,他是中国云原生社区的创始人,将邀请核心贡献者和思想领袖,包括 Jianpeng He(Tetrate)、Jintao Zhang(Kong)、Xunzhuo Liu(腾讯)和张佳琦(阿里云)分享见解。小组将探讨 Istio 的最新发展,如环境网格、无边车架构以及 eBPF 的应用,以及 Envoy 网关的不断演变角色。参与者将深入了解 API 网关如何与服务网格融合,创造更具动态、高效和安全的云原生环境。
Speakers
Jintao Zhang

Sr. SE, Kong
Jintao Zhang is a Microsoft MVP, CNCF Ambassador, Apache PMC member, and Kubernetes Ingress-NGINX maintainer. He specializes in cloud-native technology and the Azure technology stack. He works at Kong Inc.
Jimmy Song

Developer Advocate, Tetrate
Jimmy Song is a developer advocate at Tetrate, a CNCF Ambassador, and the founder of the Cloud Native Community. He is an outstanding translator, author, and producer of PHEI, and an early adopter and evangelist of Kubernetes and Istio. Previously, he worked at iFlytek, TalkingData, and Ant Group.
Xunzhuo

Software Engineer, Tencent
Xunzhuo Liu is a software engineer on the Tencent Kubernetes Engine team. He is an open source enthusiast, focusing on API Gateway, Service Mesh, and Kubernetes Networking. He is a steering committee member and core maintainer of Envoy Gateway, also maintaining a couple of CNCF projects...
Jianpeng He

Software Engineer, Tetrate
Jianpeng is a core maintainer of Istio and co-leader of the Extensions and Telemetry working group. He has been working on Istio for almost 3 years and is a maintainer of Envoy Gateway.
Jiaqi Zhang

Software Engineer, Alibaba Cloud
Zhang Jiaqi works on Alibaba Cloud Service Mesh as a software engineer, focusing on traffic management and telemetry-related fields, after graduating from the School of Computer Science, Peking University. Participated in several software computer academic conferences, and keen...
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

4:25pm HKT

Leverage Topology Modeling and Topology-Aware Scheduling to Accelerate LLM Training | 利用拓扑建模和拓扑感知调度加速LLM训练 - Yang Wang, Huawei
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
In the era of LLM training and inference, the bottleneck has shifted from compute to the network. High-throughput, low-latency interconnect technologies such as NVLink and NVSwitch are widely used to build hyper computers such as NVIDIA SuperPOD, Google multi-slice, and AWS placement groups. However, Kubernetes has not yet addressed topology awareness efficiently, resulting in low performance when sub-optimal resources are provisioned. This talk will explore inter-node communication and the interconnect between resources within a node, and analyze how these two topological factors affect the runtime performance of AI workloads, especially large language model training. The talk will cover:
- How to model the topology of underlying resources such as NUMA, Rack, Super Pod, and Hyper Computer
- How to make the scheduler aware of topology so that it makes the best scheduling decisions
- How to coordinate topology-aware scheduling with DRA on the node

在LLM训练和推断时代,瓶颈已经从计算转变为网络。许多高吞吐量和低延迟的互连技术被广泛使用,例如nvlink、nvswitch用于构建超级计算机,如nvidia超级Pod、谷歌多片、AWS放置组。 然而,Kubernetes尚未有效地解决拓扑意识问题,导致在资源配置不佳时性能较低。 本次演讲将探讨节点间通信和节点内部资源的互连。还将分析这两个拓扑因素如何影响AI工作负载的运行性能,特别是对于大型语言模型训练。 演讲内容包括: - 如何对底层资源(如NUMA、机架、超级计算机)建模拓扑 - 如何使调度程序意识到拓扑并进行最佳调度 - 如何协调拓扑感知调度与节点上的DRA
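
As a rough illustration of the idea in this abstract (and not the speaker's or Volcano's actual algorithm), the sketch below models topology as a pair of labels per node and picks the group of nodes with the lowest total pairwise "distance". The node names, labels, and cost values are all hypothetical assumptions.

```python
from itertools import combinations

# Hypothetical topology labels per node: (super_pod, rack). Not a real Kubernetes API.
NODES = {
    "node-a": ("pod-1", "rack-1"),
    "node-b": ("pod-1", "rack-1"),
    "node-c": ("pod-1", "rack-2"),
    "node-d": ("pod-2", "rack-3"),
}


def distance(a, b):
    """Assumed communication cost between two nodes: closer in the topology is cheaper."""
    pod_a, rack_a = NODES[a]
    pod_b, rack_b = NODES[b]
    if (pod_a, rack_a) == (pod_b, rack_b):
        return 1  # same rack
    if pod_a == pod_b:
        return 2  # same super pod, different rack
    return 4      # different super pods


def placement_cost(nodes):
    """Sum of pairwise distances, a stand-in for all-to-all traffic cost."""
    return sum(distance(a, b) for a, b in combinations(nodes, 2))


def best_placement(candidates, workers):
    """Brute-force search for the cheapest group of `workers` nodes."""
    return min(combinations(sorted(candidates), workers), key=placement_cost)


placement = best_placement(NODES, 3)
print(placement, placement_cost(placement))
```

Brute force is only workable for a toy example; a real scheduler would need a cheaper heuristic and would also have to account for resources inside each node.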
Speakers
Yang Wang

Senior engineer and maintainer of Volcano, Huawei Cloud Technologies Co., LTD
Volcano maintainer and speaker at KCD and GOTC. Focuses on cloud native scheduling and multi-cluster management.
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

4:25pm HKT

Staying Ahead of Fast-Moving Attackers | 保持领先于快速移动的攻击者 - Aizhamal Nurmamat kyzy, Sysdig
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
How to find the right balance between convenience, operational efficiency, and a strong security policy in a world of ephemeral containers? And how can we ensure security at a time when Advanced Persistent Threats (APTs) are more prevalent? In this talk we will present the latest Cloud Native Security & Usage Report findings on critical vulnerabilities inherent in today’s container security practices. We will also demonstrate how a compromised, short-lived container can be an insidious security risk, and what we can do to detect and mitigate those risks in real time using cloud native open source tools.

在一个短暂容器世界中,如何在便利性、运营效率和强大安全政策之间找到合适的平衡?在APT(高级持续性威胁)更加普遍的时代,我们如何确保安全? 在这次演讲中,我们将介绍最新的云原生安全和使用报告发现,揭示当今容器安全实践中存在的关键漏洞。 我们还将演示一个被 compromise 的短暂容器如何成为一个隐蔽的安全风险,以及我们如何使用云原生开源工具实时检测和减轻这些风险。
Speakers
Aizhamal Nurmamat kyzy

Director, DevRel, Sysdig
Aizhamal is a Director of DevRel at Sysdig where she focuses on education around security and open source. Previously she worked at Google's OSPO where she helped build open source communities in cloud native and data analytics ecosystems.
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Security

4:25pm HKT

Unleashing the Power of Cluster API: Extensibility and Customization | 释放Cluster API的力量:可扩展性和定制化 - Zain Malik, CityStorageSystems & Nibir Bora, Startup
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Cluster API, designed with extensibility at its core, has revolutionized Kubernetes cluster management. Its open and pluggable architecture empowers providers to implement custom solutions tailored to their unique requirements. In this session, we will explore how Cluster API's extension-by-design philosophy has opened new horizons for organizations seeking to create bespoke Kubernetes clusters. Managing Kubernetes clusters at scale presents unique operational challenges that cannot be tamed with manual operations. Through real-world examples and lessons learned, we will demonstrate how Cluster API's flexibility allows for the integration of diverse infrastructure providers and the implementation of organization-specific customizations. Attendees will gain insights into best practices for extending Cluster API, including developing custom controllers, integrating third-party tools, and creating bespoke workflows.

Cluster API是以可扩展性为核心设计的,已经彻底改变了Kubernetes集群管理。其开放和可插拔的架构赋予提供者实施定制解决方案的能力,以满足其独特需求。在本场演讲中,我们将探讨Cluster API的“通过设计进行扩展”的理念如何为寻求创建定制化Kubernetes集群的组织开辟了新的视野。 在规模化管理Kubernetes集群时,会面临无法通过手动操作解决的独特运营挑战。 通过现实世界的例子和经验教训,我们将演示Cluster API的灵活性如何允许集成各种基础设施提供者,并实施组织特定的定制化。与会者将获得有关扩展Cluster API的最佳实践的见解,包括开发自定义控制器、集成第三方工具和创建定制工作流程。
Speakers
Zain Malik

Staff Software Engineer, CityStorageSystems
Zain Malik serves as a tech lead in the compute team for a startup, where he has significantly contributed to projects related to cost saving and reliability and helped mature cluster lifecycle management. Before this role, Zain was a product owner and staff software engineer in the...
Nibir Bora

Engineering Manager, Startup
Nibir is an Engineering Manager in charge of Core Infrastructure at a Stealth Startup, where he is responsible for the company's Kubernetes infrastructure running hundreds of clusters globally.
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Operations + Performance

4:25pm HKT

Scaling Open Source Impact: FOSSASIA's Journey from Bootstrap to Educating 300,000 Developers | 扩大开源影响力:FOSSASIA从初创到教育30万开发者的旅程 - Hong Phuc Dang, FOSSASIA
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Hong Phuc Dang and Mario Behling will share FOSSASIA's journey from humble beginnings to educating over 300,000 developers in Asia. Learn how FOSSASIA scaled open-source education, engaged communities, & developed pioneering projects. Discover FOSSASIA's approach to automation and technological solutions, which streamlined operations long before the low-code movement. They'll spotlight projects like Eventyay & SUSI.AI, showcasing pioneering yet challenging endeavors. Learn about FOSSASIA's event organization best practices and their strategy of involving non-tech students in programs. Gain insights applicable beyond open source, impacting education and business operations. This session offers valuable knowledge for educators, open-source enthusiasts, developers, & business professionals. Whether you aim to expand projects or infuse startups with fresh ideas, join us to learn how a pragmatic open-source strategy can revolutionize organizations and empower tech pioneers and startups.

洪福·邓和马里奥·贝林将分享FOSSASIA从起步阶段到在亚洲教育超过30万开发人员的旅程。了解FOSSASIA如何扩展开源教育,参与社区,并开发开创性项目。 探索FOSSASIA的自动化和技术解决方案,这些解决方案在低代码运动之前就已经简化了运营。他们将重点介绍像Eventyay和SUSI.AI这样的项目,展示开创性而具有挑战性的努力。 了解FOSSASIA的活动组织最佳实践以及他们在项目中吸引非技术学生的策略。获得超越开源的见解,影响教育和业务运营。 这场演讲为教育工作者、开源爱好者、开发人员和商业专业人士提供宝贵的知识。无论您的目标是扩大项目还是为初创企业注入新思路,加入我们,了解一个务实的开源策略如何革新组织,赋予科技先驱和初创企业力量。
Speakers
Hong Phuc Dang

Founder, FOSSASIA
Hong Phuc is the founder of FOSSASIA, an organization dedicated to leveraging open technologies to enhance societal well-being and foster sustainable production practices. She chairs the annual FOSSASIA Summit, one of the largest open source conferences in Asia. With over a decade...
Wednesday August 21, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 5

5:15pm HKT

Unlocking Heterogeneous AI Infrastructure K8s Cluster: Leveraging the Power of HAMi | 解锁异构AI基础设施K8s集群:发挥HAMi的力量 - Xiao Zhang, DaoCloud & Mengxuan Li, The 4th Paradigm
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
With AI's growing popularity, Kubernetes has become the de facto AI infrastructure. However, the increasing number of clusters with diverse AI devices (e.g., NVIDIA, Intel, Huawei Ascend) presents a major challenge. AI devices are expensive, so how can we improve resource utilization? How can we integrate them better with K8s clusters? Managing heterogeneous AI devices consistently, supporting flexible scheduling policies, and providing observability all bring many challenges. The HAMi project was born for this purpose. This session includes:
* How K8s manages heterogeneous AI devices (unified scheduling, observability)
* How to improve device usage through GPU sharing
* How to ensure the QoS of high-priority tasks in GPU-sharing scenarios
* Support for flexible GPU scheduling strategies (NUMA affinity/anti-affinity, binpack/spread, etc.)
* Integration with other projects (such as volcano, scheduler-plugin, etc.)
* Real-world case studies from production-level users
* Remaining challenges and the roadmap

随着人工智能的日益普及,Kubernetes已成为事实上的人工智能基础设施。然而,不断增加的具有多样化人工智能设备(如NVIDIA、Intel、华为Ascend)的集群数量带来了重大挑战。人工智能设备价格昂贵,如何更好地提高资源利用率?如何更好地与K8s集群集成?如何一致地管理异构人工智能设备,支持灵活的调度策略和可观察性都带来了许多挑战。HAMi项目应运而生。本场演讲包括: * K8s如何管理异构人工智能设备(统一调度、可观察性) * 如何通过GPU共享提高设备使用率 * 如何确保GPU共享故事中高优先级任务的QOS * 为GPU支持灵活的调度策略(NUMA亲和性/反亲和性、binpack/spread等) * 与其他项目的集成(如volcano、scheduler-plugin等) * 来自生产级用户的实际案例研究。 * 仍然面临的一些其他挑战和路线图
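
The binpack and spread strategies named in the abstract can be sketched in a few lines. This is a generic illustration under simple assumptions (one request, free memory per GPU in GiB) and does not reproduce HAMi's scheduler; the function name and inputs are hypothetical.

```python
def pick_gpu(free_memory, request, policy):
    """Return the index of the GPU that should host `request`, or None if none fits.

    binpack: choose the fullest GPU that still fits, keeping other GPUs empty.
    spread:  choose the emptiest GPU, reducing contention between tenants.
    """
    fits = [(free, index) for index, free in enumerate(free_memory) if free >= request]
    if not fits:
        return None
    if policy == "binpack":
        return min(fits)[1]
    if policy == "spread":
        # On ties in free memory, prefer the lower index.
        return max(fits, key=lambda item: (item[0], -item[1]))[1]
    raise ValueError(f"unknown policy: {policy}")


free = [16, 6, 10]  # hypothetical free memory per GPU, in GiB
print(pick_gpu(free, 4, "binpack"), pick_gpu(free, 4, "spread"))
```

Binpack tends to raise utilization and leave whole devices free for large jobs, while spread trades some of that for isolation; which one suits a cluster depends on its workload mix.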
Speakers
xiaozhang

Senior Technical Lead, DaoCloud
- Xiao Zhang is leader of the Container team (focus on infra, AI, Multi-Cluster, Cluster LCM, OCI)
- Kubernetes / Kubernetes-sigs active contributor and member
- Karmada maintainer, kubean maintainer, HAMi maintainer
- Cloud-Native Developer
- CNCF Open Source Enthusiast
- GithubID: waw...
Mengxuan Li

Senior Developer, The 4th Paradigm Co., Ltd
Reviewer in the Volcano community and founder of the CNCF Landscape project HAMi. Responsible for developing the GPU virtualization mechanism in Volcano, which has been merged into Volcano's master branch and will be released in v1.8. Speaker at OpenAtom Global Open Source Commit#2023; speaker...
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 3

5:15pm HKT

Enhancing Application Delivery with KubeVela: Introducing the New Cuex Feature | 通过KubeVela增强应用交付:介绍新的Cuex功能 - Fog Dong, BentoML
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
As the pace of software development accelerates, the need for more dynamic and flexible application delivery systems becomes crucial. KubeVela, as a modern application delivery system, has continuously evolved to meet these demands. In this session, the maintainers are excited to present the latest advancements in KubeVela, focusing on the introduction of our groundbreaking feature: Cuex. Cuex is designed to revolutionize the way developers interact with KubeVela by simplifying the process of writing and managing application definitions. This innovative feature enhances the core capabilities of KubeVela, allowing users not only to write definitions more effectively but also to extend the platform's functionality by registering their own custom functions as Cuex actions. Join us to explore how KubeVela is setting new standards in application delivery and how you can be a part of this evolving journey.

随着软件开发速度加快,对更具动态性和灵活性的应用交付系统的需求变得至关重要。作为现代应用交付系统,KubeVela不断发展以满足这些需求。在本场演讲中,维护人员很高兴地介绍KubeVela的最新进展,重点介绍我们的突破性功能Cuex的介绍。 Cuex旨在通过简化编写和管理应用程序定义的过程,彻底改变开发人员与KubeVela互动的方式。这一创新功能增强了KubeVela的核心功能,使用户不仅可以更有效地编写定义,还可以通过注册自定义函数作为Cuex操作来扩展平台的功能。 加入我们,探索KubeVela如何在应用交付领域树立新的标准,以及您如何成为这一不断发展之旅的一部分。
Speakers
Fog Dong

Senior Software Engineer, BentoML
Fog Dong, a Senior Engineer at BentoML, KubeVela maintainer, CNCF Ambassador, and LFAPAC Evangelist, has a rich background in cloud native. Previously instrumental in developing Alibaba's large-scale Serverless Application Engine workflows and Bytedance's cloud-native CI/CD platform...
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 6

5:15pm HKT

How to Manage Database Clusters Without a Dedicated Operator | 如何在没有专门Operator的情况下管理数据库集群 - Shanshan Ying, ApeCloud & Shun Ding, China Mobile Cloud
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
As Kubernetes becomes integral to cloud-native environments, more organizations are deploying database services on K8S, facing significant challenges. Integrating new database engines typically requires developing a dedicated Kubernetes operator that manages not only resource provisioning but also essential maintenance tasks like high availability, backup & restore, and configuration management. This session introduces a universal operator framework that supports various database engines, enabling rapid, minimal-code integration. We will present a case study from China Mobile Cloud on integrating a new cloud-native database engine into K8S using this framework, achieved with minimal coding and reduced time investment, bypassing the extensive Golang coding usually required for developing a dedicated operator.

随着Kubernetes成为云原生环境中不可或缺的一部分,越来越多的组织在K8S上部署数据库服务,面临着重大挑战。集成新的数据库引擎通常需要开发一个专门的Kubernetes operator,管理资源提供以及高可用性、备份和恢复、配置管理等重要维护任务。 本场演讲将介绍一个支持各种数据库引擎的通用operator框架,实现快速、最小代码集成。我们将从中国移动云的一个案例研究中介绍如何使用这个框架将新的云原生数据库引擎集成到K8S中,通过最小的编码和减少时间投入来实现,避免通常需要开发专门operator所需的大量Golang编码。
Speakers
Shanshan Ying

Maintainer, ApeCloud
Shanshan is currently a maintainer of KubeBlocks by ApeCloud. Before joining ApeCloud, she worked in Aliyun Database Group for years. She received her PhD degree from National University of Singapore.
Shun Ding

Senior Systems Architect, China Mobile Cloud
Shun is a Senior Systems Architect at China Mobile Cloud, leading the design, development, and deployment of a next-generation Kubernetes-based large-scale database management service. With over a decade of experience in cloud computing and database technologies, Shun has extensive expertise...
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Operations + Performance

5:15pm HKT

Leveraging Wasm for Portable AI Inference Across GPUs, CPUs, OS & Cloud-Native Environments | 利用Wasm在GPU、CPU、操作系统和云原生环境中进行可移植的AI推理 - Miley Fu & Hung-Ying Tai, Second State
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
This talk will focus on the advantages of using WebAssembly (Wasm) for running AI inference tasks in a cloud-native ecosystem. We will explore how Wasm empowers developers to develop on their own PC and have their AI inference run uniformly across different hardware (including GPUs and CPUs), operating systems, edge clouds, and more. We'll discuss how Wasm and Wasm runtimes facilitate seamless integration into cloud-native frameworks, enhancing the deployment and scalability of AI applications. This presentation will specifically highlight how Wasm provides a flexible, efficient solution suitable for diverse cloud-native architectures, including Kubernetes, to allow developers to fully tap the potential of LLMs, especially open source LLMs. The session offers insights into maximizing the potential of AI applications by leveraging the cross-platform capabilities of Wasm, ensuring consistency, low cost, and efficiency in AI inference across different computing environments.

本次演讲将重点介绍在云原生生态中运行AI推理任务时使用WebAssembly(Wasm)的优势。我们将探讨如何使用Wasm使开发者能够在自己的个人电脑上开发,并在不同硬件(包括GPU和CPU)、操作系统、边缘云等上统一执行他们的AI推理。 我们将讨论Wasm和Wasm运行时如何实现无缝集成到云原生框架中,增强AI应用程序的部署和可扩展性。本次演示将重点展示Wasm如何提供灵活、高效的解决方案,适用于各种云原生架构,包括Kubernetes,以帮助开发者充分发挥大语言模型的潜力,特别是开源大语言模型。 将深入探讨通过利用Wasm的跨平台能力来最大限度地发挥AI应用的潜力,确保在不同计算环境中实现AI推理的一致性、低成本和高效性。
Speakers
Hung-Ying Tai

Software Engineer, Second State
Hung-Ying is a maintainer of the WasmEdge project and a pioneer in compiler optimization and virtual machine design. He is a prolific open-source contributor, participating in many open-source projects, including go-ethereum, solidity, SOLL, crun, and WasmEdge.
Miley Fu

CNCF Ambassador, Founding member at WasmEdge, Second State Inc
Miley is a Developer Advocate with a passion for empowering devs to build and contribute to open source. With over 5 years of experience working on the WasmEdge runtime, a CNCF sandbox project, as a founding member, she has spoken at KubeCon, KCD Shenzhen, CloudDay Italy, DevRelCon, Open Source...
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

5:15pm HKT

Multi-Cluster Networking and Service Discovery Leveraging NRI | 利用NRI的多集群网络和服务发现 - LingMing Xia, Purple Mountain Laboratories & Di Xu, Xiaohongshu
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Connectivity and service discovery are usually key challenges in multi-cluster management. Existing solutions such as Submariner impose preconditions: public IPs and specific CNIs. This is problematic for projects like the "East-to-West Computing Resource Transfer Project", where clusters lack public IPs and run diverse CNIs because they have different owners. This session introduces a solution that establishes an independent, unified parallel network for east-west traffic across clusters based on the Node Resource Interface (NRI), avoiding intrusive modifications to clusters and limitations on CNI. A hybrid approach is provided for inter-cluster traffic: clusters can communicate through a hub cluster with a public IP, or connect directly if they have public IPs of their own. Moreover, cross-cluster service discovery follows the MCS standard to ensure seamless service access. All functionalities remain agnostic to Kubernetes and applications. A live demo will be shown in this session.

连接和服务发现通常是多集群管理的关键挑战,现有解决方案如Submariner引入了公共IP和特定CNI的先决条件。这对于像“东西计算资源转移项目”这样的项目是有问题的,因为集群缺乏公共IP并且由于不同所有权而具有不同的CNI。 本场演讲介绍了一种解决方案,基于节点资源接口(NRI)建立一个独立和统一的跨集群东西流量网络,以避免对集群进行侵入性修改和对CNI的限制。提供了一种混合方法用于集群间流量:集群可以通过具有公共IP的中心集群进行通信,或者如果具有公共IP则可以直接连接。此外,跨集群服务发现遵循MCS标准,以确保无缝的服务访问。所有功能都与Kubernetes和应用程序无关。 本场演讲将展示现场演示。
Speakers
Di Xu

Principal Software Engineer, Xiaohongshu
Currently, he serves as a Tech Lead at Xiaohongshu, where he leads a team focused on building a highly reliable and scalable container platform. He is the founder of the CNCF Sandbox project Clusternet and a top 50 code contributor in the Kubernetes community. He had spoken many...
Lingming

Researcher, Purple Mountain Laboratories
Focusing on subjects such as cloud-native and distributed clouds. I am currently working as a researcher in the New Computing Architecture Research group of Purple Mountain Laboratories.
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

5:15pm HKT

Time Series Database on Kubernetes: Efficient Management of Massive Internet of Vehicles Data | Kubernetes上的时序数据库:高效管理海量物联网车辆数据 - Vicky Lee, Huawei Cloud Computing Technology Co., Ltd.
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Today, more and more car companies are building a new generation of Internet of Vehicles platforms based on cloud-native technology stacks such as Kubernetes. However, as more and more cars are produced, they generate hundreds of GB of data every second, making it difficult to store massive data in real time and to keep storage costs under control. This requires the platform's underlying database to be low-cost, high-performance, and efficient. openGemini is a cloud-native distributed time series database with high performance and low cost. For data writing, we provide a dedicated high-performance data writing component that supports Arrow Flight. For data storage, we provide specialized data compression algorithms and support both local data storage and object storage. This talk will introduce how to build Internet of Vehicles platforms based on cloud-native technology stacks and share technical practices for efficiently managing massive vehicle data.

今天,越来越多的汽车公司正在基于Kubernetes等云原生技术堆栈构建新一代车联网平台。然而,随着汽车的生产越来越多,它们每秒产生数百GB的数据,使得实时存储海量数据变得困难,存储成本难以控制。这就要求平台的底层数据库要低成本、高性能和高效。openGemini是一个具有高性能和低成本的云原生分布式时间序列数据库。在数据写入方面,我们提供了支持Arrow Flight的专用高性能数据写入组件。在数据存储方面,我们提供了专门的数据压缩算法,并支持本地数据存储和对象存储。 本次演讲将介绍如何基于云原生技术堆栈构建车联网平台,并分享如何有效管理海量车辆数据的技术实践。
Speakers
Vicky Lee

Engineer, Huawei Cloud Computing Technology Co., Ltd.
Vicky Lee, a Time-series database expert in the HUAWEI CLOUD Database Innovation Lab and the Co-founder of the openGemini community, has been engaged in distributed databases and NoSQL databases as a cloud service for many years. Currently, mainly dedicated to openGemini developm... Read More →
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 2 | Grand Ballroom 1-2

5:15pm HKT

Scorecard: Assessments Made Easy | Scorecard:让开源项目评估更轻松 - Ram Iyengar, Cloud Foundry Foundation
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Scorecard is a project of the OpenSSF, which makes it simple to assess the health of any repository. It is a fully open source project built with the aim of bringing transparency and standardization around security health metrics. Scorecard is a cross-industry collaboration between big and small names in OSS/security. Scorecard checks for vulnerabilities affecting different parts of the software supply chain including source code, build, dependencies, testing, and project maintenance.

Scorecard 是 OpenSSF 的一个项目,它简化了对任何代码仓库健康状况的评估。这是一个完全开源的项目,旨在为安全健康指标带来透明度和标准化。Scorecard 是开源软件/安全领域大大小小公司之间的跨行业合作。Scorecard 检查影响软件供应链不同部分的漏洞,包括源代码、构建、依赖关系、测试和项目维护。
Speakers
Ram Iyengar

Chief Evangelist, Cloud Foundry Foundation
Ram Iyengar is an engineer by practice and an educator at heart. He was (cf) pushed into technology evangelism along his journey as a developer and hasn’t looked back since! He enjoys helping engineering teams around the world discover new and creative ways to work. He is a proponent... Read More →
Wednesday August 21, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security

6:00pm HKT

Welcome Reception 🎉 | 欢迎酒会
Wednesday August 21, 2024 6:00pm - 8:00pm HKT
Join us onsite for drinks, appetizers, and conversations with old and new friends in the Solutions Showcase. Explore the exhibit booths to learn more about the latest technologies, browse special offers and job posts, and much more.
In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or to access sponsored content. You are never required to visit third-party booths or to access sponsored content.

When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

一起来参加我们的欢迎酒会,在解决方案展示区与新老朋友一起享用美味的饮品、开胃菜,并愉快地畅谈。探索展台,了解最新的技术,浏览特别优惠和招聘信息等等。
为了促进活动中的互动和业务交流,您可以选择访问第三方的展位或者获取赞助内容。我们不会强制要求您参观第三方展位或获取赞助内容。当您访问展位或参与赞助活动时,第三方将收到您的一些注册数据。这些数据包括您的名字、姓氏、职位、公司、地址、电子邮件、标准人口统计问题(例如工作职能、行业)以及您与赞助内容或资源互动的详细信息。如果您选择与展位互动或获取赞助内容,您明确同意第三方接收和使用此类数据,这些数据将受到他们自己的隐私政策的约束。
Wednesday August 21, 2024 6:00pm - 8:00pm HKT
Level 2 | Grand Ballroom 3-4

6:10pm HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Wednesday August 21, 2024 6:10pm - 6:30pm HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Wednesday August 21, 2024 6:10pm - 6:30pm HKT
Level 2 | Grand Ballroom 3-4 | Project Pavilion
 
Thursday, August 22
 

7:30am HKT

Registration + Badge Pickup | 会议签到 + 胸牌领取
Thursday August 22, 2024 7:30am - 6:00pm HKT
TBA

9:00am HKT

9:10am HKT

10:05am HKT

Keynote: Supporting Large-Scale and Reliability Testing in Kubernetes using KWOK | 主论坛演讲: 支持在Kubernetes中使用KWOK进行大规模和可靠性测试 - Yuan Chen, NVIDIA & Shiming Zhang, DaoCloud
Thursday August 22, 2024 10:05am - 10:20am HKT
Kubernetes is the de facto platform for running workloads at scale. This talk will present KWOK (https://kwok.sigs.k8s.io/), an open-source toolkit that enables the creation and testing of large-scale Kubernetes clusters with minimal resources, even on a laptop.
Shiming Zhang, the creator and maintainer of KWOK, and Yuan Chen, an engineer at NVIDIA GPU Cloud, will outline KWOK's capabilities to generate and manage a large number of virtual nodes that simulate Kubelet APIs and mimic real nodes, allowing for workload deployment and testing. They will discuss practical use cases of KWOK.

The talk will then introduce KWOK's recent enhancements for reliability and fault-tolerance testing, showcasing its ability to simulate failures by injecting targeted faults into nodes and pods. Through examples and demos, the talk will demonstrate how KWOK can be used for reliability testing and evaluating fault-tolerance mechanisms, ultimately improving workload resilience in Kubernetes.



Kubernetes是运行大规模工作负载的事实标准平台。本次演讲将介绍KWOK(https://kwok.sigs.k8s.io/),这是一个开源工具包,可以利用极少的资源(甚至在笔记本电脑上)创建和测试大规模Kubernetes集群。

KWOK的创始人和维护者张世明,以及NVIDIA GPU Cloud的工程师陈源,将详细阐述KWOK的功能,包括生成和管理大量模拟Kubelet API和真实节点的虚拟节点,从而支持工作负载的部署和测试。他们将讨论KWOK的实际使用案例。

演讲还将介绍KWOK最近针对可靠性和容错性测试的增强功能,展示其通过向节点和Pod注入有针对性的故障来模拟故障的能力。通过示例和演示,演讲将展示如何利用KWOK进行可靠性测试和评估容错机制,从而最终提升Kubernetes中工作负载的弹性能力。
Speakers
Yuan Chen

Principal Software Engineer, NVIDIA
Yuan Chen is a Principal Software Engineer at NVIDIA, working on building NVIDIA GPU Cloud. He served as a Staff Software Engineer at Apple from 2019 to 2024, where he contributed to the development of Apple's Kubernetes infrastructure. Yuan has been an active code contributor to... Read More →
Shiming Zhang

Software Engineer, DaoCloud
Shiming Zhang is a Kubernetes contributor focused mainly on scalability, performance, reliability, and testing. He has contributed to many Kubernetes features and most of its components.
Thursday August 22, 2024 10:05am - 10:20am HKT
Level 2 | Grand Ballroom 1-2

10:20am HKT

10:30am HKT

Coffee Break ☕ | 茶歇
Thursday August 22, 2024 10:30am - 11:00am HKT
Level 2 | Grand Ballroom 3-4

10:30am HKT

Solutions Showcase | 解决方案展示
Thursday August 22, 2024 10:30am - 5:30pm HKT
Visit our sponsors in the Solutions Showcase to try the latest demos, watch live presentations, talk to experts, check out job opportunities, and score some swag.

请访问我们在解决方案展示区的赞助商,尝试最新的演示,观看现场演示,与专家交谈,了解工作机会,并获得一些赠品。

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or to access sponsored content. You are never required to visit third-party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

为了促进活动中的网络和业务关系,您可以选择访问第三方的展位或者获取赞助内容。我们不会强制要求您参观第三方展位或获取赞助内容。当您访问展位或参与赞助活动时,第三方将收到您的一些注册数据。这些数据包括您的名字、姓氏、职位、公司、地址、电子邮件、标准人口统计问题(例如工作职能、行业)以及您与赞助内容或资源互动的详细信息。如果您选择与展位互动或获取赞助内容,您明确同意第三方接收和使用此类数据,这将受到他们自己的隐私政策的约束
Thursday August 22, 2024 10:30am - 5:30pm HKT
Level 2 | Grand Ballroom 3-4

10:30am HKT

10:40am HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Thursday August 22, 2024 10:40am - 11:00am HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Thursday August 22, 2024 10:40am - 11:00am HKT
Level 2 | Grand Ballroom 3-4 | Project Pavilion

11:00am HKT

Unlocking the Power of Kubernetes: AI-Driven Innovations for Next-Gen Infrastructure | 释放 Kubernetes 的力量:面向下一代基础设施的 AI 驱动创新 - Brandon Kang, Akamai Technologies
Thursday August 22, 2024 11:00am - 11:35am HKT
My session is about dynamic synergy between Kubernetes and AI, unveiling a transformative paradigm shift in modern infrastructure management. The presentation unveils how Kubernetes serves as an enabler for deploying and scaling AI workloads efficiently, optimizing resource utilization, and ensuring unparalleled scalability. Delving deeper, it explores the realm of AI-powered automation, showcasing how intelligent algorithms enhance auto-scaling, workload optimization, and predictive maintenance within Kubernetes clusters. Moreover, it sheds light on the crucial aspect of security, elucidating how AI-driven measures bolster threat detection and anomaly identification, fortifying Kubernetes environments against potential risks. This presentation beckons organizations to embrace the convergence of Kubernetes and AI, unlocking boundless possibilities to redefine infrastructure management and propel towards unprecedented efficiency and resilience.

我的演讲是关于 Kubernetes 和人工智能之间的动态协同作用,揭示了现代基础设施管理中的转变范式。演示展示了 Kubernetes 如何作为部署和扩展人工智能工作负载的促进者,有效优化资源利用率,并确保无与伦比的可扩展性。更深入地探讨了基于人工智能的自动化领域,展示了智能算法如何增强 Kubernetes 集群内的自动扩展、工作负载优化和预测性维护。此外,它还阐明了安全的关键方面,阐明了人工智能驱动的措施如何加强威胁检测和异常识别,加固 Kubernetes 环境抵御潜在风险。 这个演示呼吁组织 embrace Kubernetes 和人工智能的融合,解锁无限可能性,重新定义基础设施管理,并朝着前所未有的效率和韧性迈进。
Speakers
Brandon Kang

Principal Technical Solutions Architect, Akamai Technologies
Brandon Kang is a Cloud Specialist at Akamai Technologies, where he oversees cloud computing projects across the APJ markets, including China. Before his tenure at Akamai, Brandon was a software engineer at Samsung, a program manager at Microsoft, and a service platform expert at... Read More →
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 1 | Hung Hom Room 3

11:00am HKT

Enhancing Security and Software Supply Chain: Recent and Upcoming Features in Harbor | 增强安全性和软件供应链:Harbor 中的最新和即将推出的功能 - Stone Zhang, Broadcom
Thursday August 22, 2024 11:00am - 11:35am HKT
In 2023 and 2024, we released Harbor 2.9 and 2.10, and we will release 2.11 soon. In these releases, we have focused mainly on adding or enhancing features related to security and the software supply chain. These features include the Security Hub, which analyzes vulnerability information in artifacts across different dimensions, and the SBOM generation feature, which can create SBOMs manually or automatically. We have also improved the performance of the garbage collector by implementing parallel processing, and ensured alignment with the latest OCI specifications. In future releases, we will continue to explore the potential of using SBOMs to secure the software supply chain and facilitate AI model distribution in cloud-native applications. We welcome software engineers and DevOps professionals to join our community and explore the new features of Harbor together. Let's work together to make Harbor even better!

在2023年至2024年,我们发布了Harbor 2.9和2.10版本,很快将发布2.11版本。在这些版本中,我们主要专注于添加或增强与安全和软件供应链相关的功能。这些功能包括Security Hub,它可以分析不同维度中的构件中的漏洞信息,以及SBOM生成功能,可以手动或自动创建SBOM。我们还通过实现并行处理改进了垃圾收集器的性能,并确保与最新的OCI规范保持一致。在未来的版本中,我们将继续探索使用SBOM来保护软件供应链并促进在云原生应用中分发AI模型的潜力。我们欢迎软件工程师和DevOps专业人士加入我们的社区,一起探索Harbor的新功能。让我们共同努力,使Harbor变得更好!
Speakers
Stone Zhang

Staff Engineer, Broadcom
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 1 | Hung Hom Room 6

11:00am HKT

A Story of Managing Kubernetes Watch Events End-to End Flow in Extremely Large Clusters | 在极大规模集群中管理Kubernetes watch事件端到端流程的故事 - Bo Tang, Ant Group
Thursday August 22, 2024 11:00am - 11:35am HKT
The K8s watch mechanism has long not been given the attention it deserves. However, it is critical to both the stability and the performance of a K8s cluster, and watch latency is a perfect indicator of cluster health. This talk begins by introducing the measurement of watch event latency and then defines watch SLI and SLO metrics. Using the watch SLO as a guide, the talk will show the process of identifying watch bottlenecks. It will then describe the optimizations made to apiserver, etcd, kubelet, controller-runtime, and clients such as controllers and schedulers, covering watch latency, pod provisioning time, bandwidth, CPU/memory, and more. With these optimizations, daily P99 watch latency has improved by over 90% in large clusters (~20K nodes), impacting billions of watch events. Pod provisioning time has improved by over 60%. Apiserver bandwidth has decreased by 50%. The overall stability of the K8s cluster has improved greatly.

K8s观察机制长期以来并未得到应有的重视。然而,它对于K8s集群的稳定性和性能至关重要,观察延迟是集群健康的完美指标。 本次演讲将首先介绍观察事件延迟的测量,然后定义观察SLI和SLO指标。通过观察SLO作为指导,演讲将展示观察瓶颈识别过程。演讲将描述在观察方面对apiserver、etcd、kubelet、controller-runtime和客户端(如控制器和调度器)进行的各种优化,包括观察延迟、Pod提供时间、带宽、CPU/内存等方面。 通过这些优化,大型集群(~20K节点)中每日P99观察延迟已经提高了超过90%,影响了数十亿次观察事件。Pod提供时间已经提高了超过60%。Apiserver带宽减少了50%。K8s集群的整体稳定性得到了极大的改善。
Speakers
Bo Tang

Senior Engineer, Ant Group
Bo Tang is a senior engineer in Ant Group. He is currently working on scalability and performance optimization of Kubernetes clusters.
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 1 | Hung Hom Room 2

11:00am HKT

Dollars and PPM's - Carbon Emissions and Cloud Spend | 美元和PPM - 碳排放和云支出 - Bryan Oliver, Thoughtworks
Thursday August 22, 2024 11:00am - 11:35am HKT
Cloud Carbon emissions are unfortunately not the priority of most enterprises. Costs, however, are. In the Cloud Native space, there is an ever-growing list of spend tracking and reduction tools. In this talk, we'll discuss several strategies you can adopt to unify the prioritization of cloud costs and carbon impact. We want to show how you can align with your business goal of simultaneously reducing cloud spend and overall carbon emissions.

云计算的碳排放很可惜并不是大多数企业的首要任务。成本,然而,是。在云原生领域,有越来越多的支出跟踪和降低工具。 在这次讨论中,我们将讨论几种您可以采用的策略,统一云成本和碳影响的优先级。我们希望展示如何与您同时降低云支出和整体碳排放的业务目标保持一致。
Speakers
Bryan Oliver

Principal, Thoughtworks
Bryan is an experienced engineer and leader who designs and builds complex distributed systems. He has spent his career developing mobile and back-end systems whilst building autonomous teams. More recently he has been focused on delivery and cloud native at Thoughtworks. In his free... Read More →
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

11:00am HKT

OpenYurt & Dragonfly: Enhancing Efficient Distribution of LLMs in Cloud-Edge Collaborative Scenarios | OpenYurt和Dragonfly:增强云边协作场景中LLM的高效分发 - Linbo He, alibaba cloud & Jim Ma, Ant Group
Thursday August 22, 2024 11:00am - 11:35am HKT
As LLMs continue to grow in size, their deployment and delivery in cloud-edge environments are faced with substantial challenges, especially within edge computing settings that encompass multiple sites with thousands of edge nodes. In this presentation, we will explore how to efficiently distribute LLM applications across dispersed edge nodes using OpenYurt. We will also delve into how Dragonfly’s P2P image distribution technology can address the issue of public network bandwidth consumption encountered during cross-site transmission, reducing public network traffic consumption by up to 90% compared to conventional LLM distribution, and achieving rapid and efficient sharing of LLMs in physically isolated environments. During this presentation, container service experts from Alibaba Cloud and Ant Group will share this solution and introduce the practical application of combining OpenYurt with Dragonfly in edge computing scenarios for LLMs.

随着LLM的规模不断增长,它们在云边缘环境中的部署和交付面临着重大挑战,特别是在涵盖数千个边缘节点的边缘计算环境中。在本次演讲中,我们将探讨如何使用OpenYurt在分散的边缘节点上高效分发LLM应用程序。我们还将深入探讨Dragonfly的P2P图像分发技术如何解决跨站点传输中遇到的公共网络带宽消耗问题,与传统的LLM分发相比,将公共网络流量消耗降低高达90%,实现在物理隔离环境中LLM的快速高效共享。 在本次演示中,来自阿里巴巴云和蚂蚁集团的容器服务专家将分享这一解决方案,并介绍在LLM的边缘计算场景中将OpenYurt与Dragonfly结合应用的实际应用。
Speakers
Jim Ma

Senior Engineer, Ant Group
Kubernetes enthusiast at Ant Group, diving deep into Kubernetes CSI storage, OCI image distribution and maintaining CNCF Dragonfly.
Linbo He

senior software engineer, alibaba cloud
I am a member of the Alibaba Cloud Container Service team and one of the founding contributors to the OpenYurt project. Since 2015, I have been actively engaged in the design, development, and open-source initiatives related to Kubernetes. I have taken on responsibilities in a variety... Read More →
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

11:00am HKT

The Journey of Next-Gen FinTech IDP at China Merchants Bank | 中国招商银行下一代金融科技IDP之旅 - Jiahang Xu, China Merchants Bank
Thursday August 22, 2024 11:00am - 11:35am HKT
Explore the transformative journey of China Merchants Bank (CMB), one of China's largest retail banks, through cloud migration, cloud-native transformation, and platform engineering over the past three years. Despite challenges such as increased complexity in cloud technology and management, and potential risks to developer productivity and the continuous assurance of financial services, CMB successfully leveraged KubeVela, OpenFeature, Envoy, Cilium, and OpenTelemetry to build the Next-Gen FinTech IDP. Within a year, the platform managed 70% of applications and improved the developer experience for thousands of R&D engineers. We'll discuss the strategic thinking, 'Golden Path' implementation, struggles, trade-offs, and key success metrics, along with the platform engineering maturity model. This session provides a blueprint and reference architecture for financial organizations undergoing similar transformations.

探索中国招商银行(CMB)作为中国最大的零售银行之一,在过去三年中通过云迁移、云原生转型和平台工程的变革之旅。尽管面临诸如云技术和管理复杂性增加、开发人员生产力和金融服务持续保障的潜在风险等挑战,CMB成功利用KubeVela、OpenFeature、Envoy、Cilium和OpenTelemetry构建了下一代金融科技IDP。这导致了一年内管理了70%的应用程序,并改善了开发人员体验,涵盖了数千名研发工程师。我们将讨论战略思维、“黄金路径”实施、挣扎、权衡和关键成功指标,以及平台工程成熟度模型。本场演讲提供了金融机构进行类似转型的蓝图和参考架构。
Speakers
Jiahang Xu

System Architect, China Merchants Bank
Jiahang Xu is a System Architect at China Merchants Bank. He has over 14 years of unique cross-domain experience working in telecom, automotive, financial industry, startup as a co-founder, and KubeVela maintainer. He's mainly focused on cloud-native application technology and platform... Read More →
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

11:00am HKT

Revolutionizing Service Mesh with Kernel-Native Sidecarless Architecture | 用内核原生无边车架构彻底改变服务网格 - ChangYe Wu, Huawei Technologies Co., Ltd.
Thursday August 22, 2024 11:00am - 11:35am HKT
Service mesh technology has revolutionized service governance among microservices, but as clusters expand, challenges arise. Proxy programs can strain resources, with memory consumption reaching GB levels and CPU overhead peaking at 30%. Furthermore, this expansion often leads to noticeable delays in microservice access. This call for proposals seeks to address these challenges head-on by exploring innovative solutions within the kernel-native sidecarless service mesh framework. We invite submissions that delve into: Efficient Resource Management: Novel strategies to minimize memory consumption and CPU overhead in proxy programs, ensuring optimal resource utilization. Latency Optimization: Techniques to reduce microservice access latency without compromising on service governance effectiveness. Real-world Implementations: Case studies or examples showcasing successful deployments of kernel-native sidecarless service mesh in diverse environments.

服务网格技术已经在微服务之间的服务治理方面发生了革命,但随着集群的扩大,也带来了挑战。代理程序可能会消耗资源,内存消耗可能达到GB级别,CPU开销可能达到30%。此外,这种扩展通常会导致微服务访问出现明显的延迟。 本次征集旨在通过探索内核本地无边车服务网格框架中的创新解决方案来直面这些挑战。 我们邀请提交以下内容的提案: 高效资源管理:采用新颖策略来最小化代理程序的内存消耗和CPU开销,确保资源的最佳利用。 延迟优化:通过技术手段减少微服务访问延迟,同时不影响服务治理的有效性。 实际应用:展示在不同环境中成功部署内核本地无边车服务网格的案例研究或示例。
Speakers
ChangYe Wu

Senior software engineer, Huawei Technologies Co., Ltd.
10+ years of OS and network experience, with an extensive interest in the kernel protocol stack, cloud native, service mesh, and eBPF technologies.
Thursday August 22, 2024 11:00am - 11:35am HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Operating Systems

11:50am HKT

VeScale: A PyTorch Native LLM Training Framework | veScale:一个PyTorch原生LLM训练框架 - Hongyu Zhu, ByteDance
Thursday August 22, 2024 11:50am - 12:25pm HKT
Today's era of giant LLMs calls for distributed training. Although countless distributed training frameworks have been published in the past decade, few have excelled in real industry production, because the quality favored most is often Ease of Use rather than pure Performance. Ease of Use rests on two essentials, PyTorch and Automatic Parallelism, because: i) the PyTorch ecosystem dominates and owns 92% of models on HuggingFace, and ii) giant models cannot be trained without complex nD Parallelism. Currently, this Ease of Use is "broken" for industry-level frameworks, as they are either not PyTorch-native (TensorFlow/JAX) or not fully automated (Megatron/DeepSpeed/torch). We propose a novel framework that combines PyTorch Nativeness and Automatic Parallelism to scale LLM training with Ease of Use. Developers only need to write single-device torch code; the framework automatically parallelizes it into nD parallelism, with all heavy lifting handled transparently.

当今巨型LLM时代呼唤分布式训练。尽管过去十年中已经发布了无数分布式训练框架,但很少有能够在真实产业生产中表现出色,因为最受青睐的质量往往是易用性而不是纯性能。易用性在于两个关键点--PyTorch和自动并行性,因为:i)PyTorch生态系统主导并拥有HuggingFace上92%的模型,ii)巨型模型无法在没有复杂的nD并行性的情况下进行训练。 目前,这种易用性对于产业级框架来说已经“破碎”,因为它们要么不是PyTorch原生的(TensorFlow/JAX),要么不是完全自动化的(Megatron/DeepSpeed/torch)。 我们提出了一个结合了PyTorch原生性和自动并行性的新型框架,以便通过易用性扩展LLM训练。我们只期望开发人员编写单设备torch代码,但自动将其并行化为nD并行性,所有繁重的工作都由框架透明地处理。
Speakers
Hongyu Zhu

Machine Learning System Software Engineer, ByteDance
Hongyu is a Machine Learning System Engineer in ByteDance AML group, working on systems and compilers for training workloads. He got his PhD degree from University of Toronto, where he worked with Professor Gennady Pekhimenko. He is generally interested in machine learning compilers... Read More →
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 3

11:50am HKT

How Volcano Enable Next Wave of Intelligent Applications | 如何让 Volcano 激活下一波智能应用 - William Wang, Huawei Cloud Technologies Co., LTD
Thursday August 22, 2024 11:50am - 12:25pm HKT
According to a Gartner prediction, 30% of new applications will use AI technology by 2026. However, the growing popularity of AI applications also brings challenges. This talk will introduce those challenges and their solutions, and show how to leverage Volcano to enable intelligent applications. Volcano is a cloud native batch platform and CNCF's first container batch computing project. It is optimized for AI and big data by providing the following capabilities:
- Full lifecycle management for jobs
- Scheduling policies for batch workloads
- Support for heterogeneous hardware
- Performance optimization for high performance workloads
This year, Volcano contributors have made great progress in helping users address the challenges of intelligent applications. A number of new features are on the way to accelerate GPU/Ascend NPU training efficiency, optimize resource utilization for large-scale clusters, and provide fine-grained scheduling.

根据Gartner的预测,到2026年将有30%的新应用程序将使用人工智能技术。然而,人工智能应用的普及也面临挑战。 本次讲座将介绍这些挑战、解决方案,并展示如何利用Volcano实现智能应用。 Volcano是一个云原生批处理平台,也是CNCF的第一个容器批处理计算项目。它通过提供以下功能来优化人工智能和大数据: - 作业的全生命周期管理 - 批处理工作负载的调度策略 - 支持异构硬件 - 高性能工作负载的性能优化 今年,Volcano的贡献者取得了巨大进展,帮助用户解决智能应用的挑战。许多新功能正在开发中,以加速GPU/Ascend NPU训练效率,优化大规模集群的资源利用率,并提供细粒度调度。
Speakers
William Wang

Architect, Huawei Cloud Technologies Co., LTD
William (LeiBo) Wang is an architect at Huawei Cloud. He is responsible for planning and implementing the cloud native scheduling system on HUAWEI CLOUD. He is also the tech lead of the CNCF Volcano project, focusing on large-scale cluster resource management, batch scheduling, BigData... Read More →
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 6

11:50am HKT

Beyond the Basics: Towards Making Thanos Production-Ready | 超越基础:朝着使Thanos达到生产就绪状态的方向前进 - Benjamin Huo & Junhao Zhang, QingCloud Technologies
Thursday August 22, 2024 11:50am - 12:25pm HKT
As one of the most popular and powerful Prometheus long-term storage projects, Thanos is widely adopted by the community. But to use Thanos in production, there are still a lot of day-2 operations that need to be automated. In this talk, KubeSphere maintainers will share their experiences in using and maintaining Thanos in production, including:
- Kubernetes native definition of all Thanos components
- Tenant isolation of ingestion, rule evaluation, and compaction
- Tenant-based autoscaling mechanism of Thanos Ingester, Ruler, and Compactor
- The time-based partition of Thanos store
- Tenant-based data lifetime management
- The sharding mechanism of the global ruler to handle massive recording rules and alerting rules evaluation workload
- The gateway & agent proxy mechanism for read/write with tenant access control
- The basic_auth, built-in query UI, and external remote write and query support of the gateway
- The TLS support between Thanos components
- The 3-tier config management

作为最受欢迎和强大的Prometheus长期存储项目之一,Thanos被社区广泛采用。但要在生产环境中使用Thanos,仍然需要自动化许多第二天的运维工作。在这次演讲中,KubeSphere的维护者将分享他们在生产环境中使用和维护Thanos的经验,包括: - 所有Thanos组件的Kubernetes本地定义 - 数据摄入、规则评估、压缩的租户隔离 - 基于租户的Thanos Ingester、Ruler和Compactor的自动扩展机制 - Thanos存储的基于时间的分区 - 基于租户的数据生命周期管理 - 全局规则分片机制,用于处理大量录制规则和警报规则评估工作负载 - 用于读写的网关和代理机制,带有租户访问控制 - 网关的basic_auth、内置查询UI以及外部远程写入和查询支持 - Thanos组件之间的tls支持 - 三层配置管理
Speakers
Benjamin Huo

Manager of the Architect and Observability Team, QingCloud Technologies, QingCloud Technologies
Benjamin Huo leads QingCloud Technologies' Architect team and Observability Team. He is the founding member of KubeSphere and the co-author of Fluent Operator, Kube-Events, Notification Manager, OpenFunction, and most recently eBPFConductor. He loves cloud-native technologies especially... Read More →
Junhao Zhang

Senior Software Engineer, QingCloud Technologies
Junhao Zhang, Senior Development Engineer at QingCloud Technologies, is responsible for the research and development of container platform monitoring, alerting, and other cloud-native services. With many years of industry experience, he has previously held positions at companies such... Read More →
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

11:50am HKT

Building a High-Performance Time Series Database from Scratch: Optimization Strategies | 从零开始构建高性能时序数据库:优化策略 - Aliaksandr Valialkin, VictoriaMetrics
Thursday August 22, 2024 11:50am - 12:25pm HKT
Application Performance Monitoring and Kubernetes monitoring in their current state are pretty expensive. The average VictoriaMetrics installation is processing 2-4 million samples/s on the ingestion path, and 20-40 million samples/s on the read path. The biggest installations account for 100 million samples/s on the ingestion path. This requires being very clever with data pipelines to keep them efficient and scalable by adding more resources. In this session, we'll explore essential optimizations to maintain database speed such as string interning, caching results, goroutine management and utilizing sync.Pool for efficient resource management. These techniques help strike a balance between performance and resource consumption. This talk focuses on practical strategies for enhancing database speed.

在当前状态下,应用程序性能监控和Kubernetes监控非常昂贵。平均VictoriaMetrics安装在摄入路径上处理2-4百万样本/秒,在读取路径上处理20-40百万样本/秒。最大的安装在摄入路径上占据了1亿样本/秒。这需要通过对数据管道进行非常聪明的优化,通过增加更多资源来保持其高效和可扩展性。在本场演讲中,我们将探讨保持数据库速度的基本优化,如字符串内部化、缓存结果、goroutine管理和利用sync.Pool进行有效的资源管理。这些技术有助于在性能和资源消耗之间取得平衡。本次演讲侧重于增强数据库速度的实用策略。
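
Two of the techniques named in this abstract, string interning and `sync.Pool`, can be sketched in a few lines of Go. This is only an illustrative sketch under stated assumptions, not VictoriaMetrics code: the `interner` type and the `formatSample` helper are hypothetical names invented for the example, and a production interner would also need eviction to bound memory.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// interner deduplicates equal strings so that repeated values (for example
// metric names) share a single allocation. Hypothetical helper for illustration.
type interner struct {
	mu sync.Mutex
	m  map[string]string
}

func newInterner() *interner {
	return &interner{m: make(map[string]string)}
}

// intern returns the canonical string for the bytes in b, allocating a new
// string only the first time a given value is seen.
func (in *interner) intern(b []byte) string {
	in.mu.Lock()
	defer in.mu.Unlock()
	if s, ok := in.m[string(b)]; ok {
		return s
	}
	s := string(b)
	in.m[s] = s
	return s
}

// bufPool reuses byte buffers across calls to reduce allocation pressure.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// formatSample renders "name value" using a pooled buffer.
func formatSample(name string, value float64) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	fmt.Fprintf(buf, "%s %g", name, value)
	return buf.String() // copies the bytes, so the buffer is safe to reuse
}

func main() {
	in := newInterner()
	a := in.intern([]byte("http_requests_total"))
	b := in.intern([]byte("http_requests_total"))
	fmt.Println(a == b, len(in.m))
	fmt.Println(formatSample(a, 42))
}
```

The interner trades a map lookup for fewer duplicate allocations, which tends to pay off when the same label strings recur across many samples; the pool avoids allocating a fresh buffer per call.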
Speakers
Hui Wang

Software Engineer, VictoriaMetrics
I'm working on monitoring at VictoriaMetrics. My passion is cloud-native technologies and opensource.
Aliaksandr Valialkin

CTO, VictoriaMetrics
Aliaksandr is a co-founder and the principal architect of VictoriaMetrics. He is also a well-known author of the popular performance-oriented libraries: fasthttp, fastcache and quicktemplate. He holds a Master’s Degree in Computer Software Engineering. He decided to found VictoriaMetrics... Read More →
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 2

11:50am HKT

Redefining Service Mesh: Leveraging EBPF to Optimize Istio Ambient Architecture and Performance | 重新定义服务网格:利用eBPF优化Istio环境架构和性能 - Yuxing Zeng, Alibaba Cloud
Thursday August 22, 2024 11:50am - 12:25pm HKT
Istio Ambient separates the L4/L7 functions found in the traditional sidecar model and introduces the ztunnel component, which implements L4 network load balancing and zero-trust security. However, because ztunnel is deployed at the node level as a DaemonSet, any malfunction or anomaly in ztunnel may impact the traffic of all mesh-related pods on that node. Furthermore, performance tests of Ambient Mesh have not delivered the anticipated outcomes; ztunnel often becomes a performance bottleneck. These factors make it challenging to apply Ambient Mesh in production environments. It appears that we require a more optimized and practical implementation. This session will share: 1. An introduction to the architecture of Istio Ambient Mesh, along with known issues in the current implementation. 2. How to use eBPF to implement zero-trust and L4 network traffic capabilities, enhancing the stability of the mesh network and significantly improving overall performance.

Istio Ambient将传统的边车模型中发现的L4/L7功能分离,并引入了ztunnel组件,实现了L4网络负载均衡和安全的零信任。然而,由于ztunnel部署在节点级别的DaemonSet上,ztunnel中的任何故障或异常可能会影响该节点下所有与网格相关的Pod的流量。此外,Ambient Mesh的性能测试并未达到预期的结果;ztunnel经常成为性能瓶颈。这些因素使得在生产环境中应用Ambient Mesh变得具有挑战性。看起来我们需要一个更优化和实用的实现解决方案。 本次会话将分享: 1. Istio Ambient Mesh架构的介绍,以及现有实现中已知的问题。 2. 使用eBPF实现零信任和L4网络流量功能,增强Mesh网络的稳定性,并显著提高整体性能。
Speakers
Jesse Zeng

Technical Expert, Alibaba Cloud
Yuxing Zeng is a technical expert in the Container Service Team at Alibaba Cloud. He is also an Istio member and an Envoy contributor. He has rich experience in cloud native fields such as Kubernetes, Istio, and Envoy.
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

11:50am HKT

Unlocking Scalability and Simplifying Multi-Cloud Management with Karmada and PipeCD | 使用Karmada和PipeCD解锁可扩展性并简化多云管理 - Khanh Tran, CyberAgent, Inc. & Hongcai Ren, Huawei
Thursday August 22, 2024 11:50am - 12:25pm HKT
In the coming age of AI, it has become inevitable for organizations to embrace a multi-cloud approach. Managing applications across multiple clouds can present various challenges, including resilience, performance, security, cost, and deployment management. How well have you prepared yourself and your services for this new age? This presentation will introduce Karmada and PipeCD, two powerful tools designed to support organizations in effectively addressing these challenges and achieving seamless multi-cloud management. Karmada is a multi-cloud container orchestration system, while PipeCD is a multi-cloud continuous delivery solution. Both tools are built on extensive experience in managing applications at scale across multiple clouds. We will delve into the key features and benefits of Karmada and PipeCD, and how they can simplify multi-cloud management. Together, we can unlock the true potential of multi-cloud systems and empower organizations to thrive in the era of AI.

在新的人工智能时代,任何组织都不可避免地需要采用多云方法。在多个云上管理应用程序可能会带来各种挑战,包括弹性、性能、安全性、成本和部署管理。您为新时代做好了多少准备?本次演讲将介绍Karmada和PipeCD,这两款强大的工具旨在支持组织有效应对这些挑战,实现无缝的多云管理。Karmada是一个多云容器编排工具,而PipeCD是一个多云持续交付解决方案。这两款工具都是基于在多个云上管理应用程序的丰富经验构建的。我们将深入探讨Karmada和PipeCD的关键特性和优势,以及它们如何简化多云管理。让我们一起释放多云系统的真正潜力,赋予组织在人工智能时代蓬勃发展的力量。
Speakers

Hongcai Ren

Senior Software Engineer, Huawei
Hongcai Ren (@RainbowMango) is a CNCF Ambassador who has been working on Kubernetes and other CNCF projects since 2019, and is a maintainer of the Kubernetes and Karmada projects.

Khanh Tran

Software Engineer, CyberAgent, Inc.
Khanh is a maintainer of the PipeCD project. He is currently employed at CyberAgent, Inc. and is responsible for the CI/CD system across the organization. As a member of the developer productivity team, his primary focus is on automation and anything that enhances the development process...
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

11:50am HKT

Security Threat Model Analysis and Protection Practice in Edge Computing Scenarios | 边缘计算场景中的安全威胁模型分析和保护实践 - Yue Bao, Huawei & Huan Wei, HarmonyCloud
Thursday August 22, 2024 11:50am - 12:25pm HKT
Cloud native is rapidly developing towards multi-cloud, hybrid cloud and edge computing, which are becoming key trends in cloud native development. However, in edge computing scenarios, the traditional VPC-based security model struggles to keep production secure. The challenges keep growing, including weak edge security mechanisms, vulnerable service interfaces exposed to the outside network, vulnerable end device access protocols, and supply chain security risks. In 2023, KubeEdge completed its security audit. This talk will present the work around the audit, including the threat model, fuzzing efforts, and tips on how to get started with contributing to KubeEdge's continued security. Since the completion of the audit, KubeEdge has worked on several initiatives to improve security for its users, and the talk will cover these. One of these initiatives was SLSA L3 compliance, and the presentation will show what has been done and how it helps the community.

云原生正迅速发展为多云、混合云和边缘计算,这些正在成为云原生开发的关键趋势。然而,在边缘计算场景中,传统的基于VPC的安全模型很难确保安全生产。面临的挑战越来越多,包括边缘安全机制薄弱、暴露于外部网络的易受攻击的服务接口、易受攻击的终端设备访问协议以及供应链安全风险。 2023年,KubeEdge完成了安全审计。本次演讲将介绍围绕审计的工作,包括威胁模型、模糊测试工作以及如何开始为KubeEdge持续安全做出贡献的提示。 自完成审计以来,KubeEdge已经开展了多项改进其消费者安全性的倡议,本次演讲将涵盖这些内容。其中一个倡议是SLSA L3合规性,演示将展示已经完成的工作以及它如何帮助社区。
Speakers

Huan Wei

Chief Architect, HarmonyCloud
Chief architect of HarmonyCloud. He designs and implements private cloud construction for many large enterprise customers. Huan has 10+ years of experience in software design and development across a variety of industries and technology bases, including cloud computing, micro service...

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.
Yue Bao serves as a software engineer at Huawei Cloud. She now works 100% on open source and is a KubeEdge maintainer, focusing on lightweight edge and the edge api-server for KubeEdge. Before that, Yue worked on Huawei Cloud Intelligent EdgeFabric Service and participated...
Thursday August 22, 2024 11:50am - 12:25pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security

12:25pm HKT

Lunch 🍜 | 午餐
Thursday August 22, 2024 12:25pm - 1:50pm HKT
Level 2 | Grand Ballroom 3-4

12:30pm HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Thursday August 22, 2024 12:30pm - 12:50pm HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Thursday August 22, 2024 12:30pm - 12:50pm HKT
Level 2 | Grand Ballroom 3-4 | Project Pavilion

1:50pm HKT

Gateway API and Beyond: Introducing Envoy Gateway's Gateway API Extensions | 网关API及更多:介绍Envoy网关的网关API扩展 - Huabing Zhao, Tetrate
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Envoy Gateway, a new member of the Envoy project family, efficiently manages Envoy-based application gateways. In strict adherence to the Kubernetes Gateway API, it amplifies its functionalities by leveraging custom resource definitions (CRDs) in areas where the Gateway API hasn't yet ventured. This presentation will delve into the Gateway API extensions of Envoy Gateway, specifically focusing on ClientTrafficPolicy, BackendTrafficPolicy, and SecurityPolicy. We'll explore their practical applications in managing and securing edge traffic for cloud-native applications. Additionally, we'll discuss a strategic approach for potentially integrating these extensions into the formal Gateway API specifications.

Envoy Gateway是Envoy项目家族的新成员,有效地管理基于Envoy的应用网关。严格遵循Kubernetes Gateway API,通过利用自定义资源定义(CRDs),在Gateway API尚未涉足的领域增强其功能。本次演示将深入探讨Envoy Gateway的Gateway API扩展,特别关注ClientTrafficPolicy、BackendTrafficPolicy和SecurityPolicy。我们将探讨它们在管理和保护云原生应用程序边缘流量方面的实际应用。此外,我们将讨论一个战略方法,可能将这些扩展集成到正式的Gateway API规范中。
Speakers

Huabing Zhao

Engineer, Tetrate
Huabing Zhao is a software engineer at Tetrate and a CNCF ambassador. He has developed a managed service mesh product on the cloud and assisted a lot of users in deploying Istio service mesh in production. He also founded Aeraki Mesh, a CNCF sandbox project that facilitates non-HTTP...
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 6

1:50pm HKT

Choose Your Own Adventure: The Struggle for Security | 选择你的冒险:安全之战 - Whitney Lee, VMware Tanzu & Viktor Farcic, Upbound
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Our hero, a running application in a Kubernetes production environment, knows they are destined for greater things! They are serving end users, but currently, they are also endangering those users, the system, and themselves! But the struggle for security is HARD, filled with system design choices concerning secrets management; cluster-level and runtime policies; and securing pod-to-pod communications. It is up to you, the audience, to guide our hero and help them grow from a vulnerable, unprotected application to their final form: an app that is more secure against invasion. In their third ‘Choose Your Own Adventure’-style talk, Whitney and Viktor will present choices that an anthropomorphized app must make as they try to protect themselves against every kind of exploit. Throughout the presentation, the audience (YOU!) will vote to decide our hero app's path! Can we navigate CNCF projects to safeguard our app, system, and users against attack before the session time elapses?

我们的英雄是一个在Kubernetes生产环境中运行的应用程序,他知道自己注定要成为更伟大的存在!他正在为最终用户提供服务,但目前却也在危及这些用户、系统和自己!但是安全的斗争是艰难的,充满了关于秘钥管理、集群级别和运行时策略以及保护Pod之间通信的系统设计选择。 观众们,你们将扮演引导我们英雄并帮助他们从一个脆弱、无保护的应用程序成长为更加安全抵御入侵的终极形态的角色。在这场第三场“选择你自己的冒险”风格的演讲中,Whitney和Viktor将呈现一个拟人化应用程序必须做出的选择,以试图保护自己免受各种利用。在整个演示过程中,观众(就是你!)将投票决定我们英雄应用程序的道路!在演讲结束之前,我们能否通过探索CNCF项目来保护我们的应用程序、系统和用户免受攻击呢?
Speakers

Viktor Farcic

Developer Advocate, Upbound
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.

Whitney Lee

Developer Advocate, VMware Tanzu
Whitney is a lovable goofball and a CNCF Ambassador who enjoys understanding and using tools in the cloud native landscape. Creative and driven, Whitney recently pivoted from an art-related career to one in tech. You can catch her lightboard streaming show ⚡️ Enlightning on Tanzu.TV...
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Cloud Native Novice

1:50pm HKT

Implement Auto Instrumentation Under GraalVM Static Compilation on OTel Java Agent | GraalVM 静态编译下 OTel Java Agent 的自动增强方案与实现 - Zihao Rao & Ziyi Lin, Alibaba Cloud
Thursday August 22, 2024 1:50pm - 2:25pm HKT
GraalVM static compilation significantly improves Java application startup speed and reduces runtime memory usage, which is very valuable for Java to flourish in the cloud native ecosystem. However, the automatic instrumentation originally provided through the Java Agent becomes invalid after static compilation. We designed a static instrumentation solution in GraalVM to solve this problem. This talk will introduce the overall design of the solution and the related test results with the OTel Java Agent.

GraalVM静态编译对于提升Java应用的启动速度和运行时内存占用有着显著的效果,对于Java在云生态中的蓬勃发展有着十分宝贵的价值。然而,原本基于Java Agent提供的自动插桩功能在静态编译之后将会失效。针对上述问题我们在GraalVM中设计了静态插桩方案,本演讲将介绍该方案的整体设计思路以及在OTel Java Agent中的相关测试结果。
Speakers

Zihao Rao

Software Engineer, Alibaba Cloud
Zihao is a software engineer at Alibaba Cloud. Over the past few years, he has participated in several well-known open source projects. He is a steering committee member of the Spring Cloud Alibaba project and is now a triager for OpenTelemetry Java Instrumentation.

Ziyi Lin

Senior Software Engineer, Alibaba Cloud
Author of the book "Static compilation for Java in GraalVM: the principles and practice". ACM SIGSOFT distinguished paper award winner (ICSE'23). Committer of Apache incubating Teaclave Java TEE SDK (https://github.com/apache/incubator-teaclave-java-tee-sdk). Active contributor to GraalVM (https://github.com/pulls?q=is%3Apr+org%3Aoracle+author%3Aziyilin...
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

1:50pm HKT

Testing and Release Patterns for Crossplane | 跨平面的测试和发布模式 - Yury Tsarev & Steven Borrelli, Upbound
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Crossplane has become the foundation of many Internal Developer Platforms (IDPs). A requirement for any IDP in production is the ability to make changes and upgrades to the platform with confidence. This talk will cover testing and release patterns based on our experience building production-ready environments across a range of Crossplane users. We’ll cover the lifecycle of a Crossplane Composition upgrade, from local commit to pull request to target customer environment, end-to-end testing tools, handling API changes, and how to control updates to customer environments. For quite a while, testing Crossplane Compositions meant relying exclusively on costly end-to-end layers. In this talk, we're unveiling new unit testing capabilities that allow you to evaluate and test your Composition code in complete isolation.

Crossplane已成为许多内部开发者平台(IDPs)的基础。在生产中,任何IDP的要求都是能够有信心地对平台进行更改和升级。 本次演讲将涵盖基于我们在跨多个Crossplane用户构建生产就绪环境的经验,讨论测试和发布模式。 我们将介绍Crossplane Composition升级的生命周期,从本地提交到拉取请求再到目标客户环境,端到端测试工具,处理API更改以及如何控制对客户环境的更新。 相当长一段时间以来,测试Crossplane Compositions意味着完全依赖昂贵的端到端层。在本次演讲中,我们将揭示新的单元测试功能,使您能够在完全隔离的环境中评估和测试您的Composition代码。
Speakers

Steven Borrelli

Principal Solutions Architect, Upbound
Steven is a Principal Solutions Architect for Upbound, where he helps customers adopt Crossplane.

Yury Tsarev

Principal Solutions Architect, Upbound
Yury is an experienced software engineer who strongly focuses on open-source, software quality and distributed systems. As the creator of k8gb (https://www.k8gb.io) and active contributor to the Crossplane ecosystem, he frequently speaks at conferences covering topics such as Control...
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

1:50pm HKT

Unified Management, Continuity, Compliance in Multi-Clouds with Service Mesh | 在多云环境中通过服务网格实现统一管理、连续性和合规性 - Kebe Liu, DaoCloud
Thursday August 22, 2024 1:50pm - 2:25pm HKT
In multi-cloud and hybrid cloud architectures, enterprises face challenges like inter-cloud communication, traffic management, application orchestration, data security, and compliance. Service mesh technology offers a unified approach for managing service interactions, enhancing security, and ensuring data compliance. Istio, a leading service mesh project, is particularly effective in multi-cloud and hybrid cloud environments. It provides seamless network connectivity across various architectures, ensuring reliable and secure communication. Additionally, integrating Istio with Karmada enables efficient application scheduling across these complex environments. Karmada allows for smooth orchestration of workloads across different cloud platforms, enhancing the flexibility and scalability of cloud-native applications. I aim to share practical insights and experiences, especially from China, to inspire and provide strategic perspectives in navigating these technological landscapes.

在多云和混合云架构中,企业面临诸如云间通信、流量管理、应用编排、数据安全和合规性等挑战。服务网格技术提供了统一的管理服务交互方式,增强安全性,并确保数据合规性。 作为领先的服务网格项目,Istio在多云和混合云环境中特别有效。它提供了跨不同架构的无缝网络连接,确保可靠和安全的通信。此外,将Istio与Karmada集成,可以实现在这些复杂环境中高效的应用调度。Karmada允许在不同云平台上平稳地编排工作负载,增强云原生应用的灵活性和可扩展性。 我旨在分享实用的见解和经验,特别是来自中国,以激发并提供在这些技术领域中导航的战略视角。
Speakers

Kebe Liu

Senior software engineer, DaoCloud
Member of the Istio Steering Committee, focused in recent years on cloud native, Istio, eBPF, and related areas. Founder of the Merbridge project.
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

1:50pm HKT

OS Migration Solution on Cloud | 云上操作系统迁移解决方案 - Jianlin Lv, eBay
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Each Linux distribution has a lifecycle that ends when its developers stop providing updates or any form of support (end of life, EOL). Continuing to use EOL Linux poses risks such as security vulnerabilities, compatibility issues, and lack of official support. Cloud providers face the challenge of quickly and safely migrating the OS to a supported distribution. There are several challenges involved in the process of migrating an OS: 1. Ensuring the safety of application data, which is especially significant during OS migrations between different Linux distributions; 2. Customizing the OS based on the Linux distribution, which includes changes to the kernel, deb packages, specific configurations, and tools; 3. Rolling out the new OS to the production environment quickly, with the goal of transitioning over 100,000 physical nodes each month without affecting customer operations while minimizing node downtime. This talk will detail the issues encountered in OS migration and the proposed solutions.

每个Linux发行版都有一个生命周期;这指的是当操作系统开发者停止提供更新或任何形式的支持时。继续使用EOL Linux会带来风险,如安全漏洞、兼容性问题和缺乏官方支持。 云服务提供商面临着快速且安全地将操作系统迁移到受支持的发行版的挑战。 在迁移操作系统的过程中涉及到几个挑战: 1. 确保应用数据的安全性,在不同Linux发行版之间迁移操作系统时尤为重要; 2. 根据Linux发行版定制操作系统,包括对内核、deb软件包、特定配置和工具的更改; 3. 如何快速将新操作系统推出到生产环境。实现每月迁移超过10万个物理节点的目标,同时不影响客户运营并最小化节点停机时间。 本次演讲将详细介绍操作系统迁移中遇到的问题和提出的解决方案。
Speakers

Jianlin Lv

Senior Linux Kernel Development Engineer, eBay
Jianlin Lv currently works at eBay CCOE as a Senior Kernel Engineer, responsible for the maintenance and release of eBay TessOS. He has long been involved in the development and maintenance of open-source software and operating systems and has contributed code to multiple open-source...
Thursday August 22, 2024 1:50pm - 2:25pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Operating Systems

1:50pm HKT

Model Openness Framework: The Path to Openness, Transparency and Collaboration in Machine Learning Models | 模型开放框架:机器学习模型中开放性、透明度和协作的路径 - Ibrahim Haddad & Cailean Osborne, The Linux Foundation
Thursday August 22, 2024 1:50pm - 3:15pm HKT
Generative AI (GAI) offers unprecedented opportunities for research and innovation, but its commercialization has raised concerns about transparency, reproducibility, and safety. Many open GAI models lack the necessary components for full understanding and reproducibility, and some use restrictive licenses whilst claiming to be "open-source". To address these concerns, the Generative AI Commons at the LF AI & Data Foundation has proposed the Model Openness Framework (MOF), a ranked classification system that rates machine learning models based on their completeness and openness, following principles of open science, open source, open data, and open access. The MOF requires specific components of the model development lifecycle to be included and released under appropriate open licenses. This framework aims to prevent misrepresentation of models claiming to be open, guide researchers and developers in providing all model components under permissive licenses, and help individuals and organizations identify models that can be safely adopted without restrictions.

In this talk, we will discuss the MOF, showcase a demonstration of the Model Openness Tool (the tool that implements the framework), and discuss the benefits the MOF offers to both model producers and consumers. We strongly believe that a wide adoption of the MOF will foster a more open AI ecosystem, benefiting research, innovation, and adoption of state-of-the-art models.

生成AI(GAI)为研究和创新提供了前所未有的机会,但其商业化引发了对透明度、可复现性和安全性的担忧。许多开放的GAI模型缺乏完全理解和可复现性所需的组件,而一些则使用限制性许可证,却声称是“开源”的。为了解决这些问题,LF AI & Data Foundation的生成AI Commons提出了模型开放性框架(MOF),这是一个排名分类系统,根据其完整性和开放性评估机器学习模型,遵循开放科学、开源、开放数据和开放获取的原则。MOF要求模型开发生命周期的特定组件必须包含并发布在适当的开放许可证下。该框架旨在防止声称开放的模型被误解,指导研究人员和开发者在宽松许可证下提供所有模型组件,并帮助个人和组织识别可以安全采纳而无需限制的模型。

在本次讲话中,我们将讨论MOF,并展示模型开放性工具(实施该框架的工具)的演示,探讨MOF对模型生产者和消费者所带来的好处。我们坚信广泛采用MOF将促进更加开放的AI生态系统,有利于研究、创新和最新模型的采用。
Speakers

Ibrahim Haddad

Executive Director, LF AI & Data Foundation

Cailean Osborne

Researcher, Linux Foundation
Cailean is a Researcher at the Linux Foundation and a PhD Candidate in Social Data Science at the Oxford Internet Institute, University of Oxford. His interests are in OSS, the digital commons, and public interest computing. Previously, Cailean worked as the International Policy Lead...
Thursday August 22, 2024 1:50pm - 3:15pm HKT
Level 1 | Hung Hom Room 3

2:40pm HKT

Find Your Own Personal Tutor for the Study of Kubernetes | 为学习Kubernetes找到适合您的个人导师 - Hoon Jo, Megazone
Thursday August 22, 2024 2:40pm - 3:15pm HKT
When Kubernetes novices run into a problem, they ask Stack Overflow, the community, or friends :). However, doing so requires explaining their environment and background information, and even then an answer is not guaranteed. I therefore suggest using K8sGPT with Ollama to compensate for that lack of knowledge. Furthermore, K8sGPT provides an interactive mode that lets you ask follow-up questions until you have enough answers. It can also answer in other languages, which helps those who are not familiar with English (often a big concern at the very beginning). I highly recommend K8sGPT to newcomers who want a soft landing in the Kubernetes world.

在KubeCon上,我们将讨论Kubernetes新手用户在遇到问题时通常会向stackoverflow、社区或朋友提问的情况。然而,我们需要解释我的环境和背景信息。虽然并不能保证会得到答案,但我建议使用K8sGPT与ollama来弥补当前知识的不足。此外,k8sGPT提供交互模式,可以持续提问直到我得到足够的答案。此外,对于不熟悉英语的人来说,询问其他语言可能会有所帮助(这在刚开始阶段时是一个大问题)。我强烈推荐使用K8sGPT来帮助新手顺利进入Kubernetes世界。
Speakers

Hoon Jo

Cloud Solutions Architect | Cloud Native Engineer, Megazone
Hoon Jo is a Cloud Solutions Architect as well as a Cloud Native engineer at Megazone. He has spoken many times about cloud native technologies and works to make cloud native ubiquitous around the world. He wrote 『Python for System/Network Administrators』 (Wikibooks, 2017...
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Cloud Native Novice

2:40pm HKT

Kelemetry: Global Control Plane Tracing for Kubernetes | Kelemetry:面向Kubernetes控制面的全局追踪系统 - Wei Shao & Jonathan Chan, ByteDance
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Debugging Kubernetes system issues is complicated: different controllers manipulate objects independently, sometimes triggering changes in other controllers. Unlike traditional RPC-based services, the relationship between components is not explicit; identifying which component causes an issue can be like finding a needle in a haystack. Components expose their own fragmented data, which is often limited to the lifecycle of a single request and fails to illustrate the bigger picture of asynchronous causal events. This talk introduces Kelemetry, a global tracing system for the Kubernetes control plane that uses scattered data sources: audit logs, events, informers and component traces. Through several demonstrations of troubleshooting online problems, we will see how Kelemetry reveals the state transition of related objects over a long timespan and reconstructs the causal hierarchy of events to provide intuitive insight into the What, When and Why of everything going on in a Kubernetes system.

调试Kubernetes系统问题是复杂的:不同的控制器独立地操作对象,有时会触发其他控制器的变化。与传统的基于RPC的服务不同,组件之间的关系并不明确;确定哪个组件引起了问题就像在一堆草堆中找针一样困难。组件展示它们自己的碎片化数据,通常仅限于单个请求的生命周期,并未展示异步因果事件的整体情况。 本次演讲介绍了Kelemetry,这是一个利用审计日志、事件、通知器和组件跟踪的分散数据源的Kubernetes控制平面全局跟踪系统。通过几次在线问题排查演示,我们将看到Kelemetry如何揭示相关对象在长时间跨度内的状态转换,并重建事件的因果层次结构,以提供对Kubernetes系统中发生的一切的直观洞察。
Speakers

Wei Shao

Senior Software Engineer, ByteDance
Wei Shao is a tech lead on the Orchestration & Scheduling team at ByteDance, and a maintainer of KubeWharf projects. Wei has 6+ years of experience in the cloud native area, focusing on resource management and performance-enhanced systems in K8s. Wei led the development of multiple...

Jonathan Chan

Software engineer, ByteDance
Jonathan is a software engineer at ByteDance working on Kubernetes related infrastructure such as observability systems and cluster federation. He is also a passionate contributor to a number of open source projects.
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

2:40pm HKT

NanoVisor: Revolutionizing FaaS Cold Start Performance with Secure, Lightweight Container Runtime | NanoVisor:通过安全、轻量级容器运行时改变FaaS冷启动性能 - Tianyu Zhou, Ant Group
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Function as a Service (FaaS) is booming, but cold start time, the time it takes to create a new container for a function, remains a significant bottleneck. This not only impacts user experience with noticeable delays, but also incurs unnecessary costs due to wasted resources. NanoVisor, a groundbreaking container runtime built on gVisor, tackles the challenge of slow cold start time in FaaS. It achieves this through a series of optimizations specifically designed for FaaS: lightweight containerd interaction for faster setup, a read-only filesystem for enhanced efficiency, and a sandbox fork mechanism that replaces heavy container creation for significant performance gains. These empower NanoVisor to create secure, sandboxed containers ready for function execution within an astonishing 5ms, with less than 1MB of memory overhead per instance and 1.5K QPS per node. It has been successfully applied in Ant Group's ecosystem, including Alipay Cloud Base and SOFA Function, as well as CI/CD acceleration.

Function as a Service(FaaS)正在蓬勃发展,但冷启动时间,即为函数创建新容器所需的时间,仍然是一个重要的瓶颈。这不仅影响用户体验,导致明显的延迟,还因浪费资源而产生不必要的成本。NanoVisor是一种基于gVisor构建的开创性容器运行时,解决了FaaS中慢冷启动时间的挑战。它通过一系列专为FaaS设计的优化来实现:轻量级的containerd交互以加快设置速度,只读文件系统以提高效率,以及一个替代繁重容器创建的沙箱分叉机制,以获得显著的性能提升。这些优化使NanoVisor能够在惊人的5毫秒内创建安全的、沙箱化的容器,每个实例的内存开销不到1MB,每个节点的QPS为1.5K。它已成功应用于蚂蚁集团的生态系统,包括支付宝云基地和SOFA Function,以及CI/CD加速。
Speakers

Tianyu Zhou

System Engineer, Ant Group
I am Tianyu Zhou, a system engineer at Ant Group. I graduated from Zhejiang University with a master's degree in cyberspace security. My research interests include kernel, system security and container security.
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Emerging + Advanced

2:40pm HKT

Panel: Fragmentation of the Scheduling in Kubernetes and Challenges for AI/ML Workloads | 圆桌:Kubernetes调度社区碎片化现状及如何应对AI/ML工作负载带来的挑战 - Kante Yin, DaoCloud; Li Tao, Independent; William Wang, Huawei Cloud Technologies Co., LTD; 秋萍 戴, daocloud; Yuquan Ren, B
Thursday August 22, 2024 2:40pm - 3:15pm HKT
The scheduler is one of the most frequently customized components in Kubernetes, owing to its extensibility. However, too many schedulers lead to decision paralysis among users, which has been discussed extensively at past KubeCons. To help reduce user confusion, four maintainers from various communities (Godel-Scheduler, Koordinator, Kubernetes SIG-Scheduling and Volcano) are invited to profile the background and use cases behind these projects. The panel will also discuss the gap between upstream Kubernetes and downstream projects, try to abstract the common patterns or functionalities that can be pushed upstream to avoid reinventing the wheel, and consider what should still be defined loosely to preserve extensibility. Moreover, with the rise of AI, scheduling AI workloads in Kubernetes poses a significant challenge. The panel will discuss where we are now and where we are headed, as well as the opportunities for cooperation.

调度器是Kubernetes中最经常定制的组件之一,这归功于其可扩展性。然而,过多的调度器会导致用户决策瘫痪,这在过去的KubeCon中已经被广泛讨论过。为了帮助减轻用户的困惑,我们邀请了来自各个社区(Godel-Scheduler、Koordinator、Kubernetes SIG-Scheduling和Volcano)的四位维护者来介绍这些项目背后的背景和用例。 此外,本小组讨论将探讨上游Kubernetes和下游项目之间的差距,并尝试提炼出可以推送到上游的常见模式或功能,以避免重新实现轮子,以及什么应该保持松散定义以保留可扩展性。 此外,随着人工智能的兴起,在Kubernetes中调度AI工作负载面临着重大挑战,本小组讨论将探讨我们目前的状况以及我们未来的发展方向,以及合作的机会。
Speakers

Yuquan Ren

Cloud Native Architect, ByteDance
Yuquan Ren has 10+ years of working experience in the cloud-native field, contributing extensively to open-source projects such as Kubernetes. Currently, he is a tech leader at ByteDance, primarily focusing on the field of orchestration and scheduling.

Kante Yin

Senior Software Engineer, DaoCloud
Kante is a senior software engineer and an open source enthusiast. He's currently working at the Kubernetes platform team at DaoCloud based in Shanghai, mostly around scheduling, resource management and inference. He also works on upstream Kubernetes as SIG-Scheduling Maintainer and...

Tao Li

Koordinator Co-founder&Maintainer, N/A
Tao Li is a seasoned Senior Software Engineer with a specialization in K8s scheduling. With extensive practical experience in large-scale K8s cluster scheduling technology, Tao has been deeply involved in the research and development of K8s scheduling systems both within Alibaba...

秋萍 戴

Product Manager, DaoCloud
QiuPing Dai has been a senior Technology Product Manager at DaoCloud for 5 years and is involved in Cloud Computing (including Kubernetes computing, storage, and network) development work. Before that, QiuPing worked at IBM on cloud computing. QiuPing is interested in Storage, Network, Scheduling...
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Emerging + Advanced

2:40pm HKT

WebAssembly on the Server | 服务端的WebAssembly - Vivian Hu, Second State
Thursday August 22, 2024 2:40pm - 3:15pm HKT
As the key findings of the CNCF Annual Survey 2022 put it, “Containers are the new normal, and WebAssembly (Wasm) is the future.” Wasm is playing an important role in the cloud native area. Before Wasm, Linux containers were commonly used to run compiled applications in the cloud; e.g., a Rust or C++ app is compiled to x86_64 machine code and runs inside a Linux container. Wasm provides a more secure, much lighter, faster, and more portable alternative to Linux containers for this type of performance-minded server-side application. Currently, CNCF hosts three Wasm-focused projects: WasmEdge, WasmCloud, and runwasi. This talk will discuss WebAssembly on the server side. You will learn about the integration between Wasm and existing container tools, and about use cases of WebAssembly on the server side. Looking ahead, we will also discuss the role of Wasm in LLM applications.

根据CNCF年度调查2022的关键发现,“容器是新常态,WebAssembly(Wasm)是未来。” Wasm在云原生领域发挥着重要作用。 在Wasm出现之前,Linux容器通常用于在云中运行这些编译应用程序 - 例如,Rust或C++应用程序被编译为x86_64机器代码,并在Linux容器内运行。相比于Linux容器,Wasm为这类性能导向的服务器端应用程序提供了更安全、更轻量、更快速和更可移植的替代方案。 目前,CNCF托管了三个以Wasm为重点的项目,如WasmEdge、WasmCould和runwasi。本次演讲将讨论服务器端的WebAssembly。您将了解Wasm与现有容器工具的集成,以及服务器端WebAssembly的用例。此外,我们还将讨论Wasm在LLM应用程序中的作用。
Speakers

Xiaowei

Product Manager, Second State
Vivian Hu is a Product Manager at Second State and a columnist at InfoQ. She is a founding member of the WasmEdge project. She organizes Rust and WebAssembly community events in Asia.
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 6
  KubeCon + CloudNativeCon Sessions, Cloud Native Novice

2:40pm HKT

Open Sourcing the Future of Z: Unleashing Innovation on the Mainframe | 开源Z的未来:释放大型机上的创新 - Dong Ma & Ji Chen, IBM; Mike Friesenegger, SUSE
Thursday August 22, 2024 2:40pm - 3:15pm HKT
The IBM Z platform, known for its security, reliability, and high-volume transaction processing, has long been a cornerstone of enterprise computing. However, the traditional closed-source approach to Z development has limited innovation and collaboration. This talk explores the growing movement towards open-source software for Z, examining the technical and strategic considerations. It discusses the challenges of the closed-source model for Z and highlights successful open mainframe projects such as the Feilong project. It also covers the technical challenges of developing open-source software for Z, offering potential solutions and strategies to overcome these hurdles, and the benefits of open-source Z development for developers.

IBM Z平台以其安全性,可靠性和高交易处理量而闻名,长期以来一直是企业计算的支柱。然而,传统的封闭源码开发方式限制了创新和合作。本次演讲探讨了向Z开放源码软件的不断发展,审视了技术和战略考虑因素。讨论了Z封闭源码模型的挑战,重点介绍了像Feilong项目这样的开源主机项目的成功案例。讨论了为Z开发开源软件的技术挑战,提供了潜在解决方案和克服这些障碍的策略。讨论了开源Z开发对开发人员的好处。
Speakers

Dong Ma

Software Engineer, IBM
Dong Ma is a Software Engineer at IBM, Open Mainframe Project and CD Foundation Ambassador. He now works on IBM Cloud Infrastructure Center, offering on-premises cloud deployments on the IBM Z and IBM LinuxONE platforms. He’s been an active technical contributor to OpenStack since...

Ji Chen

IBM Senior Technical Staff Member, IBM
Ji Chen is a software architect working on the zSystem and LinuxONE platforms at IBM, and contributes to various CNCF projects such as Kepler, Cluster-API, and Cloud Provider.

Mike Friesenegger

Solutions Architect, SUSE
Mike is a solutions architect in the SUSE Integrated Solutions team. He works closely with a number of key hardware partners to identify, test and document joint solutions that help SUSE create unique value in the marketplace. His specialties include Linux on IBM Z and LinuxONE... Read More →
Thursday August 22, 2024 2:40pm - 3:15pm HKT
Level 1 | Hung Hom Room 5

3:15pm HKT

Coffee Break ☕ | 茶歇
Thursday August 22, 2024 3:15pm - 3:35pm HKT
Level 2 | Grand Ballroom 3-4

3:25pm HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Thursday August 22, 2024 3:25pm - 3:45pm HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Thursday August 22, 2024 3:25pm - 3:45pm HKT
Level 2 | Grand Ballroom 3-4 | Project Pavilion

3:35pm HKT

Empower Large Language Models (LLMs) Serving in Production with Cloud Native AI Technologies | 利用云原生人工智能技术在生产环境中赋能大型语言模型(LLMs) - Lize Cai, SAP & Yang Che, Alibaba Cloud Intelligence
Thursday August 22, 2024 3:35pm - 4:10pm HKT
LLMs have heightened public expectations of generative models. However, as noted in the Gartner report, running AI applications in production poses significant challenges. To tackle these challenges, we have redesigned and optimized the software capabilities of Cloud Native AI Technologies. Extending KServe to handle OpenAI's streaming requests lets it accommodate LLM inference load. With Fluid and Vineyard, Llama-30B model loading time dropped from 10 minutes to under 25 seconds. The optimizations do not stop there. Since LLM loading is not a high-frequency operation, it is crucial to use cronHPA for timed auto-scaling to balance cost and performance, and to evaluate the cost-effectiveness of the scaling process. As reviewers and maintainers of KServe and Fluid, we will share our insights on these challenges, showcase effective use of Cloud Native AI, and share our experiences in production.

LLM让公众对生成式大模型的期望提高。然而,正如Gartner报告所指出的,将AI应用程序投入生产中存在重大挑战。为了解决这些挑战,我们重新设计和优化了云原生AI技术的软件能力。通过扩展KServe以处理OpenAI的流式请求,它可以容纳LLM的推理负载。通过Fluid和Vineyard,我们成功将Llama-30B模型的加载时间从10分钟缩短到不到25秒。然而,上述优化并不止于此。由于LLM加载不是高频操作,利用cronHPA进行定时自动扩展至关重要,以实现成本和性能之间的平衡,并评估扩展过程的成本效益。作为KServe和Fluid的审阅者和维护者,我们在本场演讲中分享了对挑战的见解。我们将展示云原生AI的有效使用,并分享我们在生产中的经验。
Speakers
avatar for Yang Che

Yang Che

senior engineer, Alibaba Cloud Intelligence
Yang Che is a senior engineer at Alibaba Cloud. He works on the Alibaba Cloud container service team and focuses on Kubernetes and container-related product development. Yang also works on building an elastic machine learning platform on those technologies. He is an active contributor... Read More →
avatar for Lize Cai

Lize Cai

Senior Software Engineer, SAP
Lize is a senior software engineer at SAP, based in Singapore. With a strong product mindset, Lize has extensive experience in building enterprise-grade machine learning platforms. A passionate advocate for open source technology, Lize actively contributes to various projects, including... Read More →
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 3

3:35pm HKT

Kubernetes Community Panel: A Decade of Evolution and Future Trends | Kubernetes维护者圆桌:十年演变与未来趋势 - Paco Xu & Mengjiao Liu, DaoCloud; Qiming Teng, Freelance; Klaus Ma, Nvidia; Pengfei Ni, Microsoft
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Join us in celebrating the 10th anniversary of Kubernetes with a panel featuring some of the community's most influential contributors and maintainers from China. Over the past decade, Kubernetes has grown into the cornerstone of cloud-native infrastructure, thanks to the dedication and innovation of its community members. In this panel, we will talk about our journeys with Kubernetes, share stories and experiences, and discuss the future of Kubernetes in the next decade. Our panelists include current and previous owners, tech leads and maintainers. Feel free to join the panel to share your perspectives on the past and next decade of the Kubernetes community and ask anything about the community.

加入我们,与中国社区最具影响力的贡献者和维护者一起庆祝Kubernetes的十周年。在过去的十年里,由于社区成员的奉献和创新,Kubernetes已经发展成为云原生基础设施的基石。在这个专题讨论中,我们将谈论与Kubernetes的旅程,分享故事和经验,并讨论Kubernetes在未来十年的发展。我们的专题讨论嘉宾包括现任和前任所有者、技术负责人和维护者。欢迎加入专题讨论,分享您对Kubernetes社区过去和未来十年的看法,并提出任何关于社区的问题。
Speakers
avatar for Pengfei Ni

Pengfei Ni

Principal Software Engineer, Microsoft
Pengfei Ni is a Principal Software Engineer at Microsoft Azure and a maintainer of the Kubernetes project. With extensive experience in Cloud Computing, Kubernetes, and Software Defined Networking (SDN), he has delivered presentations at various conferences, including KubeCon, ArchSummit... Read More →
avatar for 徐俊杰 Paco

徐俊杰 Paco

Open Source Team Lead, DaoCloud
Paco is co-chair of KubeCon+CloudNativeCon China 2024 and a member of the Kubernetes Steering Committee. He leads the open-source team at DaoCloud. He is also a KCD Chengdu 2022 organizer and a speaker at KubeCon EU 2023 & 2024 and KubeCon China 2021. Paco is a kubeadm maintainer... Read More →
avatar for Qiming Teng

Qiming Teng

Architect, Freelance
Qiming has been a passionate open source contributor for more than 10 years. He was an active contributor to the OpenInfra community and the CNCF community. His interests span operating systems, programming languages, and cloud platforms. His current research fields include the... Read More →
avatar for Mengjiao Liu

Mengjiao Liu

Software Engineer, DaoCloud
Mengjiao Liu is a Software Engineer at DaoCloud. She contributes to Kubernetes and serves as the WG Structured Logging Lead and SIG Instrumentation Reviewer, focusing on enhancing logging quality. Additionally, she actively participates in SIG Docs as a Chinese owner and English reviewer... Read More →
avatar for Klaus Ma

Klaus Ma

Principal Software Engineer, Nvidia
Team leader, system architect, designer, and software developer with 10+ years of experience across a variety of industries and technology bases, including cloud computing, machine learning, big data and financial services. Founding Volcano & kube-batch, Kubernetes SIG-Scheduling co-Leader... Read More →
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 2

3:35pm HKT

KubeSkoop: Deal with the Complexity of Network Issues and Monitoring with eBPF | KubeSkoop:使用eBPF处理网络问题和监控的复杂性 - Yutong Li, Alibaba Cloud & Bingshen Wang, AlibabaCloud
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Troubleshooting network issues has always been one of the most difficult tasks, especially on Kubernetes. Containerization and microservices result in a denser network topology and more dependencies on the various layers of network stack modules, and the new network technologies and architectures introduced by AI pose a significant challenge for observability and diagnosis. We developed KubeSkoop, a network monitoring and diagnosis suite for Kubernetes. Using eBPF, it provides deep monitoring and tracing of the Kubernetes network to help users quickly locate network jitter problems in the cluster. It also provides a network connectivity check that can help users solve connectivity issues with one click. This talk will cover: ● What makes Kubernetes networking complex. ● Introduction to KubeSkoop. ● How we use eBPF to monitor container networking. ● The practices of KubeSkoop in large-scale production environments.

网络问题的故障排除一直是最困难的部分之一,尤其是在Kubernetes上。容器化和微服务导致了更密集的网络拓扑结构,以及对各个网络堆栈模块的更多依赖,人工智能引入的新网络技术和架构也在可观察性和诊断方面提出了重大挑战。 我们开发了KubeSkoop,这是专为Kubernetes设计的网络监控和诊断套件。利用eBPF技术,它提供了对Kubernetes网络的深度监控和跟踪,帮助用户快速定位集群中发生的网络抖动问题。它还提供了网络连接性检查功能,可以帮助用户通过一键解决网络连接问题。 本主题将介绍以下内容: ● 什么使Kubernetes网络变得复杂。 ● KubeSkoop的介绍。 ● 我们如何使用eBPF来监控容器网络。 ● KubeSkoop在大规模生产环境中的实践。
Speakers
avatar for wang bingshen

wang bingshen

Senior Engineer, AlibabaCloud
Bingshen Wang is a Senior Engineer at Alibaba Cloud, a maintainer of KubeSkoop/Terway/OpenYurt, and a contributor to Kubernetes/Containerd. He mainly focuses on container networking and runtime, and has many years of experience managing Alibaba Cloud Kubernetes clusters. He... Read More →
avatar for Tony Li

Tony Li

Software Engineer, Alibaba Cloud
Yutong Li is a Software Engineer at Alibaba Cloud. He works on designing and maintaining the container network for Alibaba Cloud Container Service and on the open source Kubernetes network diagnosis tool KubeSkoop.
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

3:35pm HKT

OpAMP: Scaling OpenTelemetry with Flexibility | OpAMP:灵活扩展OpenTelemetry - Husni Alhamdani, Censhare & Herbert Sianturi, Krom Bank
Thursday August 22, 2024 3:35pm - 4:10pm HKT
In this session, we will delve into how OpAMP (Open Agent Management Protocol) revolutionizes the management of large fleets of data collection Agents and its pivotal role in scaling OpenTelemetry deployments with unparalleled flexibility. Discover how OpAMP empowers organizations to remotely manage diverse Agents, irrespective of vendor, through its vendor-agnostic protocol. Learn how OpAMP facilitates status reporting, telemetry reporting, centralized management, allowing for tailored configurations and efficient monitoring of individual Agents or types of Agents, management of downloadable Agent-specific packages, and robust connection credentials management. Join us to unleash the potential of OpAMP and revolutionize your OpenTelemetry scalability strategy.

在这场演讲中,我们将深入探讨OpAMP(开放式代理管理协议)如何革新大规模数据收集代理的管理,并在扩展OpenTelemetry部署中发挥关键作用,具有无与伦比的灵活性。 发现OpAMP如何赋予组织远程管理各种代理的能力,无论供应商如何,通过其供应商无关的协议。了解OpAMP如何促进状态报告、遥测报告、集中管理,允许定制配置和有效监控单个代理或代理类型,管理可下载的特定代理软件包,以及强大的连接凭证管理。 加入我们,释放OpAMP的潜力,革新您的OpenTelemetry可扩展性策略。
Speakers
avatar for Husni Alhamdani

Husni Alhamdani

Senior Site Reliability Engineer, Censhare
Husni is a CNCF Ambassador, and a Site Reliability Engineer at Censhare, where he is responsible for building and maintaining infrastructure platforms. In addition to these responsibilities, he primarily focuses on architecting Cloud-Native solutions. He also graduated from the LFX... Read More →
avatar for Herbert Sianturi

Herbert Sianturi

Senior DevOps Engineer, Krom Bank
Herbert Sianturi serves as a Senior DevOps Engineer at Krom Bank Indonesia, where he spearheads efforts to enhance the quality of the end-to-end application lifecycle and to apply open source platforms as a base. With years of expertise in container orchestration and cloud computing... Read More →
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 6
  KubeCon + CloudNativeCon Sessions, Observability

3:35pm HKT

Optimize and Accelerate Cloud AI Infrastructure with Autoscaling | 通过自动缩放优化和加速云AI基础设施 - Yuan Mo, Alibaba Cloud
Thursday August 22, 2024 3:35pm - 4:10pm HKT
With the rise of generative AI technology, more and more applications are starting to integrate with the capabilities of generative AI. However, the high costs of training and inference can be daunting for developers. In this talk, we will discuss the issues and solutions that need additional consideration when using elastic scaling in generative AI scenarios, including: ● How to enhance the elastic startup efficiency of generative AI ● How to address the efficiency of inference when separating compute and storage in generative AI ● How to reduce the costs of training and inference ● How to solve the interruption problem in AI training scenarios using Spot instances ● How to address the issue of capacity elasticity in LLM scenarios Finally, we will introduce the practical experience of the world's leading generative AI service provider: HaiYi (seaart.ai), allowing more developers to understand the architectural methods of elastic cloud AI infrastructure.

随着生成式人工智能技术的兴起,越来越多的应用程序开始与生成式人工智能的能力集成。然而,训练和推理的高成本可能会让开发人员望而却步。在这次演讲中,我们将讨论在生成式人工智能场景中使用弹性扩展时需要额外考虑的问题和解决方案,包括: ● 如何提高生成式人工智能的弹性启动效率 ● 如何在生成式人工智能中分离计算和存储时解决推理效率的问题 ● 如何降低训练和推理的成本 ● 如何使用Spot实例解决AI训练场景中的中断问题 ● 如何解决LLM场景中的容量弹性问题 最后,我们将介绍世界领先的生成式人工智能服务提供商海艺(seaart.ai)的实际经验,让更多开发人员了解弹性云AI基础设施的架构方法。
Speakers
avatar for Yuan Mo

Yuan Mo

Staff Engineer, Alibaba Cloud
Senior technical expert at Alibaba Cloud, maintainer of the Kubernetes elastic component autoscaler, and founder of the cloud-native gaming community and OpenKruiseGame, who has given several talks at KubeCon. Focused on the cloud-native transformation of the gaming industry... Read More →
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

3:35pm HKT

Revolutionizing Scientific Simulations with Argo Workflows | 用Argo工作流彻底改变科学模拟 - ShaungKun Tian, Alibaba Cloud & 建翔 孙, 北京深势科技有限公司
Thursday August 22, 2024 3:35pm - 4:10pm HKT
DP Technology provides scientific simulation platforms for research in biomedicine, energy, materials and other industries. Scientific simulation workflows are inherently complex and resource-intensive, and manual deployment is often prone to errors. After adopting Argo Workflows to orchestrate scientific simulations, we improved productivity by 100%. In this talk, we will explain why we chose Argo Workflows, how we orchestrate large-scale scientific simulation tasks, and how we make the whole system scalable and reliable. In particular, we will share best practices for managing very large workflows (thousands of tasks), retrying workflows sensibly, using memoization to reduce runtime and compute cost, and interacting with HPC systems. We have also contributed to the Argo community to enhance functionality and improve reliability. Additionally, we'll introduce DFlow, our open-source Python SDK designed for the seamless orchestration of scientific simulations with Argo Workflows.

DP Technology为生物医药、能源、材料等行业的研究提供科学模拟平台。科学模拟工作流程本质上复杂且资源密集,手动部署往往容易出错。采用Argo工作流程来编排科学模拟后,我们的生产力提高了100%。在本次演讲中,我们将介绍为什么选择Argo工作流程,如何编排大规模科学模拟任务,如何实现整个系统的可扩展性和可靠性。特别是,我们将分享如何管理超大型工作流程(数千个任务),如何合理重试工作流程,如何使用记忆化来减少运行时间和计算成本,如何与HPC系统交互。我们还为Argo社区做出了贡献,以增强功能性和提高可靠性。此外,我们还将介绍DFlow,我们的开源Python SDK,旨在与Argo工作流程无缝协同编排科学模拟。
Speakers
avatar for 建翔 孙

建翔 孙

Software Engineer, 北京深势科技有限公司
I once built a machine learning platform at Kuaishou, and currently, I am involved in scheduling scientific computing tasks at DP Technology, as well as constructing workflow platforms. I specialize in the field of cloud-native development.
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

3:35pm HKT

Phippy’s Field Guide to Wasm | Phippy的Wasm指南 - Karen Chu, Fermyon & Matt Butcher, Fermyon Technologies
Thursday August 22, 2024 3:35pm - 4:10pm HKT
The creators of the original Illustrated Children’s Guide to Kubernetes have written a fourth book, this time focused on the emerging technology that is WebAssembly, one of the fastest growing cloud native trends. As with previous books, we broach a complex technical topic with a fun and friendly format designed for all skill levels. On their camping trip with Blossom the Wasm Possum, Phippy and Zee’s adventures illustrate the basics of Wasm, introduce key terminology, and frame how it complements existing cloud technologies like containers and Kubernetes. In the first half of the talk, we will do a reading of the book in Mandarin. We will then follow up (in English) with a technical overview of Wasm, latest updates to the ecosystem, and details on where to find the community.

原《插图儿童 Kubernetes 指南》的创作者们已经写了第四本书,这次的焦点是新兴技术 WebAssembly,这是增长最快的云原生趋势之一。与之前的书籍一样,我们以有趣友好的格式涉及复杂的技术主题,适合各种技能水平的读者。在他们与 Wasm 负鼠 Blossom 一起露营的旅行中,Phippy 和 Zee 的冒险展示了 Wasm 的基础知识,介绍了关键术语,并阐述了它如何与容器和 Kubernetes 等现有云技术相辅相成。在讲座的前半部分,我们将用普通话朗读这本书。然后我们将用英语进行技术概述,介绍 Wasm 生态系统的最新更新,并详细介绍社区的位置。
Speakers
avatar for Karen Chu

Karen Chu

Head of Community, Fermyon
Karen Chu is the Head of Community at Fermyon Technologies. Having participated in the cloud native community since 2015, she is a CNCF Ambassador, Helm community manager/maintainer, emeritus Kubernetes Code of Conduct Committee member, meet-up organizer, and conference organizer... Read More →
avatar for Matt Butcher

Matt Butcher

CEO, Fermyon Technologies
Matt Butcher (CEO) is a founder of Fermyon. He is one of the original creators of Helm, Brigade, CNAB, OAM, Glide, and Krustlet. He has written or co-written many books, including "Learning Helm" and "Go in Practice." He is a co-creator of the "Illustrated Children’s Guide to Kubernetes... Read More →
Thursday August 22, 2024 3:35pm - 4:10pm HKT
Level 1 | Hung Hom Room 5

4:00pm HKT

Peer Group Mentoring | 同侪小组辅导会议
Thursday August 22, 2024 4:00pm - 5:00pm HKT
The community collectively has an immense depth of knowledge and expertise which we can explore and learn from at this collaborative event. Whether or not you’re new to open source and the cloud native community, we invite you to attend the Peer Group Mentoring Session. You’ll have the chance to meet with experienced open source veterans. You will be paired with 2-8 other people to explore technical, community and career questions together.


社区集体拥有丰富的知识和专业技能,我们可以在这个合作活动中探索并学习。无论您是新手还是老手,对开源和云原生社区都不陌生,我们邀请您参加同侪小组辅导会议。您将有机会与经验丰富的开源老将见面。您将与其他2-8人配对,一起探讨技术、社区和职业问题。


Sign-up to be a Mentee
学员注册

Sign-up to be a Mentor
辅导员注册
Thursday August 22, 2024 4:00pm - 5:00pm HKT
Level 1 | Hung Hom Room 4
  Experiences | 社交活动

4:25pm HKT

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes | 无缝扩展性:使用Kubernetes编排大型语言模型推理 - Joinal Ahmed & Nirav Kumar, Navatech Group
Thursday August 22, 2024 4:25pm - 5:00pm HKT
In the dynamic landscape of AI/ML, deploying and orchestrating large open-source inference models on Kubernetes has become paramount. This talk delves into the intricacies of automating the deployment of heavyweight models like Falcon and Llama 2, leveraging Kubernetes Custom Resource Definitions (CRDs) to manage large model files seamlessly through container images. The deployment is streamlined with an HTTP server facilitating inference calls using the model library. This session will explore eliminating manual tuning of deployment parameters to fit GPU hardware by providing preset configurations. Learn how to auto-provision GPU nodes based on specific model requirements, ensuring optimal utilization of resources. We'll discuss empowering users to deploy their containerized models effortlessly by allowing them to provide a pod template in the workspace custom resource inference field. The controller, in turn, dynamically creates deployment workloads that utilize all GPU nodes.

在AI/ML不断发展的领域中,在Kubernetes上部署和编排大型开源推理模型变得至关重要。本次演讲将深入探讨自动化部署像Falcon和Llama 2这样的重型模型的复杂性,利用Kubernetes自定义资源定义(CRDs)通过容器镜像无缝管理大型模型文件。部署通过HTTP服务器简化,以便使用模型库进行推理调用。 本场演讲将探讨通过提供预设配置来消除手动调整部署参数以适应GPU硬件的需求。了解如何根据特定模型要求自动配置GPU节点,确保资源的最佳利用。我们将讨论如何赋予用户轻松部署其容器化模型的能力,允许他们在工作区自定义资源推理字段中提供一个pod模板。控制器动态地创建部署工作负载,利用所有GPU节点。
Speakers
avatar for Joinal Ahmed

Joinal Ahmed

AI Architect, Navatech Group
Joinal is a seasoned Data Science expert passionate about rapid prototyping, community involvement, and driving technology adoption. With a robust technical background, he excels in leading diverse teams through ML projects, recruiting and mentoring talent, optimizing workflows, and... Read More →
avatar for Nirav Kumar

Nirav Kumar

Head of AI and Engineering, Navatech Group
Nirav Kumar is a leader in the field of Artificial Intelligence with over 13 years of experience in data science and machine learning. As Head of AI and Engineering at Navatech Group, he spearheads cutting-edge research and development initiatives aimed at pushing the boundaries of... Read More →
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 3

4:25pm HKT

KubeVirt Community Update | KubeVirt社区更新 - Haolin Zhang, Arm
Thursday August 22, 2024 4:25pm - 5:00pm HKT
KubeVirt has been going through some growth spurts of late. As the project matures, so too must the community. We'll go through some of these changes, such as our recent changes building out our SIG process and how this helps our contributor ladder, changes to our design proposal process and how to track the lifecycle of a feature and accountability. As we move towards graduation, it's certainly an interesting time to be part of the community. We've also had three big releases since the last time we met, so we will walk through some of the key features from those. And for those of you who are still learning about what KubeVirt is, we'll run a demo to show some of the basic uses, and how running virtual machines natively alongside your containers is easier than you think.

KubeVirt最近经历了一些增长阶段。随着项目的成熟,社区也必须发展壮大。我们将讨论一些这些变化,比如最近我们改进了SIG流程以及这如何帮助我们的贡献者阶梯,设计提案流程的变化以及如何跟踪功能的生命周期和责任。随着我们向毕业迈进,成为社区的一部分绝对是一个有趣的时刻。 自上次见面以来,我们还发布了三个重要版本,因此我们将介绍其中一些关键功能。对于那些仍在了解KubeVirt的人,我们将进行演示,展示一些基本用途,以及如何在容器旁边本地运行虚拟机比你想象的更容易。
Speakers
avatar for Haolin Zhang

Haolin Zhang

Senior Software Engineer, Arm
Haolin Zhang is a senior engineer at Arm. With expertise in cloud computing, he brings deep knowledge of virtualization, containers, and container orchestration. He actively contributes to open-source projects, particularly the KubeVirt project, where he focuses on enabling... Read More →
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 6

4:25pm HKT

A Decade of Cloud-Native Journey: The Evolution of Container Technology and the Kubernetes Ecosystem | 十年云原生之旅:容器技术和Kubernetes生态系统的演变 - Jintao Zhang, Kong Inc.
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Over the past decade, cloud-native technologies have revolutionized software development, deployment, and operations. Container technology and the Kubernetes ecosystem, as transformation leaders, have enhanced development agility, and provided enterprises with unmatched scalability, flexibility, and efficiency. This talk navigates the evolution of these technologies, highlighting their impact on the cloud-native landscape. Starting my journey in 2014, I will share insights into the decade-long evolution of Kubernetes, its community, and technology stacks, alongside personal experiences. Attendees will learn about successes, challenges, and future trends, gaining knowledge to navigate their cloud-native transformations.

在过去的十年里,云原生技术已经彻底改变了软件开发、部署和运营。容器技术和Kubernetes生态系统作为变革的领导者,提升了开发的灵活性,并为企业提供了无与伦比的可扩展性、灵活性和效率。本次演讲将探讨这些技术的演变,突出它们对云原生领域的影响。 从2014年开始我的旅程,我将分享关于Kubernetes、其社区和技术堆栈十年演变的见解,以及个人经验。与会者将了解成功、挑战和未来趋势,获得知识来引领他们的云原生转型。
Speakers
avatar for Jintao Zhang

Jintao Zhang

Sr. SE, Kong
Jintao Zhang is a Microsoft MVP, CNCF Ambassador, Apache PMC member, and Kubernetes Ingress-NGINX maintainer. He specializes in cloud-native technology and the Azure technology stack. He worked for Kong Inc.
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Cloud Native Novice

4:25pm HKT

Observability Supercharger: Build the Traffic Topology Map for Millions of Containers with Zero Code | 可观测性超级增强器:使用零代码为数百万个容器构建流量拓扑图 - Sheng Wei & Teck Chuan Lim, Shopee
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Kubernetes makes container orchestration and management simple and easy. However, with the surge of applications and middleware on Kubernetes, it is difficult to analyze and identify the relationships and dependencies among huge numbers of services and middleware. The most common approach requires the business side to make code changes to expose more information, which cannot cover all applications. In this session, we will share: * How does Shopee leverage eBPF to build a universal map for a million containers in production environments? * How do we implement distributed tracing for arbitrary third-party middleware with different protocols and usage patterns? * How do we optimize eBPF code and the Linux kernel to minimize the impact on injected containers? * How did we integrate with the big data and AI stack to fully utilize the data for anomaly detection and incident troubleshooting?

Kubernetes使容器编排和管理变得简单易行。然而,随着应用程序和中间件在Kubernetes上的激增,分析和识别大量服务和中间件之间的关系和依赖关系变得困难。最常见的方法需要业务方进行代码更改以公开更多信息,这对所有应用程序来说是不可能覆盖的。 在本场演讲中,我们将分享: *Shopee如何利用eBPF在生产环境中为百万个容器构建通用映射? *我们如何为具有不同协议和使用模式的任意第三方中间件实现分布式跟踪? *我们如何优化eBPF代码和Linux内核以最小化对注入容器的影响? *我们如何与大数据和人工智能堆栈集成,充分利用数据进行异常检测和故障排除?
Speakers
avatar for Teck Chuan Lim

Teck Chuan Lim

Engineer, Shopee
I have been working at Shopee since graduating in 2018. I am a long-standing core team member of the engineering infrastructure team and took charge of driving Shopee's engineering infrastructure ecosystem from DevOps to DataOps. At the moment, I am taking charge to drive forward towards... Read More →
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

4:25pm HKT

The Two Sides of the Kubernetes Enhancement Proposals (KEPs) | Kubernetes Enhancement Proposals(KEPs)的两面性 - Rayan Das, OneTrust LLC & Sreeram Venkitesh, BigBinary
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Kubernetes Enhancement Proposals (KEPs) are pivotal in proposing, communicating, and coordinating new efforts within the Kubernetes project. As members of the Release Team (the team responsible for releasing the next version of Kubernetes), and especially of the Enhancements Team under SIG-Release, we play a vital role in maintaining the active status of enhancements and facilitating communication between stakeholders, be it a deprecation or a feature update. In this talk, we look at the KEP lifecycle from the perspective of the release team, exploring the process (enhancements freeze, code freeze, and the exception process), major themes, and more. Additionally, we will discuss the developer's viewpoint on KEPs, highlighting the process, deadlines, and best practices for proposing, reviewing, and implementing KEPs effectively. Join us to learn how KEPs drive innovation and collaboration within the Kubernetes community, empowering contributors to shape the future of Kubernetes development.

Kubernetes Enhancement Proposals(KEPs)在Kubernetes项目中提出、沟通和协调新工作方面起着关键作用。 作为发布团队的成员(负责发布下一个版本的Kubernetes的团队),特别是在SIG-Release下的Enhancements团队,我们在维护增强功能的活跃状态和促进利益相关者之间的沟通方面发挥着重要作用,无论是废弃还是功能更新。 在这次演讲中,我们将从发布团队的角度看待KEP的生命周期,探讨过程(增强功能冻结、代码冻结和异常处理过程)、主要主题等。此外,我们还将讨论开发人员对KEP的观点,重点介绍提出、审查和有效实施KEP的过程、截止日期和最佳实践。 加入我们,了解KEP如何推动Kubernetes社区内的创新和协作,赋予贡献者塑造Kubernetes开发未来的能力。
Speakers
avatar for Rayan Das

Rayan Das

Senior Site Reliability Engineer, OneTrust LLC
As a Senior Site Reliability Engineer, I devote my expertise to working on the infrastructure of OneTrust Privacy Software. Within the Kubernetes community, I've served as the SIG-Release Enhancement Shadow for Kubernetes v1.29, and I applied for release shadow for v1.31 as well. Beyond... Read More →
avatar for Sreeram Venkitesh

Sreeram Venkitesh

Software Engineer, BigBinary
Sreeram Venkitesh is a Software Engineer at BigBinary and is an active contributor to Kubernetes. He is active in the Kubernetes release team, where he served as a shadow in the enhancements team from v1.29-v1.30 and is the enhancements sub-team lead for v1.31. He also helps write... Read More →
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Cloud Native Experience

4:25pm HKT

Uniting Sustainability and Edge Computing: Kepler & Open Horizon on RISC-V and Heterogeneous System | 团结可持续性和边缘计算:Kepler和Open Horizon在RISC-V和异构系统上 - Peng Hui Jiang & David Yao, IBM
Thursday August 22, 2024 4:25pm - 5:00pm HKT
The dynamic landscape of cloud-edge computing demands solutions to mitigate energy consumption and promote sustainability. Our proposal advocates for the integration of Kepler and Open Horizon with the CNCF and LF Edge ecosystems to address diverse hardware requirements in cloud and edge deployments, including x86, Arm, s390, and the emerging RISC-V architectures. Notably, the Chinese market, characterized by edge devices in the manufacturing, retail and surveillance domains, stands to benefit significantly from this initiative. By using Kepler’s sophisticated energy estimation capabilities and Open Horizon’s autonomous workload management features, this proposal endeavors to optimize energy efficiency across heterogeneous edge environments. In the session, we will demonstrate one use case that builds and integrates Kepler and Open Horizon to run on the RISC-V platform, and that monitors and optimizes distributed, heterogeneous systems to build a greener and more resilient cloud-edge computing paradigm.

云边计算的动态景观需要解决能源消耗问题并促进可持续发展。我们的提案主张将Kepler和Open Horizon与CNCF和LF Edge生态系统整合,以解决云和边缘部署中多样化的硬件需求,包括x86、arm、s390和新兴的RISC-V架构。值得注意的是,中国市场以制造、零售和监控领域的边缘设备为特征,这一举措将使其受益匪浅。通过利用Kepler的先进能源估算能力和Open Horizon的自主工作负载管理功能,本提案旨在优化异构边缘环境的能源效率。 在本场演讲中,我们将演示一个使用案例,展示如何构建和整合Kepler和Open Horizon在RISC-V平台上运行,并监控和优化分布式和异构系统,以构建更环保、更具弹性的云边计算范式。
Speakers
avatar for Peng Hui Jiang

Peng Hui Jiang

Architect, IBM
Peng Hui Jiang works for IBM as a Senior Software Engineer, building and operating Public Cloud services. He has rich experience in Cloud, Database, and Security. He is a CNCF Kepler Maintainer, an Apache CouchDB committer, and a Master Inventor at IBM holding more than 200 patents or... Read More →
avatar for 勇 姚

勇 姚

Program Director, IBM Cloud Platform, IBM
David Yao is the Program Director of IBM Cloud Platform at the IBM China Development Lab, developing and managing the entire product development lifecycle and team for the dynamic cloud and edge environment. Passionate about learning open technology, building and transforming an open and... Read More →
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Observability

4:25pm HKT

Enforceable Supply Chain Security Policy with OPA Gatekeeper and Ratify | 通过OPA Gatekeeper和Ratify执行可强制执行的供应链安全策略 - Feynman Zhou, Microsoft & Dahu Kuang, Alibaba Cloud
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Container supply chain threats are on the rise; to mitigate these threats, enterprises and open-source project maintainers are exploring new safeguards. Signing and verifying images, enforcing policies to block untrusted deployment, generating SBOM, provenance attestation, and vulnerability scanning are ways to keep attackers from compromising software. To safeguard the software supply chain with Gatekeeper policy, we built Ratify for Gatekeeper which acts as an external data provider and returns verification data that can be processed by Gatekeeper. Ratify as a verification engine enables users to enforce security policies through the verification of image signature, vulnerability reports and SBOM. We’ll demonstrate how you can establish trust for container images by enforcing security policies with Gatekeeper and Ratify. You can admit for deployment only the images that comply with your admission control policy, resulting in a more trustworthy container supply chain.

容器供应链威胁正在上升;为了减轻这些威胁,企业和开源项目维护者正在探索新的保障措施。签名和验证图像、强制执行政策以阻止不受信任的部署、生成SBOM、来源验证和漏洞扫描是防止攻击者损害软件的方法。 为了通过Gatekeeper策略保护软件供应链,我们为Gatekeeper构建了Ratify,它作为外部数据提供者返回验证数据,Gatekeeper可以处理这些数据。 Ratify作为验证引擎,使用户能够通过验证图像签名、漏洞报告和SBOM来执行安全策略。 我们将演示如何通过Gatekeeper和Ratify强制执行安全策略来建立对容器图像的信任。您可以仅允许符合入场控制策略的图像进行部署,从而实现更可信赖的容器供应链。
Speakers
avatar for Feynman Zhou

Feynman Zhou

Product Manager, Microsoft
Feynman is a product manager for Microsoft Azure. He is also a maintainer of the CNCF Notary Project, ORAS, and Ratify. Feynman has been contributing to multiple CNCF projects for six years and now focusing on the software supply chain security area. Feynman is also a writer, a public... Read More →
Thursday August 22, 2024 4:25pm - 5:00pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security

5:15pm HKT

Navigating the Ethical Horizon: Pioneering Responsible AI with the Generative AI Commons | 穿越伦理地平线:与生成式AI共同开创负责任的AI - Anni Lai, Futurewei
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Join me to explore Responsible AI's vital role in shaping technology ethically. We'll navigate ethical dilemmas and societal impacts, emphasizing the urgency for frameworks prioritizing human well-being. At the core is the Responsible AI Framework by Generative AI Commons, guiding developers, researchers, and policymakers. Through transparency, fairness, accountability, and inclusivity, it empowers stakeholders to uphold ethical standards across the AI lifecycle. Let's journey towards an AI-powered future that's not just innovative but also ethically responsible.

加入我,探索负责任人工智能在塑造技术道德方面的重要作用。我们将探讨伦理困境和社会影响,强调制定以人类福祉为重点的框架的紧迫性。核心是生成式人工智能共同体的负责任人工智能框架,指导开发人员、研究人员和政策制定者。通过透明度、公平性、问责制和包容性,它赋予利益相关者在整个人工智能生命周期中维护伦理标准的能力。让我们一起走向一个不仅创新而且道德负责的人工智能驱动的未来。
Speakers

Anni Lai

Head of Open Source Operations, Chair of Generative AI Commons, LF AI & Data, Futurewei
Anni drives Futurewei’s open source (O.S.) governance, process, compliance, training, project alignment, and ecosystem building. Anni has a long history of serving on various O.S. boards such as OpenStack Foundation, LF CNCF, LF OCI, LF Edge, and is on the LF OMF board and LF Europe... Read More →
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 3

5:15pm HKT

KubeEdge DeepDive: Extending Kubernetes to the Edge with Real-World Industry Use Case | KubeEdge深入探讨:将Kubernetes扩展到边缘,实现真实行业用例 - Yue Bao, Huawei Cloud Computing Technology Co., Ltd. & Hongbing Zhang, DaoCloud
Thursday August 22, 2024 5:15pm - 5:50pm HKT
In this session, KubeEdge project maintainers will provide an overview of KubeEdge's architecture and explore its industry-specific use cases. The session will kick off with a brief introduction to edge computing and its growing importance in IoT and distributed systems. The maintainers will then delve into the core components and architecture of KubeEdge, showcasing how it extends the capabilities of Kubernetes to manage edge computing workloads efficiently. Drawing on a range of industry use cases, including smart cities, industrial IoT, edge AI, robotics, and retail, the maintainers will share success stories and insights from organizations that have deployed KubeEdge in their edge environments, highlighting the tangible benefits and transformational possibilities it offers. The session will provide a detailed introduction to the certified KubeEdge conformance test. The maintainers will also share the advancements in technology and community governance in KubeEdge.

在这场演讲中,KubeEdge项目的维护者将介绍KubeEdge的架构,探讨KubeEdge与行业特定用例的关系。会议将以简要介绍边缘计算及其在物联网和分布式系统中日益重要的作用开始。维护者将深入探讨KubeEdge的核心组件和架构,展示它如何扩展Kubernetes的能力,以有效管理边缘计算工作负载。维护者将借助一系列行业用例,包括智慧城市、工业物联网、边缘人工智能、机器人和零售,分享已在其边缘环境中部署KubeEdge的组织的成功故事和见解,突出其提供的切实利益和变革可能性。会议将详细介绍认证的KubeEdge一致性测试。维护者还将分享KubeEdge技术和社区治理方面的进展。
Speakers

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.
Yue Bao is a software engineer at Huawei Cloud. She now works full-time on open source as a KubeEdge maintainer, focusing on lightweight edge and the edge api-server for KubeEdge. Before that, Yue worked on Huawei Cloud Intelligent EdgeFabric Service and participated... Read More →

Hongbing Zhang

Chief Operating Officer, DaoCloud
Hongbing Zhang is Chief Operating Officer of DaoCloud. A veteran of open source, he founded the IBM China Linux team in 2011 and organized the team to make significant contributions to the Linux kernel, OpenStack, and Hadoop projects. He is now focusing on the cloud native domain and leading... Read More →
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 2

5:15pm HKT

Understanding the Buzz Around Cilium: Introduction and in Production at Alibaba | 深入了解Cilium背后的热潮:在阿里巴巴的介绍和生产中 - BoKang Li, AlibabaCloud & Liyi Huang, Cisco
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Cilium is CNCF's most widely adopted CNI, being the default choice for all major cloud providers. You'll hear how the project grew from a simple networking CNI to cover observability and security too. This talk dives into the bytecode behind all of the buzz around the project. It will cover:

  • An introduction to Cilium and how it works
  • Network policy, kube proxy replacement, and bandwidth manager deep dive
  • Why and how Alibaba Cloud integrated Cilium as their CNI
  • The observability and security capabilities Hubble and Tetragon bring to the project
  • Where Cilium is heading next

By weaving together both theoretical knowledge and hands-on experience running Cilium in production, the audience will walk away with a strong understanding of what Cilium provides for networking, observability, and security.

Cilium是CNCF最广泛采用的CNI,是所有主要云提供商的默认选择。您将了解到该项目是如何从一个简单的网络CNI发展到覆盖可观察性和安全性的。本次演讲将深入探讨该项目背后的所有字节码。内容包括: Cilium的介绍及其工作原理 网络策略、kube代理替换和带宽管理器深入挖掘 阿里云为何以及如何集成Cilium作为他们的CNI Hubble和Tetragon为该项目带来的可观察性和安全性能力 Cilium未来的发展方向 通过将理论知识和实际运行Cilium的经验结合起来,观众将对Cilium在网络、可观察性和安全性方面提供的功能有深入的了解。
Speakers

BoKang Li

Senior Engineer, AlibabaCloud
BoKang Li is a senior engineer on the Container Service for Kubernetes team at Alibaba Cloud. He focuses primarily on container networks and connectivity solutions and has extensive experience with container networking.

Liyi Huang

Solution Architect, Isovalent at Cisco
Solution architect at Isovalent, part of Cisco
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 6

5:15pm HKT

Addressing the #1 Threat to the Web: Authorization | 应对网络的头号威胁:授权 - Jimmy Zelinskie, authzed
Thursday August 22, 2024 5:15pm - 5:50pm HKT
As more folks deploy cloud-native architectures and technologies, store ever larger amounts of data, and build ever more complex software suites, correctly and securely authorizing requests becomes exponentially more difficult. Broken authorization now tops OWASP's Top 10 Security Risks for Web Apps. Their recommendation? Adopt an ABAC or ReBAC authorization model. This talk establishes the problems with the status quo, explains the core concepts behind ReBAC, and introduces SpiceDB, a widely adopted open source system inspired by the system internally powering Google: Zanzibar.

随着越来越多的人部署云原生架构和技术,存储越来越多的数据,并构建越来越复杂的软件套件,正确和安全地授权请求所需的复杂性变得指数级增加。 破解授权现在已经成为OWASP Web应用程序安全风险前十名之首。他们的建议是采用ABAC或ReBAC授权模型。本次演讲将阐明现状存在的问题,解释ReBAC背后的核心概念,并介绍SpiceDB,这是一个广泛采用的开源系统,受到Google内部系统Zanzibar的启发。
Speakers

Jimmy Zelinskie

cofounder, authzed
Jimmy Zelinskie is a software engineer and product leader with a goal of democratizing software via open source development. He's currently CPO of authzed where he's focused on bringing hyperscaler best-practices in authorization to the industry at large. At CoreOS, he helped pioneer... Read More →
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Security

5:15pm HKT

OpenTelemetry Amplified: Full Observability with EBPF-Enabled Distributed Tracing | OpenTelemetry放大:使用eBPF启用的分布式跟踪实现全面的可观测性 - Kai Liu, Alibaba Cloud & Wanqi Yang, Sun Yat
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Within the cloud-native ecosystem, OpenTelemetry (otel) has established itself as the de facto standard for cross-language and cross-platform observability. By providing comprehensive tracing, metrics, and logging solutions for various programming languages, otel has empowered developers and operators with deep insights into complex systems. In recent years, otel has further expanded its observability frontiers by introducing innovative capabilities in the Linux kernel space using eBPF. However, this innovative journey has encountered new challenges, particularly in reducing the invasiveness in certain programming languages and correlating observability data between kernel and user spaces. This session chronicles Alibaba Cloud’s journey through these challenges. By leveraging eBPF technology, we've pioneered innovative solutions that redefine the landscape of system observability, presenting an integrated, less invasive approach for real-time insights into distributed systems.

在云原生生态系统中,OpenTelemetry(otel)已经成为跨语言和跨平台可观测性的事实标准。通过为各种编程语言提供全面的跟踪、度量和日志解决方案,otel为开发人员和运维人员提供了对复杂系统的深入洞察。近年来,otel通过在Linux内核空间引入eBPF的创新能力,进一步拓展了其可观测性边界。 然而,这种创新之旅遇到了新的挑战,特别是在减少某些编程语言中的侵入性和在内核和用户空间之间相关联可观测性数据方面。 本场演讲将记录阿里云在这些挑战中的旅程。通过利用eBPF技术,我们开创了重新定义系统可观测性景观的创新解决方案,提供了一种集成的、不那么侵入性的方法,实时洞察分布式系统。
Speakers

Kai Liu

Senior Software Developer, Alibaba Cloud
Liu Kai is a senior software development engineer on the Cloud Native Observability team at Alibaba Cloud. With years of practical experience and insights in the field of monitoring and observability, Liu Kai continuously delves into the realm of observability solutions, including architectural... Read More →

Wanqi Yang

Student, Sun Yat-sen University
Wanqi Yang received the B.S. degree in Computer Science and Technology from Sun Yat-Sen University, Guangzhou, China. She is currently working toward the PhD degree in Computer Science and Technology at School of Computer Science and Engineering, Sun Yat-Sen University. Her research... Read More →
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Observability

5:15pm HKT

Working with Raw Disk Drives in Kubernetes — YDB's Experience | 在Kubernetes中使用原始磁盘驱动器——YDB的经验 - Ivan Blinkov, YDB
Thursday August 22, 2024 5:15pm - 5:50pm HKT
YDB is an open-source distributed database management system that, for performance reasons, uses raw disk drives (block devices) to store all data, without any filesystem. It was relatively straightforward to manage such a setup in the bare-metal world of the past, but the dynamic nature of cloud-native environments has introduced new challenges in preserving this performance benefit. In this talk, we'll explore how to leverage Kubernetes and the Operator design pattern to modernize how stateful distributed database clusters are managed without changing the primary approach to how the data is physically stored.

YDB是一个开源的分布式数据库管理系统,为了性能考虑,使用原始磁盘驱动器(块设备)存储所有数据,而不使用任何文件系统。在过去的裸金属世界中管理这样的设置相对比较简单,但云原生环境的动态特性引入了新的挑战,以保持这种性能优势。在这次演讲中,我们将探讨如何利用Kubernetes和运算符设计模式来现代化管理有状态的分布式数据库集群,而不改变数据物理存储的主要方法。
Speakers

Ivan Blinkov

VP, Product and Open-Source, YDB
Ivan Blinkov is a seasoned technical leader specializing in data storage and processing. Over the last decade, he was involved in the development of several database management systems, two of which are open-source: ClickHouse in the past and, more recently, YDB.
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 2 | Grand Ballroom 1-2

5:15pm HKT

LFX Mentorship Showcase (Open to All Attendees; No Additional Fee or Registration Required) | LFX导师展示(对所有与会者开放;无需额外费用或注册)
Thursday August 22, 2024 5:15pm - 5:50pm HKT
  • Contribute to the Linux Kernel - Juntong Deng
This showcase covers my complete experience in the Linux Kernel Bug Fixing Fall 2023 Mentorship. This included applying to the LFX Mentorship, learning how to contribute to the Linux kernel under the mentorship of Shuah Khan, and fixing multiple bugs in the Linux kernel. At the end, I also added new features to the Linux kernel that can help find the cause of memory bugs. I want to encourage more people to apply for the LFX Mentorship and to contribute to open source software.

  • Introducing Support to Spatial and Geographic Data into Vitess' Datasharding Engine - Ayman Nawaz
Join us for an enlightening session as we delve into the integration of spatial and geographic data into Vitess' renowned data sharding engine. In today's interconnected world, businesses face increasing demands to manage and analyze vast amounts of location-based information efficiently. Traditional database solutions often struggle to scale and perform optimally when dealing with spatial data, presenting significant challenges for organizations seeking to leverage geographic insights.

  • From Query to Insight:  A Look Inside Thanos Query Observability Features - Nishchay Veer
If you're curious to gain insights into your application's performance and behavior with Thanos's enhanced PromQL engine and its groundbreaking observability capabilities, then this session is for you. In this session, we will explore the foundational work that has paved the way for query observability within Thanos.

Our speaker will take you behind the scenes, showcasing the integration of query telemetry, which includes crucial insights such as time consumed per operator. Learn how this feature transforms Thanos into a robust platform for monitoring, allowing you to gain deeper visibility into your data and queries.

So whether you're a seasoned monitoring professional or just starting out, tune in to get some valuable insights into the world of query observability and how it can empower your monitoring and observability practices. Don't miss out on the chance to stay ahead in observability technology and see firsthand how Thanos is leading the charge.


  • From Mentorship to Mastery: Navigating My Way from Mentorship to Full-Time Kernel Developer - Anup Sharma
Join me as I go into my transformative journey from participating in the Linux Kernel Bugfix Mentorship Program to securing a role as a full-time kernel developer. Throughout this session, I will share the invaluable lessons and experiences garnered during my mentorship, which equipped me with a diverse skill set essential for navigating the complex landscape of the Linux kernel.
 
From mastering the conversion of device bindings to DT schema to improving my understanding of networking code, I'll uncover the pivotal moments that shaped my growth. Additionally, I'll illuminate the techniques I acquired for utilizing semantic patching tools and crafting driver code alongside corresponding device tree bindings.


  • From Novice to LitmusChaos Maintainer: Learn to Make Valuable Contributions to Open Source Projects through LFX Mentorship - Namkyu Park
When individuals ask for advice on how they can contribute to open source, experts often suggest fixing documentation as a starting point. However, what comes next? This talk aims to provide a practical answer to that question. The speaker will share his experiences as a newcomer to open source and offer guidance on how to begin your open-source contribution journey. Specifically, he will discuss the LFX Mentorship Program, which is a structured relationship between a mentor and a mentee designed to assist the mentee in achieving their open-source contributions. As a mentee in the program, he contributed to the CNCF's incubating project, LitmusChaos, and became a maintainer. He will also provide tips on how to find issues and emphasize the role of the open-source community.

  • Building Web Applications in Open Source - Mohit Mohit
During my LFX mentorship, I delved into the realm of building web applications in open source, a journey that culminated in the creation of a dynamic website. From conceptualization to deployment, I navigated the process with precision and creativity. Utilizing tools like Figma for design, React for frontend development, and Tailwind CSS for styling, I brought my vision to life with seamless functionality and aesthetic appeal. Leveraging the power of GitHub workflows, I ensured smooth deployment, marking the successful integration of design, coding, and deployment phases into a cohesive open-source project.

  • Enable Fine-grained Pod Security Admission in Kyverno - Liang Deng
Pod Security Admission (PSA) is a built-in solution that applies different isolation levels of Pod Security Standards for Pods. With the release of Kubernetes v1.25, PSA became stable.

To address the current shortcomings of PSA, we use Kyverno to extend it for finer-grained and more flexible policy control.

In the showcase, we will introduce Pod Security Admission and its current shortcomings. Following that, we will discuss how we can achieve fine-grained Pod Security Admission through Kyverno. Lastly, we will explain the effects it can have in real-world applications.


  • Multiplayer Kubernetes: GitOps with Friends - Yash Sharma
Discover the transformative capabilities of Cloud Native Playground, powered by Meshery, an open-source, cloud-native manager. Experience the self-service engineering platform, simplifying provisioning, configuration, and management of your cloud-native infrastructure, enabling seamless operation of multi-Kubernetes deployments.
With Cloud Native Playground, embrace the power of GitOps and collaborative workflows. Free yourself from YAML intricacies as Meshery's extensible platform enables visual and collaborative GitOps, fostering multi-user collaboration. Explore the Cloud Native Computing Foundation's graduated, incubation, and sandbox projects, along with many other popular open source projects, to enhance your capabilities and leverage the full potential of the ecosystem.


Join me to witness firsthand how Meshery revolutionizes Kubernetes operations, enabling seamless orchestration across multiple environments made possible by GitOps principles and multi-user collaboration.


  • Getting Started with Linux Kernel Development - Ghanshyam Agrawal
We will learn how to use regular expressions to find TODOs/FIXMEs in the kernel, how to utilise syzkaller, and how patches are submitted.

  • Harnessing the Power of Open Source & Structural Hacking - Rohit T
During my time as an LFX Mentee, I worked under Kubescape to set up an automated documentation publishing pipeline. From generating documentation to reflecting those changes on the website hosted on another repo, I learnt a lot about Open Source, working async, structu
Speakers

Mohit Mohit

Mohit is a seasoned fullstack developer and open-source enthusiast who previously worked at companies including Napstack solutions, Deloitte and The Linux Foundation as an intern, where they spearhead innovative projects at the intersection of technology and community-driven development... Read More →

Rohit T

Software Engineer Intern, Dr. Reddy's Laboratories
Meet Rohit, a visionary engineering student from Sreenidhi Institute of Science and Technology in Hyderabad, India. With a passion for creating sustainable solutions that empower communities, Rohit is leading the charge as Technical and Strategy Head of the START Club. His past experience... Read More →

Nishchay Veer

Student
Nishchay Veer is a final-year undergraduate at the Dr. B.R. Ambedkar National Institute of Technology Jalandhar, India (NIT Jalandhar), pursuing a Bachelor of Technology degree in Computer Science and Engineering. He has been an LFX Summer Term Mentee'23 at Thanos, CNCF. Nishchay... Read More →

Namkyu Park

Maintainer, LitmusChaos
Namkyu Park is a CNCF Ambassador and a Software Developer. He worked at several startups in South Korea. He has completed Linux Foundation Mentorship Programme(LitmusChaos) as a mentee and is currently a mentor and maintainer of LitmusChaos. He has previously spoken at GopherCon Korea... Read More →

Liang Deng

Software Engineer Intern, Kuaishou Technology
I am a graduate student at Zhejiang University Software Engineering Lab, currently working as a Software Engineer Intern at Kuaishou Container Cloud team. I have also interned at ByteDance and MSRA. I am passionate about open source, especially in the cloud-native domain. Currently... Read More →

Ayman Nawaz

Incoming Software Engineer at Microsoft; LFX mentee in 2023's spring cohort; Open source contributor at Accord Project

Ghanshyam Agrawal

Senior Software Engineer, Edstem Technologies Pvt Ltd
A Linux Kernel enthusiast. Also skilled in Python and Linux System Management and Web Development.
Thursday August 22, 2024 5:15pm - 5:50pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, LFX Mentorship
 
Friday, August 23
 

8:00am HKT

Registration + Badge Pickup | 会议签到 + 胸牌领取
Friday August 23, 2024 8:00am - 3:00pm HKT
TBA

9:00am HKT

9:05am HKT

Keynote: Deploying LLM Workloads on Kubernetes by WasmEdge and Kuasar | 主论坛演讲: 使用WasmEdge和Kuasar在Kubernetes上部署LLM工作负载 - Tianyang Zhang, Huawei Cloud & Xiaowei Hu, Second State
Friday August 23, 2024 9:05am - 9:20am HKT
LLMs are powerful artificial intelligence models capable of comprehending and generating natural language. However, the conventional methods for running LLMs pose significant challenges, including complex package installations, GPU device compatibility concerns, inflexible scaling, limited resource monitoring and statistics, and security vulnerabilities on native platforms. WasmEdge introduces a solution enabling the development of swift, agile, resource-efficient, and secure LLM applications. Kuasar enables running applications on Kubernetes with faster container startup and reduced management overheads. This session will demonstrate running Llama3-8B on a Kubernetes cluster using WasmEdge and Kuasar as container runtimes. Attendees will explore how Kubernetes enhances efficiency, scalability, and stability in LLM deployment and operations.

LLM是强大的人工智能模型,能够理解和生成自然语言。然而,传统的运行LLM的方法存在重大挑战,包括复杂的软件包安装、GPU设备兼容性问题、不灵活的扩展性、有限的资源监控和统计,以及在本地平台上的安全漏洞。 WasmEdge提出了一种解决方案,可以开发快速、灵活、资源高效和安全的LLM应用程序。Kuasar使应用程序能够在Kubernetes上运行,具有更快的容器启动速度和减少的管理开销。本场演讲将演示如何使用WasmEdge和Kuasar作为容器运行时,在Kubernetes集群上运行Llama3-8B。与会者将探索Kubernetes如何提高LLM部署和运营的效率、可扩展性和稳定性。
Speakers

Vivian Hu

Product Manager, Second State
Vivian Hu is a Product Manager at Second State and a columnist at InfoQ. She is a founding member of the WasmEdge project. She organizes Rust and WebAssembly community events in Asia.

Tianyang Zhang

Software Engineer, Huawei Cloud
Tianyang works on container runtimes at Huawei Cloud. He is a maintainer of Kuasar and a reviewer of the containerd rust-extension repository.
Friday August 23, 2024 9:05am - 9:20am HKT
Level 2 | Grand Ballroom 1-2
  Keynote Sessions | 主论坛演讲, AI + ML

9:25am HKT

9:55am HKT

10:05am HKT

Coffee Break ☕ | 茶歇
Friday August 23, 2024 10:05am - 10:35am HKT
Level 2 | Grand Ballroom 3-4

10:05am HKT

Solutions Showcase | 解决方案展示
Friday August 23, 2024 10:05am - 1:30pm HKT
Visit our sponsors in the Solutions Showcase to try the latest demos, watch live presentations, talk to experts, check out job opportunities, and score some swag.

请访问我们在解决方案展示区的赞助商,尝试最新的演示,观看现场演示,与专家交谈,了解工作机会,并获得一些赠品。

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or to access sponsored content. You are never required to visit third-party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

为了促进活动中的网络和业务关系,您可以选择访问第三方的展位或者获取赞助内容。我们不会强制要求您参观第三方展位或获取赞助内容。当您访问展位或参与赞助活动时,第三方将收到您的一些注册数据。这些数据包括您的名字、姓氏、职位、公司、地址、电子邮件、标准人口统计问题(例如工作职能、行业)以及您与赞助内容或资源互动的详细信息。如果您选择与展位互动或获取赞助内容,您明确同意第三方接收和使用此类数据,这将受到他们自己的隐私政策的约束
Friday August 23, 2024 10:05am - 1:30pm HKT
Level 2 | Grand Ballroom 3-4

10:05am HKT

10:15am HKT

Project Pavilion Tour with Jorge Castro, CNCF | 与 Jorge Castro 进行的 CNCF 项目展厅之旅
Friday August 23, 2024 10:15am - 10:35am HKT
Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise.

Join cloud veteran Jorge Castro as he takes you on a guided tour of our cloud native projects. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!

Meeting Point: Please meet Jorge over at the Project Pavilion at the sign "CNCF Project Team Here to Help!"
Friday August 23, 2024 10:15am - 10:35am HKT
Level 2 | Grand Ballroom 3-4 | Project Pavilion

10:35am HKT

Breaking Boundaries: TACC as a Unified Cloud-Native Infra for AI + HPC | 打破界限:TACC作为AI + HPC统一云原生基础设施 - Peter Pan, DaoCloud & Kaiqiang Xu, Hong Kong University of Science and Technology
Friday August 23, 2024 10:35am - 11:10am HKT
Large AI models are driving significant investment in GPU clusters. Yet, managing these clusters is hard: Slurm-based HPC setups lack management granularity and stability, while Kubernetes poses usability challenges for AI users. This talk introduces TACC, an AI infra management solution that bridges the advantages of both K8S and Slurm setups. This is joint work by computer system researchers at HKUST and leading CNCF contributors at DaoCloud. TACC manages a large-scale cluster at HKUST that has supported over 500 active researchers since 2020. In this talk, we share our five-year journey with TACC, covering:

  • [User Experience] A seamless UI for job submissions and management, supporting both container and Slurm formats, all on the same backbone
  • [Resource Management] Multi-tenant allocation with configurable strategies, using CNCF HAMi and Kueue
  • [Performance and Scalability] A robust distributed infrastructure with networked storage and RDMA, via CNCF SpiderPool, Fluid...

大型AI模型正在推动GPU集群的重大投资。然而,管理这些集群很困难:基于Slurm的HPC设置缺乏管理粒度和稳定性,而Kubernetes对AI用户存在可用性挑战。 本次演讲介绍了TACC,这是一种AI基础设施管理解决方案,可以结合K8S和Slurm设置的优势。这是香港科技大学的计算机系统研究人员与DaoCloud领先的CNCF贡献者共同合作的成果。 TACC自2020年以来管理着香港科技大学支持超过500名活跃研究人员的大规模集群。在本次演讲中,我们分享了与TACC一起的五年历程,涵盖以下内容: * [用户体验] 无缝的UI界面用于作业提交和管理,支持容器和Slurm格式,均在同一基础上 * [资源管理] 多租户分配与可配置策略,使用CNCF HAMi和Kueue * [性能和可扩展性] 强大的分布式基础设施,具有网络存储和RDMA,通过CNCF SpiderPool,Fluid...
Speakers

Peter Pan

VP of R&D Engineering, DaoCloud
  • DaoCloud R&D Engineering VP
  • CNCF wg-AI (AI Working-Group) member
  • Maintainer of a few CNCF projects (GithubID: panpan0000): CloudTTY, KuBean, HwameiStor
  • Public Tech Events: 2023 KubeCon SH Speaker (https://sched.co/1PTFI); 2023 KubeCon EU Program Committee... Read More →

Kaiqiang Xu

Researcher, Hong Kong University of Science and Technology
Hong Kong University of Science and Technology
Friday August 23, 2024 10:35am - 11:10am HKT
Level 1 | Hung Hom Room 3

10:35am HKT

Containerd: Project Update and Deep Dive | Containerd:项目更新和深入探讨 - Akhil Mohan, VMware & Iceber Gu, DaoCloud
Friday August 23, 2024 10:35am - 11:10am HKT
Containerd, a mature, 7-year-old project, is moving from eight major releases into a new era: containerd 2.0. We’ll dive into all the new exciting features in 2.0, like Sandbox API, Transfer Service and Node Resource Interface, and help users understand what these new features enable for their use case. We’ll also provide an upgrade checklist and highlight changes users need to make before upgrading to the 2.0 release, since the 2.0 release will remove features marked as deprecated in past releases. We’ll also cover new updates on the API go module and the refactoring that went into making the containerd Go client stable. Guidance will be provided on using any supported release to support new Kubernetes releases. We're excited to share the progress of the containerd project. Come join us and ask your containerd questions with the handful of on-site containerd maintainers.

Containerd作为一个成熟的、已有7年历史的项目,正在从八个主要版本迈向一个新时代:containerd 2.0。我们将深入探讨2.0中所有新的令人兴奋的功能,如沙盒API、传输服务和节点资源接口,并帮助用户了解这些新功能为他们的使用案例带来了什么。我们还将提供升级检查表,并强调用户在升级到2.0版本之前需要做出的更改,因为2.0版本将删除在过去版本中标记为弃用的功能。我们还将介绍API go模块的新更新以及为使containerd Go客户端稳定而进行的重构。我们将提供指导,以使用任何支持的版本来支持新的Kubernetes版本。我们很高兴分享containerd项目的进展。快来加入我们,与现场的containerd维护人员一起提出你的containerd问题。
Speakers

Wei Cai(Iceber Gu)

Software Engineer, DaoCloud
Senior open source enthusiast, focused on cloud runtime, multi-cloud and WASM. I am a CNCF Ambassador and founded Clusterpedia and promoted it as a CNCF Sandbox project. I also created KasmCloud to promote the integration of WASM with Kubernetes and contribute it to the WasmCloud... Read More →

Akhil Mohan

Software Engineer, VMware by Broadcom
Akhil works as a Senior Member of Technical Staff at VMware by Broadcom. An active contributor to projects in cloud native and container ecosystem. Akhil is a reviewer for containerd and a maintainer of kubernetes publishing-bot. He works mostly on container runtimes and kubernetes... Read More →
Friday August 23, 2024 10:35am - 11:10am HKT
Level 1 | Hung Hom Room 6

10:35am HKT

Leveraging Multi-Cluster Architecture for Resilient and Elastic Hybrid Cloud at Xiaohongshu - Feng Xiong, 小红书 & Hongcai Ren, Huawei
Friday August 23, 2024 10:35am - 11:10am HKT
At Xiaohongshu, the scale and number of K8S clusters have grown significantly, leading to increased complexity in cluster and resource management.

To address challenges, such as slow resource turnover, limited automation, and inefficiency, Xiaohongshu has adopted Karmada for the unified platform. This approach enhances cross-cluster application distribution, elasticity, and efficient cross-cluster scheduling while effectively managing multi-cloud infrastructure.

This session focuses on the following key points:

  • Key challenges of hyperscale infrastructure
  • Evaluation of K8s-based multi-cluster solutions and considerations
  • Practice of efficiently distributing applications and securely migrating existing applications
  • Practice of enhancing elasticity in a multi-cluster environment
  • Achievements, problems met and resolved
Speakers

Hongcai Ren

Senior Software Engineer, Huawei
Hongcai Ren (@RainbowMango) is a CNCF Ambassador who has been working on Kubernetes and other CNCF projects since 2019 and is a maintainer of the Kubernetes and Karmada projects.

Feng Xiong

小红书
Senior Technical Expert at Xiaohongshu, and Head of Cloud Native Serverless Infrastructure. He began engaging with Kubernetes in 2017 and possesses extensive experience in product development and industry implementation related to cloud computing, containers, Serverless, and edge... Read More →
Friday August 23, 2024 10:35am - 11:10am HKT
Level 1 | Hung Hom Room 7

10:35am HKT

A Year in the Life of a Developer in the Era of Developer Portals: Navigating Backstage | 开发者在开发者门户时代的一年生活:导航Backstage - Helen Greul, Spotify
Friday August 23, 2024 10:35am - 11:10am HKT
In today's rapidly evolving landscape of software development, the role of developer portals has become indispensable. This presentation delves into the experiences of developers over the course of a year, exploring the transformative impact of Backstage developer portal on their workflows, collaboration, and overall productivity based on case studies from existing adopters of Backstage. Through a comprehensive exploration of real-world scenarios, this talk offers insights into the daily challenges faced by developers and how Backstage empowers them to overcome these hurdles. From streamlined onboarding processes to simplified access to internal services and documentation, attendees will gain a deeper understanding of the multifaceted benefits that Backstage brings to developer teams. Moreover, we'll discuss best practices for leveraging Backstage to foster a culture of innovation, collaboration, and continuous improvement.

在当今快速发展的软件开发领域,开发者门户的作用变得不可或缺。本次演讲将深入探讨开发者在一年时间内的经验,通过现有Backstage采用者的案例研究,探讨Backstage开发者门户对他们的工作流程、协作和整体生产力的转变影响。 通过对现实场景的全面探讨,本次演讲将为参与者提供洞察开发者面临的日常挑战,以及Backstage如何赋予他们克服这些障碍的能力。从简化入职流程到简化访问内部服务和文档,参与者将更深入地了解Backstage为开发团队带来的多方面好处。此外,我们还将讨论利用Backstage促进创新、协作和持续改进文化的最佳实践。
Speakers
Helen Greul

Head of Engineering for Backstage, Spotify
Helen is an engineering leader, speaker and a strong advocate for creating developer ecosystems that empower teams to thrive. Her journey has taken her from hands-on coding to steering engineering and platform teams, providing her with a holistic perspective on the challenges and... Read More →
Friday August 23, 2024 10:35am - 11:10am HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

10:35am HKT

Deep Dive Into Windows CSI Driver HostProcess Containers | 深入探讨Windows CSI驱动程序HostProcess容器 - Andy Zhang (OSTC) & Weizhi Chen, Microsoft
Friday August 23, 2024 10:35am - 11:10am HKT
Currently, most Windows CSI drivers depend on Windows csi-proxy because various privileged operations cannot be performed from a containerized application running on a Windows node. Beginning in Kubernetes 1.23, HostProcess containers are supported; they run directly on the host as regular processes. Switching to HostProcess container deployment makes Windows CSI driver development and deployment easier. This session will cover the history and implementation details of the Windows csi-proxy project, why Windows CSI drivers have needed csi-proxy since Kubernetes 1.18, and why we removed this csi-proxy dependency in Kubernetes 1.26. We will explore the key learnings and the gotchas we resolved while migrating Windows CSI driver development from csi-proxy-dependent deployment to HostProcess container deployment. After attending this session, you will understand why and how to migrate your Windows applications to gain the benefits of HostProcess containers.

目前,大多数Windows CSI驱动程序依赖于Windows csi-proxy,因为各种特权操作无法从在Windows节点上运行的容器化应用程序中执行。从Kubernetes 1.23开始,支持HostProcess容器,它可以直接在主机上作为常规进程运行。切换到HostProcess容器部署将使Windows CSI驱动程序的开发和部署变得更加简单。本场演讲将涵盖Windows csi-proxy项目的历史和实施细节,解释为什么从Kubernetes 1.18开始在Windows CSI驱动程序中需要csi-proxy,以及为什么我们在Kubernetes 1.26中删除了这种csi-proxy依赖性。我们将探讨在将Windows CSI驱动程序开发从依赖于csi-proxy的部署迁移到HostProcess容器部署时解决的关键问题和注意事项。参加本场演讲后,您将了解为什么以及如何将您的Windows应用程序迁移到使用HostProcess容器以获得更多好处。
Speakers
Andy Zhang (OSTC)

Principal Software Engineer, Microsoft
Andy Zhang is the storage lead in Azure Kubernetes Service team at Microsoft, maintainer of multiple Kubernetes projects, including Windows csi-proxy project, Azure CSI drivers, SMB, NFS, iSCSI CSI drivers, etc. Andy focuses on improving the experience of using storage in Kuberne... Read More →
Weizhi Chen

Senior Software Engineer, Microsoft
Works on Kubernetes in the Microsoft AKS team, focusing on k8s storage drivers on Azure.
Friday August 23, 2024 10:35am - 11:10am HKT
Level 2 | Grand Ballroom 1-2

10:35am HKT

Optimize LLM Workflows with Smart Infrastructure Enhanced by Volcano | 通过Volcano增强的智能基础设施优化LLM工作流程 - Xin Li, qihoo360 & William Wang, Huawei Cloud Technologies Co., LTD
Friday August 23, 2024 10:35am - 11:10am HKT
As Large Language Models (LLMs) revolutionize various aspects of our lives, many companies are building their own cloud native AI platforms to train and fine-tune LLMs. However, managing large-scale LLM training and inference platforms presents even more critical challenges, such as training efficiency, fault tolerance, resource fragmentation, operational costs, and topology-aware scheduling on racks and supernodes. In this session, the speakers will share insights from their experience using a Kubernetes-based smart infrastructure, enhanced by Volcano, to manage thousands of GPUs and handle monthly workloads involving thousands of LLM training and inference jobs at qihoo360. This talk will cover: - Fault detection, fast job recovery, and self-healing that drastically improve efficiency. - Dealing with long downtime in LLM training on heterogeneous GPUs. - Intelligent GPU workload scheduling to reduce resource fragmentation and costs. - Topology-aware scheduling on rack/supernode to accelerate LLM training.

随着大型语言模型(LLMs)革新我们生活的各个方面,许多公司构建他们的云原生人工智能平台来训练和微调LLM。然而,管理大规模LLM训练和推理平台面临更为关键的挑战,如训练效率、容错性、资源碎片化、运营成本和机架和超级节点上的拓扑感知调度。在这场演讲上,演讲者将分享他们在使用基于Kubernetes的智能基础设施(由Volcano增强)管理数千个GPU并处理qihoo360中涉及数千个LLM训练和推理作业的月度工作负载的经验。本次演讲将涵盖:故障检测、快速作业恢复和自愈大幅提高效率。处理异构GPU上LLM训练的长时间停机。智能GPU工作负载调度以减少资源碎片化和成本。机架/超级节点上的拓扑感知调度以加速LLM训练。
Speakers
Xin Li

Senior Engineer of Server Development, qihoo360
Xin Li is a seasoned senior back-end developer and an approver for the Volcano project, with a keen focus on Kubernetes and AI. The infrastructure he is responsible for supports the training and inference of 360GPT. Moreover, Xin Li delves deeply into optimizing distributed... Read More →
Friday August 23, 2024 10:35am - 11:10am HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, AI + ML

10:35am HKT

Empower WebAssembly and Container Both on RISC-V | 在RISC-V上加强WebAssembly和容器 - Tiejun Chen, VMware
Friday August 23, 2024 10:35am - 11:10am HKT
RISC-V has drawn attention in many areas. In the real world, however, running workloads on RISC-V based targets still poses challenges. From cloud to edge, there is a trend of deploying workloads on sandboxed microservice platforms such as containers and k8s. The underlying sandbox technologies are also evolving, with newcomers like WebAssembly, which has been considered the future of computing. In practice, WebAssembly is starting to run as an alternative lightweight runtime side by side with containers and VMs. Here we review whether and how we can build such a multi-runtime platform on RISC-V, where WebAssembly and containers coexist. We will enable deploying {WebAssembly, Docker} to RISC-V Linux running on a real RISC-V target, and bring other open source utilities to the RISC-V Linux distribution to help fit workloads into WebAssembly and containers on RISC-V, as a next step in exploring how to accelerate the open software ecosystem on RISC-V.

RISC-V 显然已经引起了许多领域的关注。但在现实世界中,在基于 RISC-V 的目标上运行工作负载存在着现有的挑战。从云端到边缘,您可以看到在这种沙箱化微服务平台上部署工作负载的趋势 - 容器、k8s 等。实际上,底层的沙箱技术也在不断发展,出现了一些新技术,比如被认为是未来计算的 WebAssembly。在现实世界中,我们开始将 WebAssembly 作为一种轻量级运行时的替代方案与容器和虚拟机并存。在这里,我们想要审查如何在 RISC-V 上构建这种多运行时平台,其中 WebAssembly 和容器共存。我们将使 {WebAssembly,Docker} 能够部署到运行在真实 RISC-V 目标上的 RISC-V Linux,并进一步使其他开源实用工具能够适配到 RISC-V Linux 发行版,以帮助将工作负载适配到 RISC-V 上的 WebAssembly 和容器,以便探索加速 RISC-V 上开放软件生态系统的可能性。
Speakers
Tiejun Chen

Sr. Technical Lead, VMware
Tiejun Chen was a Sr. technical leader. He has worked at several tech companies such as VMware, Intel, and Wind River Systems, on cloud native, edge computing, ML/AI, RISC-V, WebAssembly, etc. He has given many presentations at AI.Dev NA 2023, KubeCon China 2021, Kube... Read More →
Friday August 23, 2024 10:35am - 11:10am HKT
Level 1 | Hung Hom Room 5

11:25am HKT

Next Steps for the Ingress-NGINX Project | Ingress-NGINX项目的下一步计划 - Jintao Zhang, Kong Inc.
Friday August 23, 2024 11:25am - 12:00pm HKT
The Ingress-NGINX project is the most widely used ingress controller project globally. It is a community-maintained open-source project, and with the GA release of Gateway API, more and more people want to know what is next for the Ingress-NGINX project and what updates have been made recently.

Ingress-NGINX项目是全球范围内最广泛使用的Ingress控制器项目。作为一个由社区维护的开源项目,随着Gateway API GA的发布,越来越多的人开始关注Ingress-NGINX项目的下一个计划以及最近的更新。
Speakers
Jintao Zhang

Sr. SE, Kong
Jintao Zhang is a Microsoft MVP, CNCF Ambassador, Apache PMC, and Kubernetes Ingress-NGINX maintainer. He is skilled in cloud-native technology and the Azure technology stack. He worked for Kong Inc.
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 1 | Hung Hom Room 6

11:25am HKT

Beyond Statefulset: Containerize Your Enterprise Stateful Applications in Practice | 超越StatefulSet:实践中将企业有状态应用容器化 - Mingshan Zhao, Alibaba Cloud & Vec Sun, xiaohongshu
Friday August 23, 2024 11:25am - 12:00pm HKT
Kubernetes provides StatefulSet to manage stateful services, but it is far from enough to run enterprise stateful applications in practice. For example: how does Zookeeper accomplish leader election, how does MQ implement configuration hot loading, and how is daily operation and maintenance of a database carried out? Many practitioners resort to operators that manage pods directly (e.g. KubeBlocks) for specific applications such as databases, yet these are not general enough for other stateful applications. OpenKruise provides several stateful features that are missing in native StatefulSet, such as in-place resource and volume resizing, progressive ConfigMap & Secret hot update, and a container operation channel. Teams from Alibaba and Xiaohongshu will share their lessons from building operators and platforms for general stateful apps and containerizing databases and middleware at a scale of hundreds of thousands of pods.

Kubernetes提供了StatefulSet来管理有状态服务,但实际上要运行企业级有状态应用还远远不够。例如:Zookeeper如何完成领导者选举,MQ如何实现配置热加载?如何进行数据库的日常运维?许多从业者借助直接管理pod的运营商,例如KubeBlocks,针对特定应用程序,例如数据库,但它们并不足够通用以适用于其他有状态应用程序。 OpenKruise提供了一些在原生StatefulSet中缺失的有状态功能,例如原地资源和卷大小调整,渐进式Configmap和Secret热更新以及容器操作通道。来自阿里巴巴和小红书的团队将分享他们构建运营商和平台以适用于通用有状态应用程序,并将数据库和中间件容器化的经验,规模达数十万个pod。
Speakers
Mingshan Zhao

Senior R&D Engineer, Alibaba Cloud
Senior R&D Engineer of AliCloud, Maintainer of OpenKruise community, has long been engaged in the research and development of cloud native, containers, scheduling and other fields; core R&D member of Alibaba's one million container scheduling system, and many years of experience in... Read More →
Vec Sun

software engineer, xiaohongshu
Sunweixiang previously worked on the Alibaba Cloud container team as a software engineer and is a contributor to OpenKruise, Karmada, and other communities. He is deeply involved in container application orchestration and multi-cluster.
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

11:25am HKT

Evolution of SPDK Vhost-FS Solution to Accelerate File Access in VMs and Secure Containers | SPDK Vhost-FS解决方案的演进,加速虚拟机中的文件访问并保护容器 - Changpeng Liu, Intel
Friday August 23, 2024 11:25am - 12:00pm HKT
Virtio-fs is a shared file system between virtual machines or secure containers and the host. Storage Performance Development Kit (SPDK) vhost-fs is the userspace backend implementation of virtio-fs. In this presentation, we will summarize typical storage solutions that use SPDK vhost-fs and the components used to build the storage stack, then go through the evolution of SPDK vhost-fs from BlobFS to the latest FSDEV module. Advanced features such as interrupt mode and thread modeling for data processing in SPDK vhost-fs are also covered.

Virtio-fs是虚拟机或安全容器与主机之间共享文件系统,Storage Performance Development Kit(SPDK) vhost-fs是virtio-fs在用户空间的后端实现。在本次演讲中,我们将总结使用SPDK vhost-fs和组件构建存储栈的典型存储解决方案,然后介绍SPDK vhost-fs从BlobFS到最新的FSDEV模块的演变过程,还将涵盖SPDK vhost-fs中用于数据处理的高级功能,如中断模式和线程建模。
Speakers
Changpeng Liu

Cloud Solution Architect, Intel
Changpeng is a Cloud Solution Architect at Intel. He has been working on Storage Performance Development Kit since 2014. Currently, Changpeng is a core maintainer for the SPDK. His areas of expertise include NVMe, I/O Virtualization, and storage offload on IPU.
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 2 | Grand Ballroom 1-2

11:25am HKT

LLM's Anywhere: Browser Deployment with Wasm & WebGPU | LLM随处可用:使用Wasm和WebGPU进行浏览器部署 - Joinal Ahmed, Navatech Group & Nikhil Rana, Google Cloud
Friday August 23, 2024 11:25am - 12:00pm HKT
In today's interconnected world, deploying and accessing machine learning (ML) models efficiently poses significant challenges. Traditional methods rely on cloud GPU clusters and constant internet connectivity. However, WebAssembly (Wasm) and WebGPU technologies are revolutionizing this landscape. This talk explores leveraging Wasm and WebGPU for deploying Single Layer Models (SLMs) directly within web browsers, eliminating the need for extensive cloud GPU clusters and reducing reliance on constant internet access. We showcase practical examples and discuss how Wasm enables efficient cross-platform ML model execution, while WebGPU optimizes parallel computation within browsers. Join us to discover how this fusion empowers developers and users alike with unprecedented ease and efficiency in browser-based ML, while reducing dependence on centralized cloud infrastructure and internet connectivity constraints.

在当今互联世界中,高效部署和访问机器学习(ML)模型面临着重大挑战。传统方法依赖于云GPU集群和持续的互联网连接。然而,WebAssembly(Wasm)和WebGPU技术正在彻底改变这一局面。本次演讲探讨了如何利用Wasm和WebGPU在Web浏览器中直接部署单层模型(SLMs),消除了对庞大云GPU集群的需求,减少了对持续互联网访问的依赖。我们展示了实际示例,并讨论了Wasm如何实现高效的跨平台ML模型执行,以及WebGPU如何优化浏览器内的并行计算。加入我们,发现这种融合如何赋予开发人员和用户在基于浏览器的ML中前所未有的便利和效率,同时减少对集中式云基础设施和互联网连接的依赖。
Speakers
Joinal Ahmed

AI Architect, Navatech Group
Joinal is a seasoned Data Science expert passionate about rapid prototyping, community involvement, and driving technology adoption. With a robust technical background, he excels in leading diverse teams through ML projects, recruiting and mentoring talent, optimizing workflows, and... Read More →
Nikhil Rana

AI Consultant, Google Cloud
Nikhil is an applied data science professional with over a decade of experience in developing and implementing Machine learning, Deep Learning, and NLP-based solutions for a variety of industries like Finance, FMCG, etc. He is a passionate advocate for the use of data science to solve... Read More →
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 1 | Hung Hom Room 3
  KubeCon + CloudNativeCon Sessions, AI + ML

11:25am HKT

New Advances for Cross-Platform AI Applications in Docker | Docker中跨平台AI应用程序的新进展 - Michael Yuan, Second State
Friday August 23, 2024 11:25am - 12:00pm HKT
The talk proposes to delve into novel methods for enhancing cross-platform GPU/AI workloads within container ecosystems, with a specific emphasis on Docker's incorporation of the WebGPU standard. This standard empowers containerized applications to utilize host GPUs and additional AI accelerators via a flexible API. Consequently, there's no longer a necessity to construct Docker images tailored to individual GPU vendors and their proprietary drivers. The presentation will feature a demonstration highlighting how the WasmEdge project capitalizes on the WebGPU standard to craft portable LLM inference applications in Rust. Additionally, Docker's seamless management and orchestration of these applications will be showcased.

本次演讲旨在探讨增强容器生态系统中跨平台GPU/AI工作负载的新方法,特别强调Docker对WebGPU标准的整合。该标准使容器化应用程序能够通过灵活的API利用主机GPU和额外的AI加速器。因此,不再需要构建针对个别GPU供应商及其专有驱动程序的Docker镜像。演示将展示WasmEdge项目如何利用WebGPU标准在Rust中创建可移植的LLM推理应用程序。此外,还将展示Docker对这些应用程序的无缝管理和编排能力。
Speakers
Michael Yuan

Product Manager, Second State
Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, AI + ML

11:25am HKT

Rollout Patterns: Smoothly Migrating and Rolling Out Your Microservices | 部署模式:平稳迁移和部署您的微服务 - Tim Xiao, DaoCloud & Wu Chenhui, AS.Watson TechLab
Friday August 23, 2024 11:25am - 12:00pm HKT
At Watsons, most of their services are built on Dubbo. Now, they aim to utilize delivery tools like Argo CD and Argo Rollouts to automatically and securely deliver their services. However, they have encountered complexities beyond what Argo Rollouts assumes. We will summarize these patterns and demonstrate how to handle them, including: - Pattern 1: One service at a time. - Pattern 2: Multiple services, each forward-compatible. - Pattern 3: Multiple services with version dependency.

在Watsons,他们的大多数服务都是基于Dubbo构建的。现在,他们希望利用Argo CD和Argo Rollouts等交付工具来自动和安全地交付他们的服务。然而,他们遇到了超出Argo Rollouts假设的复杂性。我们将总结这些模式,并演示如何处理它们,包括: - 模式1:一次一个服务。 - 模式2:多个服务,每个都是向前兼容的。 - 模式3:具有版本依赖性的多个服务。
Speakers
旸 肖

Developer, DaoCloud
Served as Principal Engineer for the DevOps platform at DaoCloud, participated in community projects including argo-cd, argo-rollouts, and kubevela, and has more than 5 years of Kubernetes platform development experience.
Wu Chenhui

architecture, AS.Watson TechLab
I have nearly 30 years of experience in software development and architecture design and 5 years of experience with k8s, and I am responsible for the k8s-related architecture design of Watsons Group.
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 1 | Hung Hom Room 7

11:25am HKT

How Does KubeEdge Build the Tunnel Which Is Secure, Trusted, and Adaptable to Edge Networks | KubeEdge如何构建适应边缘网络的安全可信隧道 - Wei Hu, DaoCloud
Friday August 23, 2024 11:25am - 12:00pm HKT
Edge computing makes connections broader, faster, and more agile, but it also brings the threat of cyberattacks to the edge of the network, which raises the requirements for security at the edge. In addition, because connectivity may take many forms, such as the Internet, 5G, and Wi-Fi, the network environment in edge scenarios is complex and its quality cannot be guaranteed. Supporting weak network environments is therefore another challenge at edge sites. KubeEdge is a cloud-edge collaborative architecture project for Kubernetes-native edge computing. KubeEdge uses its own trusted tunnel to ensure the security of data transmission; it verifies, encrypts, and authenticates all communications in this tunnel. The tunnel ensures data accessibility through QoS and provides the QUIC protocol to improve performance under network reordering in weak networks. We will share how the KubeEdge tunnel achieves these goals in this session.

边缘计算使连接更广泛、更快速、更灵活,同时也将网络威胁带到了边缘,这也对边缘安全提出了更高的要求。此外,由于互联网、5G、WIFI等各种形式可能存在,边缘场景中的网络环境将变得复杂,质量无法保证。因此,支持弱网络环境也是边缘场景中的一个挑战。 KubeEdge是一个针对Kubernetes原生边缘计算的云边协作架构项目。KubeEdge使用自己的可信隧道来确保数据传输的安全性,它验证、加密和认证该隧道中的所有通信。该隧道通过QoS确保数据可访问性,并提供QUIC协议来改善弱网络中的网络重排序性能。在本场演讲中,我们将分享KubeEdge隧道如何实现这些目标。
Speakers
炜 胡

Senior Software Engineer, DaoCloud
Wei Hu is a Senior Software Engineer at DaoCloud, currently working on the Edge Computing Team. He is a maintainer of the KubeEdge project and a regular contributor to it. He has rich experience in cloud-edge collaboration. He has given several speeches on the topic of edge computing at other... Read More →
Friday August 23, 2024 11:25am - 12:00pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Networking + Edge Computing

12:00pm HKT

Lunch 🍜 | 午餐
Friday August 23, 2024 12:00pm - 1:20pm HKT
Level 2 | Grand Ballroom 3-4

1:20pm HKT

Constructing the 10x Efficiency of Cloud-Native AI Infrastructure | 如何让你的 AI 底座效能提升 10 倍? - Peter Pan, DaoCloud & 秋萍 戴, daocloud
Friday August 23, 2024 1:20pm - 1:55pm HKT
Enterprises keep investing in AI. But once GPUs are installed in a data center, a challenge arises: how to construct an "AI cloud" atop bare metal. Even though K8S is recognized as the foundational infrastructure for AI, K8S alone is merely the initial step. Organizations may face challenges: - Maximizing GPU utilization - Unifying multi-arch accelerators/GPUs (k8s DRA) - Organization quotas and cost management - Resource isolation among organizations - Smarter scheduling, tiered GPU allocation, task prioritization - Sharing GPU clusters between VMs & containers - Harnessing the full potential of high-speed networks, storage optimization, and dataset orchestration. Leveraging open source stacks in the Linux Foundation and CNCF, we have experience in building AI clouds for IDC or internal usage. We will share that experience to help communities on their journey towards 10x efficiency in cloud-native AI. Refer to the `Additional resources` chapter for more details.

企业继续投资于人工智能。但是一旦在数据中心安装了GPU,就会面临一个挑战:如何在裸金属之上构建一个“AI云”。即使K8S被认为是AI的基础基础设施,但K8S只是一个起步。 组织可能面临的挑战包括: - 最大化GPU利用率 - 统一多架构加速器/GPU(k8s DRA) - 组织配额和成本管理 - 组织之间的资源隔离 - 更智能的调度,分层GPU分配,任务优先级... - 在虚拟机和容器之间共享GPU集群 - 充分利用高速网络的潜力,优化存储和数据集编排 利用Linux基金会和CNCF中的开源堆栈,我们在为IDC或内部使用构建AI云方面有经验。我们可以分享经验,以赋予社区构建云原生AI的效率提升10倍的旅程。 有关更多详细信息,请参考“附加资源”章节。
Speakers
Peter Pan

VP of R&D Engineering, DaoCloud
DaoCloud R&D Engineering VP; CNCF wg-AI (AI Working-Group) member; maintainer of a few CNCF projects (GitHub ID: panpan0000): CloudTTY, KuBean, HwameiStor. Public tech events: 2023 KubeCon SH Speaker (https://sched.co/1PTFI); 2023 KubeCon EU Program Committee... Read More →
秋萍 戴

Product Manager, DaoCloud
QiuPing Dai has been a senior Technology Product Manager at DaoCloud for 5 years and is involved in cloud computing (including Kubernetes computing, storage, and network) development work. Before that, QiuPing worked on cloud computing at IBM. QiuPing is interested in Storage, Network, Scheduling... Read More →
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 1 | Hung Hom Room 2

1:20pm HKT

Write Once Run Anywhere, but for GPUs | GPU 时代的“一次编写,到处运行” - Michael Yuan, Second State
Friday August 23, 2024 1:20pm - 1:55pm HKT
With the popularity of LLM apps, there is an increasing demand for running and scaling AI workloads in the cloud and on edge devices. Rust and Wasm offer a solution by providing a portable bytecode that abstracts hardware complexities. LlamaEdge is a lightweight, high-performance and cross-platform LLM inference runtime. Written in Rust and built on WasmEdge, LlamaEdge provides a standard WASI-NN API to developers. Developers only need to write against the API and compile to Wasm. The Wasm file can run on any device, where WasmEdge translates and routes Wasm calls to the underlying native libraries such as llama.cpp. This talk will discuss the design and implementation of LlamaEdge and show how it enables cross-platform LLM app development and deployment. We will also walk through several code examples from a basic sentence completion app, to a chat bot, to an RAG agent app with external knowledge in vector databases, to a Kubernetes managed app across a heterogeneous cluster.

随着LLM应用程序的流行,云端和边缘设备上运行和扩展AI工作负载的需求不断增加。Rust和Wasm通过提供一个抽象硬件复杂性的可移植字节码来提供解决方案。 LlamaEdge是一个轻量级、高性能和跨平台的LLM推理运行时。使用Rust编写,并构建在WasmEdge上,LlamaEdge为开发人员提供了一个标准的WASI-NN API。开发人员只需针对API编写代码并编译为Wasm。Wasm文件可以在任何设备上运行,WasmEdge将Wasm调用转换并路由到底层的本地库,如llama.cpp。 本次演讲将讨论LlamaEdge的设计和实现,并展示它如何实现跨平台的LLM应用程序开发和部署。我们还将从基本的句子补全应用程序、聊天机器人,到具有外部知识的矢量数据库中的RAG代理应用程序,再到跨异构集群的Kubernetes管理应用程序,演示几个代码示例。
Speakers
Michael Yuan

Product Manager, Second State
Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 1 | Hung Hom Room 3

1:20pm HKT

Inplace-Update: The Past, Present and Future | 原地更新:过去、现在和未来 - Zhang Zhen & Mingshan Zhao & Yuxing Yuan, Alibaba Cloud
Friday August 23, 2024 1:20pm - 1:55pm HKT
Inplace-update is a controversial technique that many cloud native fans consider bad practice. Nevertheless, many practitioners use Inplace-update to containerize stateful apps and to greatly speed up rolling updates of stateless apps. Recent versions of k8s have added further support for Inplace-update, e.g. volume resizing and vertical pod scaling. It is a pity that these features are not integrated with in-tree workloads. OpenKruise integrates Inplace-update as one of the core features of its advanced workloads and has accumulated many real use cases. We will share the challenges of implementing Inplace-update in k8s, such as enhancing pod lifecycle hooks and supporting Inplace-update triggered by metadata changes. In addition, we will share the in-progress effort to integrate recent k8s enhancements into Kruise workloads so as to enable more cases for Inplace-update. Finally, we will discuss the possibility of bringing the Inplace-update feature into in-tree workloads.

Inplace-update 是一种备受争议的技术,被许多云原生爱好者认为是不良实践。然而,许多从业者仍然使用 Inplace-update 来容器化有状态应用程序,并大大加快无状态应用程序的滚动进展。在最近的 k8s 版本中,出现了对 Inplace-update 的额外支持,例如卷调整大小和垂直 Pod 缩放。遗憾的是,这些功能尚未与 in-tree 工作负载集成。 OpenKruise 将 Inplace-update 集成为高级工作负载的核心功能之一,并积累了许多真实用例。我们将分享在 k8s 中实现 Inplace-update 的挑战,例如增强 Pod 生命周期钩子和支持通过元数据更改触发 Inplace-update。此外,我们将分享将最近的 k8s 增强集成到 Kruise 工作负载中的正在进行的努力,以便为 Inplace-update 启用更多案例。最后,我们将讨论将 Inplace-update 功能引入 in-tree 工作负载的可能性。
Speakers
Zhen Zhang

staff engineer, Alibaba Cloud
Zhen Zhang has been working on cluster management of software applications. He is driving new cloud native innovation at Alibaba and focuses mainly on the application management domain. He is one of the main maintainers of the OpenKruise project.
Mingshan Zhao

Senior R&D Engineer, Alibaba Cloud
Senior R&D Engineer of AliCloud, Maintainer of OpenKruise community, has long been engaged in the research and development of cloud native, containers, scheduling and other fields; core R&D member of Alibaba's one million container scheduling system, and many years of experience in... Read More →
Yuxing Yuan

Senior Development Engineer, Alibaba Cloud
Cloud-native focus with an AI interest.
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 1 | Hung Hom Room 6

1:20pm HKT

Build Container Runtime Based on Sandbox API of Containerd | 基于Containerd的Sandbox API构建容器运行时 - Shaobao Feng, Huawei Cloud & Cai Wei, DaoCloud
Friday August 23, 2024 1:20pm - 1:55pm HKT
The Sandbox API was released in containerd 1.7 and will be stable in containerd 2.0. It provides a clean way to implement a sandbox-oriented container runtime. With the introduction of different kinds of isolation techniques as sandboxes, a container is now more a set of API specifications than a single technology. We need a clear and abstract definition of the Sandbox API to make it easy to integrate different kinds of sandboxing techniques into a container runtime. In this session, we will: 1. Introduce the Sandbox API of containerd and explain why we need it. 2. Show how we build our container runtimes based on the Sandbox API and the benefits that come with it. 3. Demonstrate different kinds of sandboxed containers created by Kuasar, a container runtime framework based on the new Sandbox API that currently supports VMM, UserMode Kernel, WebAssembly, and Runc sandboxes.

在KubeCon的会议描述中,我们将介绍Sandbox API在containerd 1.7中发布,并将在containerd 2.0中稳定。它提供了一种清晰的方式来实现面向沙箱的容器运行时。随着不同类型的隔离技术(如沙箱)的引入,容器现在更多地是一组API规范,而不是单一技术。我们需要对Sandbox API进行清晰和抽象的定义,以便轻松集成不同类型的沙箱技术,使其成为容器运行时。 在这次分享中,我们将: 1. 介绍containerd的Sandbox API,以及为什么我们需要它。 2. 展示我们如何基于Sandbox API构建我们的容器运行时以及带来的好处。 3. 我们将展示由基于新Sandbox API的容器运行时框架Kuasar创建的不同类型的沙箱容器的演示,目前支持VMM、UserMode Kernel、WebAssembly和Runc的沙箱。
Speakers
Wei Cai(Iceber Gu)

Software Engineer, DaoCloud
Senior open source enthusiast, focused on cloud runtime, multi-cloud and WASM. I am a CNCF Ambassador and founded Clusterpedia and promoted it as a CNCF Sandbox project. I also created KasmCloud to promote the integration of WASM with Kubernetes and contribute it to the WasmCloud... Read More →
Shaobao Feng

Principal Engineer, Huawei Cloud
Shaobao is a Principal Engineer at Huawei Cloud whose work focuses on serverless platforms. He has been a leader in building the secure container runtime of the first Serverless Kubernetes on public cloud. He is the main code contributor and maintainer of the open source... Read More →
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

1:20pm HKT

JuiceFS CSI in Multi-Thousand Node Kubernetes Clusters for LLM Pre-Training | JuiceFS CSI在LLM预训练中用于几千节点Kubernetes集群 - Weiwei Zhu, juicedata
Friday August 23, 2024 1:20pm - 1:55pm HKT
The rapid advancement of artificial intelligence technologies, especially the development of large language models (LLMs), has led to a sharp increase in the amount of data that enterprises need to process. Managing large-scale data clusters in Kubernetes environments presents several challenges, including storage performance, complex access control management, and system stability. JuiceFS is a distributed POSIX file system designed for the cloud. It was open-sourced in 2021 (9.8k stars). To deliver an optimal experience in Kubernetes, JuiceFS developed the JuiceFS CSI Driver. In addition, JuiceFS CSI introduced several new designs to support large-scale, complex AI training tasks, such as the mount pod mode and the sidecar mode for serverless environments. Outline: - LLM storage challenges - JuiceFS CSI Driver architecture - Mount pod mode / sidecar mode - Practical experience - Future

人工智能技术的快速发展,特别是大型语言模型(LLMs)的发展,导致企业需要处理的数据量急剧增加。在Kubernetes环境中管理大规模数据集群面临着多个挑战,包括存储性能、复杂的访问控制管理和系统稳定性。 JuiceFS是一种为云设计的分布式POSIX文件系统。它于2021年开源(拥有9.8k星)。为了在Kubernetes中提供最佳体验,JuiceFS开发了JuiceFS CSI驱动程序。此外,JuiceFS CSI引入了几项新设计,以支持大规模、复杂的人工智能训练任务,如挂载Pod模式和用于无服务器环境的Sidecar模式。 大纲: - LLM存储挑战 - JuiceFS CSI驱动程序架构 - 挂载Pod模式\Sidecar模式 - 实践经验 - 未来
Speakers
Weiwei Zhu

Full stack engineer, juicedata
She is a full-stack engineer at Juicedata, Inc. and a maintainer of the JuiceFS CSI driver and Fluid. She is responsible for the development and maintenance of JuiceFS in the cloud-native ecosystem, completed the implementation and practice of JuiceFS in Kubernetes, and continued to improve the... Read More →
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 2 | Grand Ballroom 1-2

1:20pm HKT

What if Your System Experiences an Outage? Let's Build a Resilient Systems with Chaos Engineering | 如果您的系统遇到故障怎么办?让我们通过混沌工程构建弹性系统 - NamKyu Park, LitmusChaos
Friday August 23, 2024 1:20pm - 1:55pm HKT
This session explores how LitmusChaos improves the resilience of cloud-native applications by injecting chaos. It also showcases the streamlined management of chaos engineering software through Backstage. Cloud-native applications can be complex to navigate and secure. Our session will present strategies to identify vulnerabilities using GitOps and monitoring, integrated seamlessly into your system. Learn how Backstage and LitmusChaos can enhance your application's resilience with ease! The session starts with chaos orchestration and analysis using LitmusChaos, followed by a live demo highlighting the utilization of LitmusChaos' Backstage plugin and others like Prometheus and ArgoCD. Learn how these plugins, when integrated with Backstage, effectively manage all components necessary for executing chaos engineering.

本场演讲探讨了LitmusChaos如何通过注入混沌来提高云原生应用程序的弹性。它还展示了通过Backstage简化混沌工程软件的管理。 云原生应用程序可能很复杂,难以导航和保护。我们的会议将介绍使用GitOps和监控来识别漏洞的策略,无缝集成到您的系统中。了解如何使用Backstage和LitmusChaos轻松增强您的应用程序的弹性! 本场演讲从使用LitmusChaos进行混沌编排和分析开始,然后展示了使用LitmusChaos的Backstage插件以及其他插件如Prometheus和ArgoCD的实时演示。了解这些插件与Backstage集成后,如何有效管理执行混沌工程所需的所有组件。
Speakers
Namkyu Park

Maintainer, LitmusChaos
Namkyu Park is a CNCF Ambassador and a Software Developer. He worked at several startups in South Korea. He has completed Linux Foundation Mentorship Programme(LitmusChaos) as a mentee and is currently a mentor and maintainer of LitmusChaos. He has previously spoken at GopherCon Korea... Read More →
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 1 | Hung Hom Room 7

1:20pm HKT

Java Me Smarter: Unleashing AI Power with Quarkus | Java让我更聪明:用Quarkus释放人工智能的力量 - Daniel Oh, Red Hat
Friday August 23, 2024 1:20pm - 1:55pm HKT
Feeling stuck in a rut with traditional Java development? This session injects a shot of AI innovation to supercharge your Java skills! Daniel will dive into Quarkus, a modern Java framework perfectly suited for building microservices, and explore how it seamlessly integrates with cutting-edge AI functionalities. Get ready to: - Boost Your Java IQ: Learn how Quarkus streamlines development and empowers you to build scalable, high-performance microservices. - Unleash the AI Powerhouse: Discover how to leverage AI capabilities within your Java applications. Daniel will explore real-world use cases, from intelligent data analysis and machine learning to chatbots and recommendation engines. - AI Made Easy: See how Quarkus simplifies the integration of AI models and services into your Java codebase, making AI development more accessible than ever. - Witness the Future: Uncover the exciting possibilities that emerge when you combine the power of Java with AI.

在传统的Java开发中感到困境?本次会话将为您的Java技能注入一剂AI创新的强心剂!Daniel将深入探讨Quarkus,这是一个现代化的Java框架,非常适合构建微服务,并探索它如何与尖端的AI功能无缝集成。准备好了吗: - 提升您的Java智商:了解Quarkus如何简化开发,让您能够构建可扩展、高性能的微服务。 - 发挥AI强大功能:发现如何在您的Java应用程序中利用AI功能。Daniel将探讨真实的用例,从智能数据分析和机器学习到聊天机器人和推荐引擎。 - AI变得简单:看看Quarkus如何简化AI模型和服务与您的Java代码库的集成,使AI开发变得比以往更加易于访问。 - 见证未来:揭示当您将Java的力量与AI相结合时所产生的令人兴奋的可能性。
Speakers
Daniel Oh

Senior Principal Developer Advocate, Red Hat
Daniel Oh is a Java Champion and Senior Principal Developer Advocate at Red Hat, passionately promoting the development of cloud-native microservices and serverless functions using cloud-native runtimes. As a CNCF ambassador, he actively contributes to various open-source cloud projects...
Friday August 23, 2024 1:20pm - 1:55pm HKT
Level 1 | Hung Hom Room 5

2:10pm HKT

Unveiling the Future: Nurturing Openness in AI Development | 揭示未来:培育人工智能开放性发展 - Anni Lai, Futurewei & Mer Joyce, Do Big Good LLC
Friday August 23, 2024 2:10pm - 2:45pm HKT
In the rapidly evolving landscape of AI, the concept of openness emerges as a cornerstone for ethical, accountable, and sustainable development. This talk delves into the significance of fostering openness in AI endeavors, exploring two groundbreaking efforts: the Open Source AI Definition led by the Open Source Initiative (OSI) and Model Openness Framework (MOF) introduced by LF AI & Data Generative AI Commons. Through the lens of the OSI's definition co-design process, we'll navigate the evolving landscape of Open Source AI, deciphering its potential to democratize access to cutting-edge technology while fortifying principles of inclusivity and collaboration. We'll unravel the transformative potential of the MOF to foster transparency and trust in AI models. By elucidating the core tenets of the framework and the definition, we'll illuminate pathways for advancing responsible AI development.

在人工智能快速发展的领域中,开放性的概念成为道德、负责任和可持续发展的基石。本次演讲深入探讨了在人工智能努力中培育开放性的重要性,探索了两项开创性的工作:由开源倡议组织(OSI)领导的开源人工智能定义和LF AI & Data生成人工智能共同体引入的模型开放性框架(MOF)。 通过OSI定义的共同设计过程,我们将探索开源人工智能不断发展的领域,解读其潜力,使人们能够民主化获得尖端技术,同时巩固包容性和合作原则。 我们将揭示MOF在促进人工智能模型透明度和信任方面的转变潜力。通过阐明框架和定义的核心原则,我们将阐明推进负责任人工智能发展的途径。
Speakers
Anni Lai

Head of Open Source Operations, Chair of Generative AI Commons, LF AI & Data, Futurewei
Anni drives Futurewei’s open source (O.S.) governance, process, compliance, training, project alignment, and ecosystem building. Anni has a long history of serving on various O.S. boards such as OpenStack Foundation, LF CNCF, LF OCI, LF Edge, and is on the LF OMF board and LF Europe...

Mer Joyce

Founder, Do Big Good LLC
Mer Joyce (she/her) is the founder of the co-design firm Do Big Good and is the facilitator of the Open Source Initiative's consultative process to co-design the Open Source AI Definition (OSAID). She has over a decade of international experience at the intersection of research, tech...
Friday August 23, 2024 2:10pm - 2:45pm HKT
Level 1 | Hung Hom Room 3

2:10pm HKT

Ensuring Success with Kyverno in Production: What You Must Know | 在生产环境中确保Kyverno成功:你必须知道的事项 - Shuting Zhao, Nirmata
Friday August 23, 2024 2:10pm - 2:45pm HKT
Deploying Kyverno in a production environment requires careful planning and consideration of key factors to ensure success. In this talk, we will delve into the essential aspects that organizations must understand when bringing Kyverno into production. From policy creation to enforcement, scalability, performance optimization, and integration with existing workflows, attendees will gain valuable insights into effectively leveraging Kyverno in a live environment. Moreover, the session will cover strategies for maintaining stability, resilience, and security when using Kyverno at scale. Participants will learn about common pitfalls to avoid, tips for troubleshooting, and proactive measures to enhance the overall production readiness of Kyverno deployments. Whether you are new to Kyverno or looking to optimize your existing implementation, this talk will equip you with the knowledge and guidance needed to successfully navigate the complexities of deploying Kyverno in a production setting.

在生产环境中部署Kyverno需要仔细规划和考虑关键因素,以确保成功。在这次讨论中,我们将深入探讨组织在将Kyverno引入生产环境时必须了解的基本方面。从策略创建到执行、可扩展性、性能优化和与现有工作流程的集成,与会者将获得有关如何在实际环境中有效利用Kyverno的宝贵见解。此外,本场演讲还将涵盖在规模化使用Kyverno时保持稳定性、弹性和安全性的策略。参与者将了解要避免的常见陷阱、故障排除的技巧以及增强Kyverno部署整体生产准备性的积极措施。无论您是初次接触Kyverno还是希望优化现有实施,本次讨论将为您提供成功部署Kyverno在生产环境中的复杂性所需的知识和指导。
Speakers
Shuting Zhao

Staff Engineer, Nirmata
Shuting Zhao is a Kyverno maintainer and a Staff Engineer at Nirmata. Her passion for open source extends beyond her professional role: she has mentored several LFX mentorship programs since March 2021 and enjoys helping others contribute to open source...
Friday August 23, 2024 2:10pm - 2:45pm HKT
Level 1 | Hung Hom Room 6

2:10pm HKT

Developing a Standard Multi-Cluster Inventory API | 开发标准的多集群Inventory API - Zhiying Lin & Chen Yu, Microsoft; Hongcai Ren, Huawei; Di Xu, Xiaohongshu; Jian Qiu, Red Hat
Friday August 23, 2024 2:10pm - 2:45pm HKT
After a year of effort, the Kubernetes community has made great progress toward final approval of the Cluster Inventory API project. The project has gained a lot of attention and interest from different companies and open source projects, with many new use cases being explored. This panel discussion brings together maintainers from different multi-cluster management projects who bootstrapped this project. We will share what the Cluster Inventory API is and how we got here. We will also introduce the ongoing work and emerging use cases for this project, and our vision for the future. During the panel discussion, attendees will gain a comprehensive understanding of the use cases, e.g., how to support multi-cluster AI workload scheduling using the inventory API, and the challenges, e.g., how to migrate seamlessly from one cluster manager tool to another. We will shed light on the collaborative efforts to standardize cluster inventory APIs and how the work evolved from a small group discussion into a community effort.

经过一年的努力,Kubernetes社区在最终批准集群清单API项目方面取得了巨大进展。该项目受到了不同公司和开源项目的关注和兴趣,许多新的用例正在被探索。本次小组讨论将汇集来自不同多集群管理项目的维护者,他们启动了这个项目。我们将分享什么是集群清单API,以及我们是如何实现的。我们还将介绍该项目的正在进行的工作和新兴用例,以及我们对未来计划的愿景。在小组讨论期间,与会者将全面了解用例,例如如何使用清单API支持多集群AI工作负载调度,以及挑战,例如如何无缝迁移集群管理工具。我们将阐明协作努力以标准化集群清单API,并介绍它是如何从一个小组讨论演变为社区努力的。
Speakers
Di Xu

Principal Software Engineer, Xiaohongshu
Currently, he serves as a Tech Lead at Xiaohongshu, where he leads a team focused on building a highly reliable and scalable container platform. He is the founder of the CNCF Sandbox project Clusternet and a top-50 code contributor in the Kubernetes community. He has spoken many...

Chen Yu

Senior Software Engineer, Microsoft
Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.

Zhiying Lin

Principal Software Engineer, Microsoft
I'm a Principal Software Engineer at Microsoft, and my main contribution is the Azure Kubernetes Fleet Manager product. I'm one of the main maintainers of the open source projects Azure/fleet and Azure/fleet-networking.
Friday August 23, 2024 2:10pm - 2:45pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

2:10pm HKT

KuaiShou's 100% Resource Utilization Boost: 100K Redis Migration from Bare Metal to Kubernetes | 快手的100%资源利用率提升:从裸机迁移100K Redis到Kubernetes - XueQiang Wu, ApeCloud & YuXing Liu, Kuaishou
Friday August 23, 2024 2:10pm - 2:45pm HKT
In the past year, Kuaishou successfully migrated nearly 100,000 Redis instances from traditional bare metal environments to the Kubernetes platform, achieving a significant doubling of resource utilization. While ensuring business stability, this large-scale migration faced numerous challenges, including smooth migration execution, finding a balance between increasing deployment density (resource utilization) and ensuring system stability, avoiding interference with other services during coexistence, and addressing specific issues associated with stateful services like databases (including data management, configuration management, ensuring high availability, cross-cluster disaster recovery, etc.). This session will share Kuaishou's large-scale practical experience in Redis cloud-native transformation, in collaboration with the open-source project KubeBlocks, covering aspects such as smooth migration, resource efficiency improvement, and efficient database management.

在过去的一年中,快手成功将近10万个Redis实例从传统裸机环境迁移到Kubernetes平台,实现资源利用率显著翻倍。在确保业务稳定性的同时,这一大规模迁移面临诸多挑战,包括顺利执行迁移、在增加部署密度(资源利用率)和确保系统稳定性之间找到平衡、在共存期间避免与其他服务的干扰,以及解决与数据库等有状态服务相关的特定问题(包括数据管理、配置管理、确保高可用性、跨集群灾难恢复等)。 本场演讲将分享快手在Redis云原生转型方面的大规模实践经验,与开源项目KubeBlocks合作,涵盖顺利迁移、资源效率提升和高效数据库管理等方面。
Speakers
YuXing Liu

Senior Software Engineer, Kuaishou
I have worked in the cloud-native teams of Alibaba Cloud and Kuaishou, focusing on the cloud-native field and gaining experience in open source, commercialization, and scaling of cloud-native technologies. I am one of the maintainers of the CNCF/Dragonfly project and also one of the...

XueQiang Wu

Director of Research and Development, ApeCloud
Former tech leader at Alibaba Cloud PolarDB-X, a cloud-native distributed database, with a wide range of interests and expertise in operating systems, cryptography, distributed systems, and more. Joined the PolarDB-X team in 2017, focusing on the development of high-concurrency, low-latency...
Friday August 23, 2024 2:10pm - 2:45pm HKT
Level 2 | Grand Ballroom 1-2

2:10pm HKT

Model Service Mesh: A New Paradigm for Large-Scale AI Model Service Deployment and Management | 模型服务网格:大规模AI模型服务部署和管理的新范式 - Xi Ning Wang, Alibaba Cloud & Huailong Zhang, Intel China
Friday August 23, 2024 2:10pm - 2:45pm HKT
As AI/ML models grow in scale and complexity, efficiently deploying and managing model services in cloud-native environments has become a significant challenge. This session introduces the Model Service Mesh (MSM), an emerging architectural paradigm designed specifically for large-scale AI model service deployment and management, to address that challenge. The paradigm focuses on:
1. How to build a highly scalable and reliable model delivery system, with key features including dynamic model service routing, unified management of multiple models behind a single endpoint, an optimized caching layer, and cache-aware scheduling.
2. How to leverage MSM to optimize AI model services in terms of lifecycle management, resource utilization, security, observability, and resilience.
In essence, this architecture ensures scalable, secure, and efficient model serving in cloud native environments.

随着人工智能/机器学习模型规模和复杂性的增长,如何在云原生环境中高效部署和管理模型服务已成为一个重大挑战。本提案将介绍模型服务网格(MSM),这是一种专门为大规模人工智能模型服务部署和管理而设计的新兴架构范式,旨在解决这一挑战。这种新范式关注以下几点: 1. 如何构建一个高度可扩展和可靠的模型交付系统,关键特性包括动态模型服务路由、单个端点内多模型的统一管理、优化缓存层和缓存感知调度等。 2. 如何利用MSM优化人工智能模型服务的生命周期管理、资源利用率改善、安全增强以及可观察性和弹性保障。 总的来说,这种架构确保了在云原生环境中可扩展、安全和高效的模型服务。
Speakers
王夕宁

Technical Leader, Alibaba Cloud
Wang Xining is a senior technical expert at Alibaba Cloud and technical leader of ACK (Kubernetes)/ASM (Service Mesh), focusing on Kubernetes, service mesh, and other cloud native fields. He previously worked at IBM as a tech architect focusing on SOA/Cloud and served as the chairman of the...

Huailong Zhang

Cloud Software Engineer, Intel China
Steve (Huailong) Zhang has worked on cloud computing research and development at Alcatel-Lucent, Baidu, and IBM. Huailong currently works at Intel China as a cloud-native software engineer, focusing on cloud-native technical fields such as Kubernetes and service mesh...
Friday August 23, 2024 2:10pm - 2:45pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, AI + ML

2:10pm HKT

Opportunities and Challenges of Cloud Native Technology in US Healthtech | 美国健康科技中云原生技术的机遇与挑战 - Katerina Arzhayev, SUSE
Friday August 23, 2024 2:10pm - 2:45pm HKT
In this session I will share the strategic roadmap for Cloud Native Technology companies eyeing expansion into the intricate US healthcare market. Delving into the multifaceted landscape of American healthcare, the session navigates through its complexities, from the dichotomy of public and private sectors to the nuanced regulatory framework dominated by HIPAA and FDA regulations. By illuminating Cloud Native Technology's transformative potential, particularly in fostering interoperability, enhancing telehealth capabilities, and empowering data analytics, the session showcases how innovation can meet the industry's pressing needs. Moreover, it sheds light on the indispensable considerations for market entry, emphasizing regulatory compliance, trust-building with healthcare stakeholders, and the imperative of market localization. Attendees will be equipped with a strategic playbook to navigate the intricate terrain of US healthtech.

在这场演讲上,我将分享云原生技术公司进军美国复杂医疗市场的战略路线。深入探讨美国医疗保健的多层面景观,本场演讲将引导参与者了解其复杂性,从公共和私营部门的对立到以HIPAA和FDA法规为主导的细致监管框架。通过阐明云原生技术的变革潜力,特别是在促进互操作性、增强远程医疗能力和赋能数据分析方面,本场演讲展示了创新如何满足行业迫切需求。此外,它还揭示了进入市场的不可或缺的考虑因素,强调了监管合规性、与医疗保健利益相关者建立信任以及市场本地化的必要性。与会者将获得一份战略指南,帮助他们在美国医疗科技领域的复杂地形中航行。
Speakers
Katerina Arzhayev

Director of Product Management, Healthcare Edge, SUSE
Katerina Arzhayev is experienced in cross-cultural collaboration and technology strategy. She has a proven track record of driving business results through effective communication and strategic planning. Katerina's expertise lies in making highly complicated topics accessible to non-technical...
Friday August 23, 2024 2:10pm - 2:45pm HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Cloud Native Experience

2:45pm HKT

Coffee Break ☕ | 茶歇
Friday August 23, 2024 2:45pm - 3:15pm HKT
Level 2 | Foyer

3:15pm HKT

Detecting and Overcoming GPU Failures During ML Training | 在ML训练过程中检测和克服GPU故障 - Ganeshkumar Ashokavardhanan, Microsoft & Sarah Belghiti, Wayve
Friday August 23, 2024 3:15pm - 3:50pm HKT
Scaling ML training demands powerful GPU infrastructure, and as model sizes and training scale increase, GPU failures become an expensive risk. From outright hardware faults to subtle performance degradation, undetected GPU problems can sabotage training jobs, inflating costs and slowing development. This talk dives into GPU failure challenges in the context of ML training, particularly distributed training. We will explore the spectrum of GPU issues, and why even minor performance drops can cripple large jobs. Learn how observability (leveraging tools like NVIDIA DCGM) enables proactive problem detection through GPU health checks. Understand principles of fault-tolerant distributed training to mitigate GPU failure fallout. Drawing on cloud provider and autonomous vehicle company experience, we will share best practices for efficient identification, remediation, and prevention of GPU failures. We will also explore cutting-edge ideas like CRIU and task pre-emption for GPU workloads.

随着模型规模和训练规模的增加,机器学习训练需要强大的GPU基础设施,而GPU故障成为一种昂贵的风险。从硬件故障到性能逐渐下降,未被发现的GPU问题可能会破坏训练任务,增加成本并减缓开发速度。本次演讲将深入探讨在机器学习训练中GPU故障所带来的挑战,特别是在分布式训练中。我们将探讨各种GPU问题的范围,以及为什么即使是轻微的性能下降也可能瘫痪大型任务。 了解如何通过观测性(利用诸如NVIDIA DCGM之类的工具)通过GPU健康检查实现问题的主动检测。了解容错分布式训练的原则,以减轻GPU故障的后果。借鉴云服务提供商和自动驾驶汽车公司的经验,我们将分享高效识别、纠正和预防GPU故障的最佳实践。我们还将探讨像CRIU和任务抢占等尖端想法,以应对GPU工作负载。
Speakers
Ganeshkumar Ashokavardhanan

Software Engineer, Microsoft
Ganesh is a Software Engineer on the Azure Kubernetes Service team at Microsoft, working on node lifecycle, and is the lead for the GPU workload experience on this Kubernetes platform. He collaborates with partners in the ecosystem like NVIDIA to support operator models for machine...

Sarah Belghiti

ML Platform Engineer, Wayve
Sarah Belghiti is an ML Platform Engineer at Wayve, a leading developer of embodied intelligence for autonomous vehicles. She works on the infrastructure, scheduling and monitoring of ML workloads. With GPUs becoming an increasingly scarce resource, her focus has been on building...
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 1 | Hung Hom Room 3

3:15pm HKT

Dragonfly: Intro, Updates and Ant Group's Practice of Accelerating Model Distribution in Ray Serving | Dragonfly:介绍、更新和蚂蚁集团在Ray Serving中加速模型分发的实践 - Wenbo Qi, Ant Group & Qixiang Chen, AntGroup
Friday August 23, 2024 3:15pm - 3:50pm HKT
Dragonfly provides efficient, stable, and secure file distribution and image acceleration based on P2P technology, aiming to be the best practice and standard solution in cloud native architectures. This talk introduces Dragonfly and the features of its latest version, along with AI model distribution practices in AI inference. Ray also uses Dragonfly as its file distribution solution for large-scale clusters. We will then cover practical problems of model distribution in LLM and multimedia services, and how Ray solves them in Ant Group's production environment.

Dragonfly提供了基于P2P技术的高效、稳定和安全的文件分发和图像加速,成为云原生架构中的最佳实践和标准解决方案。在本次讨论中,将介绍Dragonfly及最新版本的特性,以及在AI推理中的AI模型分发实践。此外,Ray将Dragonfly作为其大规模集群的文件分发解决方案。随后,我们将介绍LLM和多媒体服务中模型分发的实际问题,以及Ray在蚂蚁集团生产环境中如何解决这些问题。
Speakers
Wenbo Qi

Senior Software Engineer, Ant Group
Wenbo Qi is a software engineer at Ant Group working on Dragonfly, and is a maintainer of the project. He hopes to make positive contributions to open source software and believes that fear springs from ignorance.

Qixiang Chen

Engineer, AntGroup
Qixiang Chen is a software engineer on the Ray team at Ant Group. His research interests include distributed systems and System4ML, and he has published several papers in academic conferences and journals. Building on that research experience, he is the main author of Rayfed - a distributed federated...
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 1 | Hung Hom Room 6

3:15pm HKT

Expanding Cloud Native Capabilities with WASM: A Case Study of Harbor and WASM Integration | 通过WASM扩展云原生能力:Harbor和WASM集成案例研究 - Chenyu Zhang, AntGroup & Yan Wang, Broadcom
Friday August 23, 2024 3:15pm - 3:50pm HKT
In the cloud-native realm, eBPF's versatility has led to scalable solutions in observability and security by attaching to system event checkpoints without kernel code modification. This concept has paved the way for extending business applications non-invasively and flexibly without altering the original code. In this session, we'll use Harbor, the cloud-native artifact registry, to showcase how WASM (WebAssembly) extends Harbor's functionalities without code modification. Here, Harbor is analogous to the Linux kernel, and WASM to user-provided eBPF programs. Harbor provides mounting points for various events, such as pre-pull requests, enabling users to filter requests with custom WASM programs. This facilitates fine-grained permission control and artifact security auditing before a user pulls the artifacts, with more features to discover.

在云原生领域,eBPF 的多功能性使得它能够通过附加到系统事件检查点而无需修改内核代码,从而实现可扩展的可观测性和安全性解决方案。这一概念为在不改变原始代码的情况下非侵入性和灵活地扩展业务应用程序铺平了道路。 在本场演讲中,我们将使用 Harbor,云原生制品注册表,展示如何使用 WASM(WebAssembly)在不修改代码的情况下扩展 Harbor 的功能。在这里,Harbor 类似于 Linux 内核,而 WASM 则类似于用户提供的 eBPF 程序。Harbor 提供了各种事件的挂载点,例如预拉取请求,使用户能够使用自定义的 WASM 程序过滤请求。这有助于在用户拉取制品之前进行细粒度的权限控制和制品安全审计,还有更多功能等待您去发现。
Speakers
Yan Wang

Staff engineer, Broadcom
Yan Wang is a Staff Engineer at VMware. As a core maintainer of the CNCF project Harbor and a maintainer of the CNCF project Distribution, his main work focuses on technology research and innovation in the cloud native field.

Chenyu Zhang

Software Engineer, AntGroup
Chenyu Zhang is a software engineer, currently mainly responsible for the development and maintenance of the Harbor project, and also has some experience in DevOps and cloud native technology stacks.
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

3:15pm HKT

No More Runtime Setup! Let's Bundle, Distribute, Deploy, Scale LLMs Seamlessly with Ollama Operator | 无需运行时设置!让我们使用Ollama Operator轻松捆绑、分发、部署、扩展LLMs - Fanshi Zhang, DaoCloud
Friday August 23, 2024 3:15pm - 3:50pm HKT
Looking for a way to ship LLMs more seamlessly? Finding it too complicated to manage, compose, and set up a runtime with Python, C++, CUDA, and GPUs when deploying LLMs? Tired of fighting dependencies, model sizes, and syncing deliverable model images across nodes? People often find it hard to bundle, distribute, deploy, and scale their own LLM workloads, but no worries: here is Ollama Operator, a scheduler and utilizer for LLM models powered by the Modelfile format introduced by Ollama. You can now enjoy a unified, bundled runtime powered by llama.cpp with a few lines of CRD definition, or with a single command using the natively included kollama CLI. Bundling, distributing, deploying, and scaling LLMs across operating systems and environments has never been this easy and seamless. Let's dive in and find out what Ollama Operator with Ollama can do to deploy our own large language models, and how we can combine these features with Modelfile and bring them into the Kubernetes world!

寻找一种更无缝地运输LLM的方式?在部署LLM时,使用Python、C++、CUDA、GPU设置运行时太复杂?厌倦了与依赖、模型大小、在节点间同步可交付模型图像等问题作斗争? 人们常常发现很难捆绑、分发、部署和扩展自己的LLM工作负载,但不用担心,这里有Ollama Operator,一个由Ollama引入的基于Modelfile的LLM模型调度器和利用者。现在,您可以通过简单的CRD定义行或内置的kollama CLI命令行,享受由llama.cpp提供支持的统一捆绑运行时,轻松实现LLM的捆绑、分发、部署和扩展,跨操作系统和环境都可以轻松实现。 让我们深入了解一下Ollama Operator与Ollama能够做些什么来部署我们自己的大型语言模型,我们可以如何结合这些功能与Modelfile,然后将它们带入Kubernetes世界!
Speakers
Neko Ayaka

Software Engineer, DaoCloud
Cloud native developer, AI researcher, Gopher with 5 years of experience in loads of development fields across AI, data science, backend, frontend. Co-founder of https://github.com/nolebase
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, AI + ML

3:15pm HKT

The Challenges of Kubernetes Data Protection - Real Examples and Solutions with Velero | Kubernetes数据保护的挑战- Velero的真实案例和解决方案 - Wenkai Yin, Broadcom & Bruce Zou, Shanghai Jibu Tech
Friday August 23, 2024 3:15pm - 3:50pm HKT
The distributed and dynamic nature of Kubernetes makes it challenging for data protection to guarantee data availability and durability. Below is a summary of the issues we encountered in real customer environments:
1. Application definition and resource capture
2. Application data consistency
3. Application restore across heterogeneous and cross-cloud environments
We provide a detailed description of these issues in the "Additional resources" section due to the character limit of the "Description" field.

Kubernetes的分布式和动态特性使得数据保护变得具有挑战性,以确保数据的可用性和持久性。以下是我们在真实客户环境中遇到的问题摘要: 1. 应用程序定义和资源捕获 2. 应用程序数据一致性 3. 跨异构和跨云环境的应用程序恢复 由于“描述”部分的字符限制,我们将在“附加资源”部分提供这些问题的详细描述。
Speakers
Bruce Zou

Co-founder and Development Director, Shanghai Jibu Tech
Over 10 years of storage development and architecture experience at the IBM storage system lab; submitted 15+ disclosures and publications; supported 10+ big accounts on critical high-end storage system issues. Rich experience in building highly available storage systems, leading...

Wenkai Yin

Staff Software Engineer, Broadcom
Staff software engineer focused on cloud-native development. Core maintainer of the open source projects Harbor and Velero.
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 2 | Grand Ballroom 1-2

3:15pm HKT

The Experience of ChillyRoom Developing & Managing Session-Based Game on K8s with OpenKruiseGame | 在K8s上使用OpenKruiseGame开发和管理基于会话的游戏的ChillyRoom经验 - Qiuyang Liu, Alibaba Cloud & Xinhao Liu, ChillyRoom
Friday August 23, 2024 3:15pm - 3:50pm HKT
In the era of traditional game operation and maintenance, session-based games face huge challenges in terms of delivery efficiency and resource costs. Cloud native technology brings exactly the flexibility and highly automated capabilities that session-based games need. However, because game servers are strongly stateful, running games on Kubernetes also presents various difficulties. This talk will focus on the characteristics of session-based games and describe how ChillyRoom uses OpenKruiseGame, a subproject of the CNCF incubating project OpenKruise, to develop and manage session-based games on Kubernetes, providing developers in the game industry with cloud native implementation experience in automatic network access, elastic scaling of game servers, matching logic development, room status management, and more.

在传统游戏运维时代,基于会话的游戏在交付效率和资源成本方面面临巨大挑战。云原生技术正好为会话型游戏带来了灵活性和高度自动化能力。然而,由于游戏服务器具有强烈的有状态特性,在实现游戏在 Kubernetes 上的过程中也存在各种困难。 本次演讲将重点关注会话型游戏的特点,并描述 ChillyRoom 如何使用 OpenKruise 的子项目 OpenKruiseGame 来开发和管理基于会话的游戏在 Kubernetes 上,为游戏行业的开发人员提供云原生实现经验,包括自动网络访问、游戏服务器的弹性扩展、匹配逻辑开发和房间状态管理等。
Speakers
Qiuyang Liu

Senior R&D Engineer, Alibaba Cloud
Qiuyang Liu is head of cloud native gaming at Alibaba Cloud Container Service and a maintainer of the kruise-game project. He has long been engaged in the research and development of cloud native in the gaming field and is committed to promoting the implementation of cloud native in the...

Xinhao Liu

Engineer, ChillyRoom
Xinhao Liu is an engineer with one year of experience in game server development at ChillyRoom and three years of experience in Linux OS and cloud core network software development in industry. He has a passion for creating flexible, high-performance, highly available, and easy-to-maintain game...
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 1 | Hung Hom Room 7

3:15pm HKT

The Bang! - When Bad Things Happen to Your Data | 爆炸!- 当数据出问题时 - Kelvin Mun, Veeam Software
Friday August 23, 2024 3:15pm - 3:50pm HKT
Imagine the inevitable has already happened—you’ve had a security breach—and you’re now dealing with the aftermath. Organisations must act fast to ensure business returns to operations quickly while also figuring out how to prevent similar incidents in the future. By adopting new use cases, engineering teams are simultaneously accelerating the deployment of sensitive data across multi-cloud architectures and tapping into new risk factors. In this talk, we will use the “Data Security Bang” analogy and learnings from resilience engineering to answer questions such as: How could we do more left of bang (prevention) to help with the speed of right of bang (remediation)? The audience will be guided through a set of example scenarios in a 90s-style game, using Kanister, OPA, and Prometheus, in which they can make decisions on data security to guide the way towards a more robust infrastructure.

想象不可避免的事情已经发生了——您遭遇了安全漏洞——现在您正在处理后果。组织必须迅速采取行动,确保业务迅速恢复运营,同时还要想办法防止将来发生类似事件。通过采用新的用例,工程团队同时加速了跨多云架构部署敏感数据,并利用新的风险因素。 在这次演讲中,我们将使用“数据安全爆炸”的类比和弹性工程的经验教训来回答诸如:我们如何可以在爆炸之前做更多的事情(预防),以帮助加快爆炸之后的速度(补救)?观众将通过90年代风格的游戏中的一系列示例场景,使用Kanister、OPA和Prometheus,来做出关于数据安全的决策,引导通往更健壮基础设施的道路。
Friday August 23, 2024 3:15pm - 3:50pm HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security

4:05pm HKT

Boosting LLM Development and Training Efficiency: Automated Parallelization with MindSpore | 提升LLM开发和培训效率:MindSpore自动并行化 - Yufeng Lyu, Huawei Technologies Co., Ltd
Friday August 23, 2024 4:05pm - 4:40pm HKT
With the popularity of LLM, large-scale pre-training has become an indispensable step in AI research and implementation. However, large-scale distributed parallel training requires developers to consider various factors affecting the efficiency of model development and training, such as partitioning and communication, and then modify the model accordingly. In this presentation, we will demonstrate an automatic parallelization approach that allows developers to focus on algorithm research without the need for intrusive model modifications. Distributed training on a large-scale cluster can be achieved simply by configuring strategies. Developers can also utilize MindSpore's hyperparameter search model to automatically find the best parallelization strategy. The parallel strategy obtained through search can achieve 90%-110% of the expert tuning performance, significantly reducing the time required for model modifications while efficiently accelerating LLM training.

随着LLM的流行,大规模预训练已成为人工智能研究和实施中不可或缺的一步。然而,大规模分布式并行训练需要开发人员考虑各种影响模型开发和训练效率的因素,如分区和通信,然后相应地修改模型。 在本次演示中,我们将展示一种自动并行化方法,使开发人员能够专注于算法研究,而无需进行侵入性的模型修改。通过配置策略,可以简单实现在大规模集群上的分布式训练。开发人员还可以利用MindSpore的超参数搜索模型自动找到最佳的并行化策略。通过搜索获得的并行策略可以实现专家调整性能的90%-110%,显著减少了模型修改所需的时间,同时有效加速LLM的训练。
Speakers
Yufeng Lyu

Senior Engineer, Huawei Technologies Co., Ltd
Lyu Yufeng, a technical architect at MindSpore and maintainer of the MindNLP framework, focuses his research on natural language processing and distributed parallelism for LLM. He possesses extensive experience in the development and implementation of LLM solutions.
Friday August 23, 2024 4:05pm - 4:40pm HKT
Level 1 | Hung Hom Room 3

4:05pm HKT

CubeFS Boosts Efficiency of AI Production | CubeFS提高了AI生产的效率 - Chi He, OPPO
Friday August 23, 2024 4:05pm - 4:40pm HKT
With the booming development of AI, the scale of data required for AI model training has been increasing. As one of the fundamental infrastructures for AI, distributed file storage faces significant challenges, such as scalability and the need to provide high performance and stable storage while considering cost-effectiveness. This presentation will mainly share the practical experience and reflections of CubeFS in addressing these challenges.

随着人工智能的蓬勃发展,用于AI模型训练的数据规模不断增加。作为AI的基础设施之一,分布式文件存储面临着诸多挑战,如可扩展性和在考虑成本效益的同时提供高性能和稳定存储。本次演讲将主要分享CubeFS在应对这些挑战方面的实践经验和反思。
Speakers
Chi He

Senior Engineer, OPPO
CubeFS committer, responsible for the design and development of the CubeFS storage engine, including features such as hybrid cloud and caching acceleration.
Friday August 23, 2024 4:05pm - 4:40pm HKT
Level 1 | Hung Hom Room 6

4:05pm HKT

JD Cloud's Large-Scale Serverless Practice: APP Management and Elastic Scaling on Karmada | 京东云的大规模无服务器实践:在Karmada上的应用管理和弹性扩展 - XiaoFei Wang & Chen Yanying, JDCloud
Friday August 23, 2024 4:05pm - 4:40pm HKT
In JDCloud, the federated Serverless service is based on the federated management model and the Serverless application model, providing JDOS application container control services for federated application container deployment, elastic scaling, and fault migration. It manages multiple clusters with over 10,000 nodes. It unifies the management of multiple sub-clusters to improve overall resource utilization and reduces the complexity of multi-cluster management, scheduling, and distribution on the platform. End users can use our platform just like the native Kubernetes API. In this session, we will address numerous technical challenges, including:
1. Multi-cluster management and distribution practice
2. An efficient cross-cluster elastic scaling solution
3. Problems encountered in production and lessons learned

在京东云中,联邦Serverless服务基于联邦管理模型和Serverless应用模型,为联邦应用容器部署、弹性扩展和故障迁移提供JDOS应用容器控制服务。它管理超过10,000个节点的多个集群。统一管理多个子集群,提高整体资源利用率。减少平台上多集群管理、调度和分发的复杂性。最终用户可以像使用本机Kubernetes API一样使用我们的平台。在整个过程中,我们将解决许多技术挑战,包括: 1. 多集群管理和分发实践 2. 高效的跨集群弹性扩展解决方案 3. 在生产和分享中遇到的问题
Speakers
Chen Yanying

Cloud Native Engineer, JDCloud
Engaged in the construction and internal promotion of basic platforms such as Federated Clusters, Serverless, Service Mesh and some middleware, based on JD's large-scale Kubernetes clusters
XiaoFei Wang

Cloud Native Engineer, JDCloud
As a software engineer, he is responsible for cluster deployment, multi-cluster management, and federated clusters. He has taken part in JD.com’s 618 and 11.11 promotions and has rich practical experience in cloud native technologies.
Friday August 23, 2024 4:05pm - 4:40pm HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

4:05pm HKT

TiDB: Your Next MySQL Is Not a MySQL | TiDB:你的下一个 MySQL 何必是 MySQL - Qizhi Wang, PingCAP
Friday August 23, 2024 4:05pm - 4:40pm HKT
You might have heard of TiDB, a distributed open-source database known for its virtually limitless horizontal scalability, capable of handling both online transactional processing and analytical workloads while being compatible with the MySQL protocol. Traditionally, different databases have been employed to handle various workloads in our application architecture designs. Commonly, relational databases are used for online transaction processing, with data asynchronously distributed to analytical databases, document stores, and cache databases. With the rise of AI, an additional type of database needs consideration: the vector database. But introducing this type of database can add unnecessary complexity to your technology stack. In this talk, we will discuss how TiDB integrates multiple functionalities such as real-time transaction processing, online analytics, sharding-free architecture, and vector type computations, all aimed at reducing the cognitive load for developers.

您可能已经听说过 TiDB,这是一个分布式开源数据库,以其几乎无限的水平扩展性而闻名,能够处理在线事务处理和分析工作负载,同时兼容 MySQL 协议。 传统上,在我们的应用架构设计中,通常会使用不同的数据库来处理各种工作负载。通常情况下,关系数据库用于在线事务处理,数据会异步分布到分析数据库、文档存储和缓存数据库。随着人工智能的兴起,还需要考虑一种额外的数据库类型 —— 向量数据库。但引入这种类型的数据库可能会给您的技术堆栈增加不必要的复杂性。 在本次演讲中,我们将讨论 TiDB 如何集成多种功能,如实时事务处理、在线分析、无分片架构和向量类型计算,所有这些都旨在减少开发人员的认知负荷。
Speakers
Qizhi Wang

TiDB Ecosystem Software Architect and Senior Developer Advocate, PingCAP
Qizhi is a TiDB Ecosystem Software Architect & Senior Developer Advocate at PingCAP, the company behind TiDB. In this role, he focuses on ecosystem development and has been instrumental in integrating TiDB with various platforms such as AWS, GORM, MySQL Connector/J, Hibernate, DBeaver...
Friday August 23, 2024 4:05pm - 4:40pm HKT
Level 2 | Grand Ballroom 1-2

4:05pm HKT

Unlocking LLM Performance with eBPF: Optimizing Training and Inference Pipelines | 通过eBPF解锁LLM性能:优化训练和推理管道 - Yang Xiang, Yunshan Networks, Inc.
Friday August 23, 2024 4:05pm - 4:40pm HKT
The training and inference processes of Large Language Models (LLMs) involve handling vast amounts of model data and training data, and consume significant GPU compute resources. However, enhancing GPU utilization becomes extremely challenging in the absence of observability. This presentation will introduce how to achieve observability in LLM training and inference processes with zero disruption using eBPF. This includes utilizing Memory Profiling to understand the loading performance of models and training data, Network Profiling to comprehend the data exchange performance, and GPU Profiling to analyze GPU's MFU (Model FLOPs Utilization) and performance bottlenecks. Additionally, we will share the practical effects of implementing observability in a PyTorch LLM application and the llm.c project using eBPF, aiming to enhance training and inference performance.

大型语言模型(LLMs)的训练和推断过程涉及处理大量的模型数据和训练数据,并消耗大量的GPU计算资源。然而,在缺乏可观察性的情况下,提高GPU利用率变得极具挑战性。 本次演讲将介绍如何利用eBPF在LLM训练和推理过程中实现零中断的可观察性。这包括利用内存分析来了解模型和训练数据的加载性能,网络分析来理解数据交换性能,以及GPU分析来分析GPU的MFU(模型FLOPs利用率)和性能瓶颈。 此外,我们将分享在PyTorch LLM应用程序和llm.c项目中使用eBPF实现可观察性的实际效果,旨在提高训练和推理性能。
Speakers
Yang Xiang

VP of Engineering, Yunshan Networks, Inc.
He received a Ph.D. from Tsinghua University and currently serves as VP of Engineering at Yunshan Networks and head of the DeepFlow open-source community. He has presented academic papers on topics such as application observability and network measurement at top international academic...
Friday August 23, 2024 4:05pm - 4:40pm HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Observability
 
