In-person
21-23 August, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 



Wednesday, August 21
 

11:00 HKT

Addressing Challenges of Cross-Architecture Dynamic Migration Over Heterogeneous Acceleration System | 解决异构加速系统上跨架构动态迁移的挑战 - Yanjun Chen, China Mobile
Wednesday August 21, 2024 11:00 - 11:35 HKT
With the surge in application computing demand, the industry has begun running AI applications on diverse acceleration hardware (GPU, FPGA, NPU...) to gain more processing capability. One key problem in using diverse accelerators is tool-chain and vendor lock-in across the application dev-to-run process: cross-system (multi-architecture chips plus multi-vendor tool chains) application development and migration is hard to achieve. In this presentation China Mobile will introduce its practices for solving these challenges, allowing AI applications to migrate smoothly among different accelerators. The approach includes a unified abstraction for diverse accelerators, a middle compiler that uses existing compilers (CUDA, ROCm, oneAPI...) to achieve cross-architecture compilation within the same execution, and a runtime supporting dynamic, replaceable linking. The goal is to let applications migrate freely between diverse accelerators without changing development habits; the talk will cover the architecture design, open source plans, and a demo.

Speakers

Yanjun Chen

Open Source Expert, China Mobile
Yanjun Chen is an open source expert and CNCF delegate at China Mobile. She has contributed actively to many open source projects and is currently a TSC member of LF Edge Akraino.
Wednesday August 21, 2024 11:00 - 11:35 HKT
Level 1 | Hung Hom Room 3

11:00 HKT

Securing the Supply Chain: A Practical Guide to SLSA Compliance from Build to Runtime | 保障供应链安全:从构建到运行的SLSA合规实用指南 - Enguerrand Allamel, Ledger
Wednesday August 21, 2024 11:00 - 11:35 HKT
Navigating the complexities of supply chain security might seem intimidating, especially with evolving frameworks like SLSA (Supply-chain Levels for Software Artifacts). This talk introduces beginners to the foundational practices required to secure software from build to runtime using CNCF tools. We'll explore how GitHub Actions can automate build processes, integrate with Cosign for keyless artifact signing, and use Kyverno for runtime policy enforcement. Additionally, we'll discuss how tools like in-toto and Kubescape help manage and verify artifact integrity, providing a holistic view of SLSA compliance in the Kubernetes ecosystem. To enhance security further, we will also briefly discuss the potential integration of Hardware Security Modules (HSMs) into the supply chain. HSMs can offer an added layer of security for key management operations critical to signing processes, ensuring that cryptographic keys are managed securely and are resilient against attack.
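The keyless signing and verification flow described above can be sketched with Cosign's CLI. The image reference and the CI workflow identity below are placeholders; exact flags can vary by Cosign version:

```shell
# Sign in a CI job using keyless (OIDC-based) signing; Cosign obtains a
# short-lived certificate from Fulcio and records the signature in Rekor.
cosign sign --yes ghcr.io/example/app@sha256:...

# Verify the signature against the expected CI workflow identity.
cosign verify \
  --certificate-identity-regexp 'https://github.com/example/app/\.github/workflows/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  ghcr.io/example/app@sha256:...
```

A Kyverno `verifyImages` policy can then enforce at admission time that only images passing this verification are deployed.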

Speakers

Enguerrand Allamel

Senior Cloud Security Engineer, Ledger
Enguerrand is a Senior Cloud Security Engineer with experience in Site Reliability Engineering at Ledger since 2022. His work focuses on the security of scalable and reliable cloud systems, leveraging his knowledge of hybrid computing technologies and container orchestration with... Read More →
Wednesday August 21, 2024 11:00 - 11:35 HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Security

11:50 HKT

Ethics in the Cloud: Safeguarding Responsible AI Development in Asia | 云计算中的伦理:在亚洲保障负责任的人工智能发展 - Quiana Berry, Red Hat
Wednesday August 21, 2024 11:50 - 12:25 HKT
Ethics serve as the compass guiding responsible innovation and societal progress. This presentation blends ethics, cloud computing, and AI advancement, spotlighting the imperative of upholding responsible AI practices, particularly within the Asian market. From safeguarding data privacy and fortifying cybersecurity to navigating regulatory compliance and governance, this comprehensive discourse delves into multifaceted dimensions essential for ethical AI development. As Asia, including China, propels the frontier of AI innovation, the imperative of embedding ethics and responsible practices becomes increasingly pronounced. This session is tailored to provide actionable strategies and regulatory insights for Asian leaders. Together, we'll empower attendees to become champions of responsible AI practices, fostering a culture of integrity and innovation in the vibrant and diverse tech landscape of Asia.

Speakers

Quiana Berry

Product Lead, Red Hat
Quiana is a dynamic cloud Product Lead at Red Hat/IBM, dedicated to crafting innovative developer tools and reshaping the future of technology. With an academic foundation encompassing Anthropology, Biology, and Chemistry and a specialty in the fusion of (DEI) and Ethical AI, Quiana... Read More →
Wednesday August 21, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 3

11:50 HKT

Safeguarding Cloud Native Supply Chain | 保护云原生供应链-公证项目介绍,新功能和即将推出的内容 - Yi Zha, Microsoft & Mostafa Radwan, CloudRoads
Wednesday August 21, 2024 11:50 - 12:25 HKT
Ensuring a secure supply chain for container images is vital in the cloud-native ecosystem. But how can you be certain that container images originate from trusted sources? And how can you verify that they haven’t been altered since their creation? Join this session to delve into how the Notary Project bolsters cloud native supply chains by leveraging authentic container images and other OCI artifacts. Our maintainers will present an overview of the project for newcomers. Discover the latest features that enhance software supply chains, gain insights into the roadmap, and explore upcoming developments, including attestations. Observe a user demonstrating how the Notary Project guarantees the integrity and authenticity of container images and arbitrary files. Our maintainers will be available to answer any questions you may have. Whether you’re new or experienced in container security, or someone interested in contributing to the project, this session is not to be missed!
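The integrity and authenticity checks the session describes can be sketched with the Notation CLI (the Notary Project's signing tool). The registry path and key name are placeholders, and a trust policy must be configured beforehand; see the Notary Project documentation for setup:

```shell
# Sign a container image by digest with a previously configured key.
notation sign ghcr.io/example/app@sha256:... --key example-key

# Later, verify the image against the configured trust policy.
notation verify ghcr.io/example/app@sha256:...
```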

Speakers

Mostafa Radwan

Principal Consultant, CloudRoads
Mostafa is a technologist and consultant specializing in cloud native computing. He started his career as a software engineer before getting into the trenches of application and production support. He enjoys helping enterprise companies successfully adopt DevOps and cloud native technologies... Read More →

Yi Zha

Senior Product Manager, Microsoft
Yi is a senior product manager in Azure Container Upstream team at Microsoft and is responsible for container supply chain security for Azure services and customers. He is also a maintainer of CNCF project Notary, and a contributor of CNCF ORAS and OSS project Ratify.
Wednesday August 21, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 6

11:50 HKT

Implementing Fine-Grained and Pluggable Container Resource Management Leveraging NRI | 基于 NRI 实现精细化且可插拔的容器资源管理 - Qiang Ren, Intel & He Cao, ByteDance
Wednesday August 21, 2024 11:50 - 12:25 HKT
To overcome Kubernetes' limitations in resource management, ByteDance developed Katalyst, a resource management system. Katalyst employs a range of methodologies, including colocation, node over-commitment, specification recommendation, and tidal colocation, aimed at optimizing cluster resource utilization.

Initially, Katalyst introduced a QoS Resource Manager (QRM) framework within kubelet, facilitating versatile container resource allocation through a plugin architecture. Presently, the Node Resource Interface (NRI) presents a refined alternative.

This session elucidates how Katalyst leverages NRI for fine-grained and adaptable container resource management, ensuring efficiency without intrusive modifications of upstream components. This novel architecture allows Katalyst to seamlessly integrate with native Kubernetes, offering a user-friendly and easily maintainable solution.

Speakers

Qiang Ren

Software Engineer, Intel
Ren Qiang works as a cloud orchestration software engineer in SATG, Intel. He mainly focuses on cloud native runtime technologies. He also actively participates in open source projects and is committed to promoting the development of runtime and resource isola... Read More →

He Cao

Senior Software Engineer, ByteDance
He Cao is a senior software engineer on the Cloud Native team at ByteDance, a maintainer of Katalyst and KubeZoo, and a member of Istio. He has 5+ years of experience in the cloud native area. Since joining ByteDance, he has designed and implemented several critical systems for VKE... Read More →
Wednesday August 21, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 2

11:50 HKT

Community Charter and Cookbook: The Recipe of Building Communities in the Open | 社区章程与手册:在开放中建立社区的秘诀 - Prithvi Raj, Harness
Wednesday August 21, 2024 11:50 - 12:25 HKT
An open source project holding immense value and massive potential fails to build an exciting community. A community that is interactive and scaling at the start becomes stagnant after a point. A project has good GitHub traction but lacks comparable community traction. These are some of the many problems that arise while building an open source project community, and they are still very much prevalent. Drawing on Prithvi's experience, this talk summarises the ingredients essential to nurturing a project community and ensuring its growth over the years to come. He will highlight steps and best practices around the right metrics, content curation, social presence, and building the right culture among stakeholders and the broader audience. He will also share tips and tricks on creating the right special interest groups, ensuring constant contributions, and incentivising the community.

Speakers

Prithvi Raj

Technical Community Manager, Harness
Prithvi Raj is a Technical Community Manager at Harness and a CNCF Ambassador. He is currently leading the community for the LitmusChaos CNCF incubating project. He has 4 years of experience in the industry and has helped scale the broader Chaos Engineering community. He has worked... Read More →
Wednesday August 21, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 5

13:50 HKT

Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage | 您的GPU在数据中心真的高效工作吗?提高GPU使用率的N种方法 - Xiao Zhang, DaoCloud
Wednesday August 21, 2024 13:50 - 14:25 HKT
AI has penetrated various industries, and companies have purchased many expensive AI GPU devices for training and inference. So what is the reality of how these devices are used? Is the utilization rate really high? Are GPU cards monopolized by applications that barely use them? Do these AI devices work efficiently 24/7? Drawing on our large-scale production practices, this session summarizes N ways to improve the utilization of AI equipment, such as:
* How to avoid monopolization and improve GPU usage through GPU-sharing technology
* How to improve GPU device usage through colocation in scenarios with pronounced tidal (peak/off-peak) patterns
* How to better match training and inference applications to GPU groups to improve GPU usage
The session draws on production experience with the two open source projects HAMi and Volcano, aiming to give everyone a clearer understanding of how to improve GPU usage.
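As one illustration of the GPU-sharing idea, a pod can request a slice of a GPU through HAMi's extended resource names. This is a sketch based on HAMi's documented resources; the image is a placeholder and resource names can vary with HAMi's configuration:

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-shared-worker
spec:
  containers:
  - name: worker
    image: example/cuda-app:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1        # one (shared) GPU
        nvidia.com/gpumem: 4096  # GPU memory in MB reserved for this pod
        nvidia.com/gpucores: 30  # approximate percentage of GPU cores
EOF
```

Because the memory and core limits are enforced per pod, several such pods can share one physical card instead of monopolizing it.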

Speakers

Xiao Zhang

Senior Technical Lead, DaoCloud
- Xiao Zhang is the leader of the Container team (focused on infra, AI, multi-cluster, cluster LCM, OCI) - Kubernetes / kubernetes-sigs active contributor and member - Karmada maintainer, kubean maintainer, HAMi maintainer - Cloud-native developer - CNCF open source enthusiast - GithubID: waw... Read More →
Wednesday August 21, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 3

13:50 HKT

Power TiKV with in-Memory Engine | 使用内存引擎强化 TiKV - Chenjie Tang, PingCAP
Wednesday August 21, 2024 13:50 - 14:25 HKT
As a distributed KV database, TiKV supports both non-transactional and transactional operations. In transactional mode, many UPDATE operations produce many KV versions, and scanning a range that contains a large number of KV versions makes read latency unpredictable. To achieve stable low latency in TiKV, we introduced an "In-Memory Engine" to reduce read latency in such cases. The In-Memory Engine can also improve overall performance when there are hot ranges under heavy read workloads.

Speakers

Chenjie Tang

Engineer, PingCAP
TiKV committer and Rust programmer, focused on large-scale distributed systems
Wednesday August 21, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 6

13:50 HKT

Enhancing Cyber Resilience Through Zero Trust Chaos Experiments in Cloud Native Environments | 通过在云原生环境中进行零信任混沌实验来增强网络安全弹性 - Sayan Mondal, Harness & Rafik Harabi, Sysdig
Wednesday August 21, 2024 13:50 - 14:25 HKT
Cyber-attacks against cloud-native infrastructure are increasing in frequency and sophistication. The complexity of modern cloud-native systems and the speed at which technology is developing have outpaced cloud security solutions. On the flip side, cyber-criminals are taking advantage of these developments to launch successful cloud attacks. This session delves into the paradigm of Zero Trust Chaos Experiments, exploring how intentional disruptions and simulated cyber threats can uncover vulnerabilities and enhance cyber resilience. Through practical insights, we will illustrate the transformative impact of Zero Trust Chaos Experiments on organizations' ability to detect and mitigate cyber incidents. By the end of the session, participants will be equipped with actionable strategies and a better understanding of how Zero Trust Chaos Experiments can elevate cyber resilience in cloud-native environments

Speakers

Rafik Harabi

Senior Solutions Architect, Sysdig
Rafik has more than 15 years of tech and internet industry experience. Currently, he is a Senior Solution Architect devoted to helping customers secure their cloud native platforms and applications. Before joining Sysdig, he was responsible for executing go-to cloud programmes in... Read More →

Sayan Mondal

Senior Software Engineer 2, Harness
Sayan Mondal is a Senior Software Engineer II at Harness, building their Chaos Engineering platform and helping them shape the customer experience market. He's the maintainer of a few open-source libraries and is also a maintainer of LitmusChaos (the Incubating CNCF project). Sayan's... Read More →
Wednesday August 21, 2024 13:50 - 14:25 HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Security

13:50 HKT

Kubespray Unleashed: Navigating Bare Metal Services in Kubernetes for LLM and RAG | Kubespray大放异彩:在Kubernetes中为LLM和RAG部署裸金属服务 - Kay Yan, DaoCloud & Alan Leung, Equinix
Wednesday August 21, 2024 13:50 - 14:25 HKT
Kubespray, popular within the SIG-Cluster-Lifecycle of Kubernetes, is celebrated for deploying production-ready Kubernetes clusters, particularly on bare metal, which boosts performance for AI workloads like LLM and RAG. This session will explore using Kubespray in bare metal settings, addressing challenges, and sharing best practices. The first part of the talk will show Kubespray's key features and provide practical tips. The latter half will focus on swiftly deploying AI using Retrieval-Augmented Generation (RAG), demonstrating how Kubespray facilitates setting up Kubernetes clusters on bare metal. This setup enhances AI applications by integrating continuous knowledge updates and domain-specific information via RAG, improving the accuracy and credibility of the AI systems. The session will conclude with discussions on community engagement and future advancements, followed by a Q&A period to address participant queries.
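A minimal bare-metal deployment with Kubespray follows the pattern from its README. The inventory paths below use the sample layout shipped with the repo; hosts and group variables must be adapted to your machines:

```shell
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
pip install -r requirements.txt

# Copy the sample inventory and declare your bare metal nodes.
cp -rfp inventory/sample inventory/mycluster
# ...edit inventory/mycluster (hosts, group_vars) for your environment...

# Deploy the cluster; --become is required for system-level changes.
ansible-playbook -i inventory/mycluster/inventory.ini \
  --become --become-user=root cluster.yml
```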

Speakers

Kay Yan

Principal Software Engineer, DaoCloud
Kay Yan is a Kubespray maintainer and a containerd/nerdctl maintainer. He is a Principal Software Engineer at DaoCloud and has been developing the DaoCloud Enterprise Kubernetes Platform since 2016.

Alan Leung

Digital Technical Specialist, Equinix
Alan is a Digital Technical Specialist at Equinix, focused on enabling customers, prospects, and partners to develop innovative solutions that solve business challenges at the digital edge.
Wednesday August 21, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 2

13:50 HKT

Zen and the Art of OSPO Maintenance - Group Reflection of OSPO Summits | 禅与OSPO维护的艺术-OSPO峰会的团体反思 - Nadia Jiang, SegmentFault; Richard Sikang Bian, Ant Group; Li Jiansheng, Open Source Way; Zhiqiang Yu, Linux Foundation APAC; Jie Liu, Huawei Technologies
Wednesday August 21, 2024 13:50 - 14:25 HKT
Building OSPOs can be easy, but evolving and maintaining them so they continue delivering sustainable and widely recognized value is challenging. Our goal is not only to assist companies in establishing their first OSPOs but also to ensure they continually generate value through community-led approaches. With this mission, the LFAPAC OSPO SIG, collaborating with OSPO Group and SegmentFault, successfully held OSPO Summits in 2023 and 2024. These summits convened OSPO practitioners, corporate project leads, and community leaders, facilitated collaboration, and earned high praise. Nonetheless, we also encountered numerous inquiries and discussions about the difficulties of sustainably developing OSPOs. For this panel discussion we have gathered the co-chairs of the OSPO Summits. They will explore these challenges and share their insights and strategies for making overall "OSPO maintenance" easier with support from OSS Zen and methodologies.

Speakers

Jie Liu

Open Source Evangelist, Huawei Technologies Co. Ltd.
Co-Chair of the 2nd OSPO Summit. As an open-source evangelist and OSPOer at Huawei, Jie Liu is dedicated to promoting open source development, fostering collaboration within the open-source communities, and advocating for open source culture. She has been working in the ICT industry... Read More →

Zhiqiang Yu

Open Source Evangelist, Linux Foundation APAC
Zhiqiang Yu is the Chief Open Source Liaison Officer at China Mobile Research. He has been a member of the LF APAC Open Source Evangelist team since 2022 and currently serves as the co-chair of the LF APAC OSPO SIG. Alongside Nadia Jiang and Jiangsheng Li, he launched the first OSPO... Read More →

Li Jiansheng

creator, 「Open Source Way 」
Open Source advocate.

Nadia Jiang

COO, SegmentFault
Nadia Jiang currently serves as the COO of SegmentFault and is a co-founder of Apache Answer. She is an active contributor to several open source organizations, including KAIYUANSHE (China Open Source Alliance), Chance Foundation, China Computer Federation (CCF), and China Institute... Read More →

Richard Sikang Bian

Head of Open Source Growth and Strategy, Ant Group
An engineer by training and father to a toddler, Richard is ex-Square and ex-Microsoft and currently works on the Technical Strategy Initiatives team of Ant Group. Richard is also in charge of Ant Group's Open Source Program Office (OSPO) and enjoys being an evangelist of open source... Read More →
Wednesday August 21, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 5

13:55 HKT

⚡ Lightning Talk: Discussion on CNAI Widely Used in Education | ⚡ 闪电演讲: 教育中广泛使用的CNAI讨论 - Chen Lin, VMware by Broadcom
Wednesday August 21, 2024 13:55 - 14:00 HKT
This lightning talk will discuss Cloud Native Artificial Intelligence (CNAI) in education from three aspects. First, it introduces the current state of CNAI applied to children's education. Second, it demos a kid-friendly prototype of an AI training process on cloud native infrastructure. Third, it discusses further possibilities for CNAI in pre-school and in-school education (children's enlightenment, student assignment correction, AI teaching...), and also raises the foreseeable problem of malicious abuse of CNAI.

Speakers

Chen Lin

Software Engineer, VMware by Broadcom
Chen Lin joined VMware in 2019 and has 5 years of cloud native experience. Chen worked on the PKS, Tanzu, and TKGs products, focusing on networking and production CI/CD. Chen is also a member of the Kubernetes community and a maintainer of cloud-provider-vsphere.
Wednesday August 21, 2024 13:55 - 14:00 HKT
Level 1 | Hung Hom Room 1

14:00 HKT

⚡ Lightning Talk: WASM on Embedded Systems (RTOS) | ⚡ 闪电演讲: 嵌入式系统(RTOS)上的WASM - Han Wu, University of Exeter
Wednesday August 21, 2024 14:00 - 14:05 HKT
WebAssembly (WASM) has seen significant success in web applications and is now making inroads into other areas, such as cloud services and even embedded systems running real-time operating systems (RTOS) like Zephyr, RT-Thread, NuttX, and ESP-IDF. This lightning talk will present different approaches to using WASM on embedded systems: - wasmtime (Arm Linux) - wasm-micro-runtime (RTOS) - wasm3 (bare metal). These WASM runtimes offer full support for the WASM core specification. Additionally, their limited support for the WebAssembly System Interface (WASI) enables access to components such as threads, file systems, and network sockets. Although the WASI specifications that provide access to hardware peripherals, such as wasi-i2c, wasi-spi, and wasi-digital-io, are still in the early stages of development, the potential advantages in portability, security, and deployment simplicity make WASM a promising choice for embedded systems.
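For the Arm Linux case, the typical flow is to compile for a WASI target and run the module under a runtime such as wasmtime. This is a sketch using current Rust/wasmtime tooling; target and flag names may differ across versions:

```shell
# Build a Rust program for the WASI preview-1 target.
rustup target add wasm32-wasip1
cargo build --target wasm32-wasip1 --release

# Run it under wasmtime, granting access to the current directory
# through WASI's capability-based file system model.
wasmtime run --dir . target/wasm32-wasip1/release/app.wasm
```

On RTOS targets, wasm-micro-runtime embeds the same module via a C API instead of a standalone CLI.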

Speakers

Han Wu

Ph.D. Student, University of Exeter
Ph.D. student at the University of Exeter in the U.K., researching deep learning security in autonomous systems. Prior research experience at RT-Thread, LAIX, and Xilinx.
Wednesday August 21, 2024 14:00 - 14:05 HKT
Level 1 | Hung Hom Room 1
  ⚡ Lightning Talks | ⚡ 闪电演讲, Cloud Native Novice

14:10 HKT

⚡ Lightning Talk: K8SUG: Unleashing the Power of Community | ⚡ 闪电演讲: K8SUG:释放社区的力量 - Yongkang He, K8SUG.com
Wednesday August 21, 2024 14:10 - 14:15 HKT
Unveiling the Powerhouse of Knowledge: K8SUG - the Most Active Kubernetes User Group! Step into the world of K8SUG, where passion meets innovation, and connections spark like wildfire. As the brainchild of its founder, the K8SUG Singapore meetup blossomed into a global phenomenon, stretching its reach from Australia to Canada and the UK, with the USA next on the horizon. In just 1.5 electrifying years, our community has swelled to over 14,000 members worldwide, all fueled by the dedication of our volunteers. Join us and be part of the dynamic exchange shaping the future of Kubernetes!

Speakers

Yongkang He

Founder / Principal Containers Specialist, K8SUG.com
Yongkang He is a Kubestronaut, CNCF Ambassador, AWS Builder, Microsoft MVP, Google Champion, and Alibaba MVP based in Singapore. He has over 20 years of experience in IT. In recent years, he has shifted his focus to Kubernetes and multi-cloud. He is one of the most certified, including... Read More →
Wednesday August 21, 2024 14:10 - 14:15 HKT
Level 1 | Hung Hom Room 1

14:40 HKT

⚡ Lightning Talk: Kubernetes Raises Questions. Can a PaaS Answer Them? | ⚡ 闪电演讲: Kubernetes引发了问题。 PaaS能解答吗? - Ram Iyengar, Cloud Foundry Foundation
Wednesday August 21, 2024 14:40 - 14:45 HKT
The enormous success of the CNCF Landscape has produced an overwhelming number of options in the space, where organizations struggle to establish their platforms quickly. This talk will help guide the community through the thought process of building these platforms, explore some examples of what a healthy source-driven platform ecosystem looks like, and showcase the power that a good cloud native platform will deliver to an organization. Though there are variations of platforms (i.e data, application, machine learning, etc) many start to have the same problems. These include artifact management, secrets management, TLS certificates, cloud permissions, and the list goes on. Providing turnkey solutions for platforms that can be ready in minutes adds much velocity to engineering teams across organizations that adopt the platform engineering model.

Speakers

Ram Iyengar

Chief Evangelist, Cloud Foundry Foundation
Ram Iyengar is an engineer by practice and an educator at heart. He was (cf) pushed into technology evangelism along his journey as a developer and hasn’t looked back since! He enjoys helping engineering teams around the world discover new and creative ways to work. He is a proponent... Read More →
Wednesday August 21, 2024 14:40 - 14:45 HKT
Level 1 | Hung Hom Room 1

14:40 HKT

Self-Hosted LLM Agent on Your Own Laptop or Edge Device | 在自己的笔记本电脑或边缘设备上自托管LLM Agent - Michael Yuan, Second State
Wednesday August 21, 2024 14:40 - 15:15 HKT
As LLM applications evolve from chatbots to copilots to AI agents, there are increasing needs for privacy, customization, cost control, and value alignment. Running open-source LLMs and agents on personal or private devices is a great way to achieve those goals. With the release of a new generation of open-source LLMs, such as Llama 3, the gap between open-source and proprietary LLMs is narrowing fast. In many cases, open source LLMs are already outperforming SaaS-based proprietary LLMs. For AI agents, open-source LLMs are not just cheaper and more private. They allow customization through finetuning and RAG prompt engineering using private data. This talk shows you how to build a complete AI agent service using an open-source LLM and a personal knowledge base. We will use the open-source WasmEdge + Rust stack for LLM inference, which is fast and lightweight without complex Python dependencies. It is cross-platform and achieves native performance on any OSes, CPUs, and GPUs.
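The WasmEdge-based setup the talk describes roughly follows the LlamaEdge pattern. Install-script flags, the model file, and application names below are illustrative; consult the WasmEdge/LlamaEdge documentation for your version:

```shell
# Install WasmEdge with the GGML (llama.cpp) plugin for LLM inference.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | \
  bash -s -- --plugins wasi_nn-ggml

# Serve a local open-source model through a portable Wasm app;
# the same .wasm binary runs unmodified on any OS, CPU, or GPU.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3-8B-Instruct.Q5_K_M.gguf \
  llama-api-server.wasm --prompt-template llama-3-chat
```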

Speakers

Michael Yuan

Product Manager, Second State
Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →
Wednesday August 21, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 3

14:40 HKT

What Is the Future of Service Mesh? Sidecar or Sidecarless | 服务网格的未来是什么?边车还是无边车 - Zhonghu Xu, Huawei
Wednesday August 21, 2024 14:40 - 15:15 HKT
Istio has been the best-known service mesh since 2017. Over the long term, it has evolved around the sidecar model, which is widely deployed. As some users pointed out the side effects of sidecars, the Istio community began designing a new sidecarless mode, `Ambient`, in 2022. Recently, after more than a year of development, we promoted it to beta in 1.22. In this presentation, we will also cover what the Istio community has done recently, such as Gateway API support and the status of delta xDS. Finally, what is the future of Istio? Will we run everything in Ambient mode, or keep driving both modes forward? Join us for the presentation and we will discuss the plans for the future.

Istio自2017年以来是最知名的服务网格。从长期来看,它一直围绕着sidecar进行演进,这是被广泛使用的。虽然有些人声称sidecar会带来负面影响,但自2022年以来,istio已经开始设计一种新的无sidecar模式`Ambient`。最近,在经过一年多的开发后,我们在1.22版本中将其推出为beta版。 在这个演讲中,我们还将谈论istio社区最近所做的工作,比如Gateway API支持,delta xDS状态。 最后,那么istio的未来会是什么样子呢?我们会全部在Ambient模式下运行,还是保持双轮驱动?加入我们的演讲,我们将讨论未来的计划。
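To make the sidecarless model concrete: in Ambient mode a workload opts in per namespace, with traffic handled by a node-level ztunnel instead of per-pod sidecars. A minimal sketch, assuming the `istio.io/dataplane-mode` label documented for Istio 1.22:

```yaml
# Opt every pod in this namespace into Istio's Ambient data plane;
# no sidecar injection and no pod restarts are required.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    istio.io/dataplane-mode: ambient
```

Removing the label (or setting it back to `sidecar`) returns the namespace to the traditional data plane, which is what makes the two modes easy to compare side by side.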
Speakers
avatar for Zhonghu Xu

Zhonghu Xu

Principal Engineer, Huawei
Zhonghu is an open-source enthusiast and has focused on OSS since 2017. In 2023, Zhonghu was awarded the `Google Open Source Peer Bonus`. He has worked on Istio for more than 6 years and has been a core Istio maintainer and one of its top 3 contributors. He has been continuously serving as Istio... Read More →
Wednesday August 21, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 6

14:40 HKT

Scaling Kubernetes: Best Practices for Managing Large-Scale Batch Jobs with Spark and Argo Workflow | 扩展Kubernetes:管理大规模批处理作业的最佳实践与Spark和Argo工作流 - Yu Zhuang & Liu Jiaxu, Alibaba Cloud
Wednesday August 21, 2024 14:40 - 15:15 HKT
Are you managing large-scale batch jobs on Kubernetes, like data processing with Spark applications or genomics computing with Argo Workflows? To complete these jobs promptly, a significant number of pods have to be scaled out and in quickly for parallel computation, which puts heavy pressure on the Kubernetes control plane. In this talk, we will use Spark and Argo Workflows as examples, guiding you through building a Kubernetes cluster that supports frequently creating and deleting 20,000 pods. Our focus will be on tuning the Kubernetes control plane, including optimizing the list-watch mechanism, service broadcasting, environment variable attachment, and API server configuration. Additionally, we'll share some best practices for configuring the Spark operator and the Argo Workflows controller.

您是否正在Kubernetes上管理大规模的批处理作业,比如使用Spark应用程序进行数据处理或使用Argo工作流进行基因组计算?为了及时完成这些作业,需要快速地扩展/缩减大量的Pod以进行并行计算,这给Kubernetes控制平面带来了巨大压力。 在本次演讲中,我们将以Spark和Argo工作流为例,指导您如何构建一个支持频繁创建/删除20000个Pod的Kubernetes集群。我们将重点放在调优Kubernetes控制平面上,包括优化列表-观察机制、服务广播、环境变量附加、API服务器配置等。此外,我们还将分享一些配置Spark操作员和Argo工作流控制器的最佳实践。
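One controller-side knob the talk's theme suggests is throttling how fast the Argo Workflows controller hits the API server. A sketch, assuming the field names documented for the `workflow-controller-configmap` (values here are illustrative, not recommendations):

```yaml
# Cap concurrent workflows and pod-creation rate so a burst of batch jobs
# cannot overwhelm the Kubernetes control plane.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  parallelism: "500"            # max workflows running cluster-wide
  namespaceParallelism: "100"   # max workflows running per namespace
  resourceRateLimit: |
    limit: 50                   # pod creations per second
    burst: 100
```

Similar rate limits exist on the Spark operator side (worker count and QPS flags), so the two controllers can be tuned together against the API server's capacity.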
Speakers
avatar for Liu Jiaxu

Liu Jiaxu

Senior Engineer, Alibaba Cloud
Jiaxu Liu is a Senior Engineer on the Container Service Team at Alibaba Cloud. He specializes in observability enhancement and large-scale cluster management and optimization for Alibaba Cloud's container service offerings. Before joining Alibaba Cloud, he worked at Nokia as a Senior... Read More →
Wednesday August 21, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 2

14:40 HKT

The Zen and Learning from Project Open Governance to Corporate OSS Governance | 从项目开放治理到企业开源治理的禅意与学习 - Xu Wang, Ant Group
Wednesday August 21, 2024 14:40 - 15:15 HKT
As an open-source veteran who has worked on secure container technology (Kata Containers), the speaker has been crafting open-source governance and strategies for projects for years. His team joined Ant Group 5 years ago and has continuously focused on cloud-native and trust technologies. In 2023, the speaker was appointed Vice President of the Open Source Technical Oversight Committee at Ant Group. The TOC role still requires setting open-source strategy and growth tactics, but now for a company with 25K employees and 13K engineers. It turned out that the experience of leading a top-level project was immensely valuable in the new position. In this session, we'll share first-hand experience of a tech leader wearing the multiple hats of tech director, open-source leader, and go-to person for OSS strategy at a large corporation, along with the learnings and reflections that came from these new challenges.

作为一位开源资深人士,演讲者一直致力于安全容器技术(Kata Containers),并多年来一直在为项目制定开源治理和战略。团队于5年前加入蚂蚁集团,一直专注于云原生和信任技术。在2023年,演讲者被任命为蚂蚁集团开源技术监督委员会副主席。TOC的工作需要制定开源战略和增长策略,但现在是为一个拥有25,000名员工和13,000名工程师的公司。事实证明,领导一个顶级项目的经验对新职位非常有价值。在这场演讲上,我们将分享一个技术领导者如何在大公司中扮演技术总监、开源领导者和开源战略的权威人士等多重角色的第一手经验,以及从新挑战中获得的经验和反思。
Speakers
avatar for Xu Wang

Xu Wang

Vice President of Ant Group Open Source Technical Committee, Ant Group
Xu joined Ant Group in 2019 and is in charge of container-based Cloud-Native infrastructure and the open-source related strategies of Ant Group. Xu is also a director of the Open Infrastructure Foundation (OIF) Board. Before joining Ant Group, Xu was the CTO and co-founder of hyper.sh... Read More →
Wednesday August 21, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 5

14:45 HKT

⚡ Lightning Talk: Rocket Power Your Kubernetes Career with Kubestronaut Program | ⚡ 闪电演讲: 用Kubestronaut计划提升您的Kubernetes职业生涯火力 - Giorgi Keratishvili, EPAM Systems
Wednesday August 21, 2024 14:45 - 14:50 HKT
Are you someone who wants to fly high and conquer the mountain of Kubernetes certifications? Then this talk is for you. Giorgi will share all the details of the Kubestronaut program and the benefits it brings, along with his own certification journey: he holds all 5 CNCF certificates and more, and has served as a beta tester and exam developer for some of them...

您是想要飞得更高的人吗?征服 Kubernetes 认证的高山?那么这个讲座适合您。Giorgi 将分享 kubestronaut 计划的所有细节,以及它对个人和他的认证之旅带来的好处。他拥有 CNCF 颁发的所有 5 个甚至更多证书,并且还曾担任其中一些证书的测试人员和考试开发人员...
Speakers
avatar for Giorgi Keratishvili

Giorgi Keratishvili

Lead System Engineer (DevOps), EPAM Systems
Giorgi has been in the IT field for a decade, during which he has been exposed to most areas of development and operations, from bare-metal infrastructure to higher levels of automation. Outside working hours, Giorgi participates very actively in the community. He plays role... Read More →
Wednesday August 21, 2024 14:45 - 14:50 HKT
Level 1 | Hung Hom Room 1

14:50 HKT

⚡ Lightning Talk: Running Native WebAssembly AI Applications Everywhere | ⚡ 闪电演讲: 在任何地方运行原生WebAssembly人工智能应用程序 - Tiejun Chen, VMware
Wednesday August 21, 2024 14:50 - 14:55 HKT
In recent years, Wasm has been one of the hottest topics in the world of computing due to its portability, small size, fast loading, and compatibility. Given these advantages, WebAssembly is an ideal sandbox-based technology for modern applications, including ML/AI. But beyond the browser, WebAssembly can currently only use the CPU to accelerate most ML/AI workloads. Here we offer a flexible way to run ML/AI on WebAssembly over a variety of AI accelerators by empowering Wasm with a transparent backend interposer. With this, your native ML/AI WebAssembly workloads can seamlessly use the underlying AI accelerators, such as CPUs, GPUs, and FPGAs, at their best performance. During this presentation we will also show our latest implementation with demos to give users direct insight into running ML/AI with WebAssembly on AI accelerators.

近年来,由于其可移植性、体积小、加载速度快和兼容性等优势,WASM已成为计算领域最热门的话题之一。鉴于这些优势,WebAssembly是基于沙箱方案的现代应用程序,包括ML/AI的理想技术。但除了浏览器之外,目前WebAssembly只能利用CPU来加速大部分ML/AI。在这里,我们提供了一种灵活的方式,通过为WASM赋予一个透明的后端插入器,使其能够在各种AI加速器上运行ML/AI。借助这一技术,您的本地ML/AI WebAssembly工作负载可以无缝地享受CPU、GPU、FPGA等底层AI加速器的最佳性能。在本次演示中,我们还将展示我们最新的实现,并通过演示帮助用户直观了解在AI加速器上运行ML/AI的WebAssembly。
Speakers
avatar for Tiejun Chen

Tiejun Chen

Sr. Technical Lead, VMware
Tiejun Chen is a Sr. Technical Lead. He has worked at several tech companies, such as VMware, Intel, and Wind River Systems, on cloud native, edge computing, ML/AI, RISC-V, WebAssembly, etc. He has given many presentations at AI.Dev NA 2023, KubeCon China 2021, Kube... Read More →
Wednesday August 21, 2024 14:50 - 14:55 HKT
Level 1 | Hung Hom Room 1

14:55 HKT

⚡ Lightning Talk: Tips and Tricks to (Right) Size Your Kubernetes Cluster for Efficiency and Cost Saving | ⚡ 闪电演讲: 为了提高效率和节约成本,调整Kubernetes集群大小的技巧和窍门 - Daniele Polencic, Learnk8s
Wednesday August 21, 2024 14:55 - 15:00 HKT
In this session, you will learn how Kubernetes allocates resources on worker nodes and how to get the most out of them by choosing the right kinds of limits and requests for your workloads. You will cover some practical tips for allocating the right number of nodes and resources to your cluster: - Should you have larger or smaller nodes? - How do reservations affect efficiency and cost savings? - How to "defrag" your cluster to optimize allocations And more.

在这场演讲中,您将学习Kubernetes如何在工作节点中分配资源,以及如何通过为工作负载选择正确的限制和请求来充分利用它们。 您将学习一些实用的技巧,来为您的集群分配正确数量的节点和资源: - 您应该选择更大还是更小的节点? - 预留资源如何影响效率和节约成本? - 如何“整理”您的集群以优化分配 等等。
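The node-size question above can be made concrete with a small calculation. The tiered memory-reservation percentages below follow the scheme several managed Kubernetes providers document; they are illustrative assumptions, not exact figures for every provider:

```python
def reserved_memory_gib(capacity_gib: float) -> float:
    """Memory reserved for the kubelet and system daemons on a node,
    using tiered percentages (25% of the first 4 GiB, 20% of the next 4,
    10% of the next 8, 6% of the next 112, 2% of anything beyond)."""
    tiers = [(4, 0.25), (4, 0.20), (8, 0.10), (112, 0.06), (float("inf"), 0.02)]
    reserved, remaining = 0.0, capacity_gib
    for size, rate in tiers:
        chunk = min(remaining, size)
        reserved += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return reserved

def allocatable_gib(capacity_gib: float) -> float:
    """Memory actually schedulable for pods on a node of this size."""
    return capacity_gib - reserved_memory_gib(capacity_gib)

# One 64 GiB node vs four 16 GiB nodes: the reservation tiers mean the
# single large node leaves more memory allocatable for workloads.
big = allocatable_gib(64)
small = 4 * allocatable_gib(16)
print(f"1 x 64 GiB -> {big:.2f} GiB allocatable")
print(f"4 x 16 GiB -> {small:.2f} GiB allocatable")
```

Under these assumed tiers, the single 64 GiB node yields about 58.5 GiB allocatable versus about 53.6 GiB across four 16 GiB nodes, which is the kind of trade-off the session's "larger or smaller nodes" question weighs against failure-domain and bin-packing concerns.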
Speakers
avatar for Daniele Polencic

Daniele Polencic

Instructor, Learnk8s
Daniele teaches containers and Kubernetes at Learnk8s. Daniele is a certified Kubernetes administrator by the Linux Foundation. In the last decade, Daniele trained developers for companies in the e-commerce, finance and public sector.
Wednesday August 21, 2024 14:55 - 15:00 HKT
Level 1 | Hung Hom Room 1

15:35 HKT

Sit Back and Relax with Fault Awareness and Robust Instant Recovery for Large Scale AI Workloads | 坐和放宽,了解大规模 AI 负载场景下的故障感知和健壮的快速故障恢复 - Fanshi Zhang & Kebe Liu, DaoCloud
Wednesday August 21, 2024 15:35 - 16:10 HKT
Fault tolerance during training, fine-tuning, and even inferencing is crucial to modern AI workloads when they run at large scale across big GPU clusters. For training and fine-tuning tasks, failures of GPUs, storage, or other hardware often extend training time to weeks or even months. For inferencing, when massive loads of requests arrive and one of the inferencing servers fails, we need a policy and a scheduler to transfer the workloads quickly and efficiently. In this talk, we will introduce a series of mechanisms we have designed to help Kubernetes clusters and the workloads themselves locate and diagnose the root cause, then schedule and perform mitigation for any hardware or CUDA API call failure, reducing the overall operating challenges. But the possibilities will not stop there: the fault-awareness and mitigation scheduler will help any workload mitigate during failures.

在大规模GPU集群上进行训练、微调甚至推理时的容错性对现代人工智能工作负载至关重要。 对于训练和微调任务,GPU、存储等硬件故障经常会导致训练时间延长至数周甚至数月。对于推理任务,当大量请求涌入时,如果其中一个推理服务器出现故障,我们需要一种策略和调度程序来快速高效地转移工作负载。 在本次演讲中,我们将介绍一系列我们设计的机制,帮助Kubernetes集群和工作负载本身定位、诊断根本原因,并在硬件或CUDA API调用失败时进行调度和执行缓解,以减少整体运营挑战。但可能性不会止步于此,故障感知和缓解调度程序将帮助任何工作负载在故障期间进行缓解。
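The recovery pattern underlying fault-tolerant training can be sketched generically: checkpoint periodically, let a supervisor (in a cluster, the scheduler restarting the pod) retry, and resume from the last checkpoint instead of step zero. This is a minimal illustration of the idea, not the mechanism the speakers describe:

```python
import os
import pickle
import random

CKPT = "train_ckpt.pkl"

def save_ckpt(step, state):
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((step, state), f)
    os.replace(tmp, CKPT)  # atomic rename: a crash never leaves a half-written file

def load_ckpt():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return 0, {"loss": float("inf")}

def train(total_steps=100, fail_rate=0.05):
    step, state = load_ckpt()          # resume instead of restarting from 0
    while step < total_steps:
        if random.random() < fail_rate:
            raise RuntimeError("simulated GPU fault")
        state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
        step += 1
        if step % 10 == 0:
            save_ckpt(step, state)
    return step, state

# The supervisor loop: keep restarting until the job completes.
while True:
    try:
        final_step, _ = train()
        break
    except RuntimeError:
        pass  # the next attempt resumes from the last checkpoint
print("finished at step", final_step)
```

With periodic checkpoints, each simulated fault costs at most the work since the last checkpoint, which is the same economics that makes fast fault detection and mitigation so valuable at cluster scale.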
Speakers
avatar for Kebe Liu

Kebe Liu

Senior software engineer, DaoCloud
Member of the Istio Steering Committee; focused on cloud native, Istio, eBPF, and other areas in recent years. Founder of the Merbridge project.
avatar for Neko Ayaka

Neko Ayaka

Software Engineer, DaoCloud
Cloud native developer, AI researcher, Gopher with 5 years of experience in loads of development fields across AI, data science, backend, frontend. Co-founder of https://github.com/nolebase
Wednesday August 21, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 3

15:35 HKT

OpenTelemetry Community Update | OpenTelemetry社区更新 - Zihao Rao & Huxing Zhang, Alibaba Cloud
Wednesday August 21, 2024 15:35 - 16:10 HKT
OpenTelemetry has emerged as the de facto standard for observability, gaining significant industry adoption. This talk delves into two key aspects: 1. Latest OpenTelemetry community updates: we'll explore the latest developments in the OpenTelemetry community, presented by a community contributor. 2. Alibaba Cloud's journey with OpenTelemetry adoption: we'll share Alibaba Cloud's experience adopting OpenTelemetry over the past several years. By actively engaging with the community, we've leveraged its power to build full-stack observability capabilities based on OpenTelemetry. This includes: - Language-specific instrumentation for Java, Go, and Python - OpenTelemetry collectors - Continuous profiling - Observability for Large Language Model (LLM) based applications

OpenTelemetry已成为可观察性的事实标准,获得了行业的广泛采用。本次讨论涉及两个关键方面: 1. 最新的OpenTelemetry社区更新:我们将探讨OpenTelemetry社区的最新动态,由社区贡献者介绍。 2. 阿里云与OpenTelemetry采用的历程:我们将分享阿里云在过去几年中采用OpenTelemetry的经验。通过积极参与社区,我们利用社区力量构建了基于OpenTelemetry的全栈观测能力。这包括: - 针对Java、Go和Python的特定语言工具 - OpenTelemetry收集器 - 持续性能分析 - 针对基于大型语言模型(LLM)的应用的可观测性。
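The collector piece mentioned above is configured as a pipeline of receivers, processors, and exporters. A minimal sketch in the OpenTelemetry Collector's config format, where the exporter endpoint is a placeholder for your own backend:

```yaml
# Receive OTLP from the Java/Go/Python SDKs, batch, and forward to a backend.
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
exporters:
  otlphttp:
    endpoint: https://collector.example.com  # assumption: your backend's OTLP endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Because the same pipeline model covers traces, metrics, and logs, one collector deployment can fan signals from all the language SDKs out to whichever backends an organization runs.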
Speakers
avatar for Huxing Zhang

Huxing Zhang

Staff Engineer, Alibaba Cloud
Huxing Zhang is a Staff Engineer of Alibaba Cloud working on observability. He is also member of Apache Software Foundation, PMC member of Apache Tomcat and Apache Dubbo. He speaks at ApacheCon, OTel Community Days, etc.
avatar for Zihao Rao

Zihao Rao

Software Engineer, Alibaba Cloud
Zihao is a software engineer at Alibaba Cloud. Over the past few years, he has participated in several well-known open source projects, he is steering committee member of Spring Cloud Alibaba project, and is a triager for OpenTelemetry Java Instrumentation now.
Wednesday August 21, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 6

15:35 HKT

Strengthening Container Security: A Collaborative Journey | 加强容器安全性:共同的旅程 - Yi Zha, Microsoft & Beltran Rueda Borrego, VMware (part of Broadcom)
Wednesday August 21, 2024 15:35 - 16:10 HKT
Ensuring the integrity and authenticity of container images is critical in securing the container supply chain. As developers are increasingly using images from external sources, questions arise: How can we verify these images originate from trusted vendors? How do we guarantee they are not altered since their creation? In this session, you will learn from the real-world experience of VMware Bitnami, who partnered with the Notary Project community to implement image signing and verification. Bitnami will show you how they use Notary Project signatures to ensure the integrity and authenticity of images from Docker Hub. Don't miss this opportunity to gain practical insights into container security with Notary Project within your CI/CD pipelines and during Kubernetes deployments! Additionally, we’ll explore future enhancements, including attestation support, empowering users to verify images from various perspectives such as provenance, vulnerability assessment, and software compliance.

确保容器镜像的完整性和真实性对于保护容器供应链至关重要。随着开发人员越来越多地使用来自外部来源的镜像,一些问题浮出水面:我们如何验证这些镜像来自可信赖的供应商?我们如何确保它们自创建以来没有被篡改?在这场演讲中,您将从VMware Bitnami的实际经验中学习,他们与Notary项目社区合作实施了镜像签名和验证。Bitnami将向您展示他们如何使用Notary项目签名来确保来自Docker Hub的镜像的完整性和真实性。不要错过这个机会,在您的CI/CD流水线和Kubernetes部署中通过Notary项目获得容器安全的实用见解!此外,我们将探讨未来的增强功能,包括证明支持,使用户能够从各种角度验证镜像,如来源、漏洞评估和软件合规性。
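The sign-then-verify flow described above maps to two Notation CLI commands. A sketch, where the registry reference is a placeholder and a signing key is assumed to be configured locally:

```shell
# Placeholder image reference for your own registry
IMAGE=registry.example.com/myapp:v1

# Attach a Notary Project signature to the image in the registry
notation sign "$IMAGE"

# Later (e.g., in CI/CD or an admission check), verify the signature;
# this fails if the image was altered after signing or the signer
# is not in the configured trust policy
notation verify "$IMAGE"
```

Wiring `notation verify` into a pipeline gate or a Kubernetes admission controller (e.g., via Ratify) is how the integrity check moves from a manual step to an enforced policy.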
Speakers
avatar for Yi Zha

Yi Zha

Senior Product Manager, Microsoft
Yi is a senior product manager in Azure Container Upstream team at Microsoft and is responsible for container supply chain security for Azure services and customers. He is also a maintainer of CNCF project Notary, and a contributor of CNCF ORAS and OSS project Ratify.
Wednesday August 21, 2024 15:35 - 16:10 HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Security

15:35 HKT

Tackling Operational Time-to-Market Decelerators in AI/ML Projects | 应对人工智能/机器学习项目中的运营时间市场减速器 - Adrian Matei & Andreea Munteanu, Canonical
Wednesday August 21, 2024 15:35 - 16:10 HKT
In the competitive AI market, Time To Market (TTM) is crucial for success. Ensuring secure, scalable, and compliant ML infrastructures often slows TTM due to the complexities of updates, patches, monitoring, and security enforcement. This leads to decreases in ROI, profitability, reproducibility, and competitive edge. To address this, companies can engage Managed Service Providers (MSPs) to offload operational burdens and focus on innovation, yet selecting the right MSP requires consideration of expertise, automation capabilities, and compliance adherence. This presentation explores the AI operational landscape, highlighting indicators and challenges in MSP collaboration. We will focus on the management of open source tools like Kubeflow and MLflow across hybrid and multicloud environments. By understanding operational excellence in AI and available options to achieve it, attendees will gain insights into choosing an approach that aligns with their greater objectives.

在竞争激烈的人工智能市场中,上市时间对于成功至关重要。确保安全、可扩展和合规的机器学习基础设施通常会因更新、补丁、监控和安全执行的复杂性而减慢上市时间,导致投资回报率、盈利能力、可复制性和竞争优势下降。为了解决这个问题,公司可以与托管服务提供商(MSPs)合作,减轻运营负担,专注于创新,但选择合适的MSP需要考虑专业知识、自动化能力和合规性。 本次演讲探讨了人工智能运营领域,重点介绍了MSP合作中的指标和挑战。我们将重点关注在混合和多云环境中管理开源工具如Kubeflow和MLflow。通过了解人工智能运营卓越性以及实现卓越性的可用选项,与会者将获得选择与其更大目标一致的方法的见解。
Speakers
avatar for Andreea Munteanu

Andreea Munteanu

AI Product Manager, Canonical
Andreea Munteanu is a Product Manager at Canonical, leading the MLOps area. With a background in Data Science in various industries, she used AI techniques to enable enterprises to benefit from their initiatives and make data-driven decisions. Nowadays, Andreea is looking to help... Read More →
avatar for Adrian Matei

Adrian Matei

Product Manager, Canonical
With a degree in Information Management for Business, Adrian is now guiding Canonical’s open-source operational management toolset as Product Manager. He has been working in open source operations for the past two years, having previously accumulated experience in technology consulting... Read More →
Wednesday August 21, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 2

16:25 HKT

Simplify AI Infrastructure with Kubernetes Operators | 使用Kubernetes Operators简化AI基础设施 - Ganeshkumar Ashokavardhanan, Microsoft & Tariq Ibrahim US, NVIDIA
Wednesday August 21, 2024 16:25 - 17:00 HKT
ML applications often require specialized hardware and additional configuration to run efficiently and reliably on Kubernetes. However, managing the cluster lifecycle and the diversity and complexity of hardware configuration across nodes can be challenging. How can we simplify and automate this process to ensure a smooth experience for Kubernetes users? Kubernetes operators offer a great solution. In this session, we will go over operators and demonstrate how they can help automate the installation, configuration, and lifecycle management of AI-ready infrastructure end to end, from cluster provisioning and Kubernetes node configuration to deep learning model deployment. We will demo a fine-tuning LLM workload to showcase how existing operators in the ecosystem, such as the Cluster API Operator, GPU Operator, Network Operator, and the Kubernetes AI Toolchain Operator, can be used to simplify the infrastructure. Finally, we will discuss challenges and best practices of using operators in production.

ML 应用通常需要专门的硬件和额外的配置才能在 Kubernetes 上高效可靠地运行。然而,管理集群生命周期、节点间硬件配置的多样性和复杂性可能具有挑战性。我们如何简化和自动化这个过程,以确保 Kubernetes 用户的顺畅体验? Kubernetes 运算符提供了一个很好的解决方案。在本场演讲中,我们将介绍运算符,并演示它们如何帮助自动化 AI-ready 基础架构的安装、配置和生命周期管理,从集群提供和 k8s 节点配置到深度学习模型部署。我们将演示一个微调 LLM 工作负载,展示生态系统中现有运算符(如 Cluster API Operator、GPU Operator、Network Operator 和 Kubernetes AI Toolchain Operator)如何简化基础架构。最后,我们将讨论在生产环境中使用运算符的挑战和最佳实践。
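To illustrate the end state the operators deliver: once the GPU Operator has installed the driver and device plugin on the nodes, a workload requests a GPU declaratively and the node-side setup is invisible to it. A sketch (the CUDA image tag is an assumption):

```yaml
# Smoke-test pod: schedules onto a GPU node prepared by the GPU Operator
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]   # prints the GPU the device plugin exposed
    resources:
      limits:
        nvidia.com/gpu: 1     # resource name advertised by the NVIDIA device plugin
```

The point of the operator model is that this manifest stays identical whether the cluster was provisioned by Cluster API on-prem or in any cloud.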
Speakers
avatar for Ganeshkumar Ashokavardhanan

Ganeshkumar Ashokavardhanan

Software Engineer, Microsoft
Ganesh is a Software Engineer on the Azure Kubernetes Service team at Microsoft, working on node lifecycle, and is the lead for the GPU workload experience on this kubernetes platform. He collaborates with partners in the ecosystem like NVIDIA to support operator models for machine... Read More →
avatar for Tariq Ibrahim US

Tariq Ibrahim US

Senior Cloud Platform Engineer, NVIDIA
Tariq Ibrahim is a Senior Cloud Platform Engineer on the Cloud Native team at NVIDIA where he works on enabling GPUs in containers and Kubernetes. He is a maintainer of the NVIDIA GPU Operator. He has also contributed to several cloud native OSS projects like kube-state-metrics, Istio... Read More →
Wednesday August 21, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 3

16:25 HKT

XRegistry - Looking Beyond CloudEvents | xRegistry - 超越CloudEvents - Leo Li, Red Hat
Wednesday August 21, 2024 16:25 - 17:00 HKT
CloudEvents helps in the delivery of events by standardizing where common event metadata can be found in the messages carrying those events, without the need to understand the schema of each event. But discovering which endpoints support those events, how to communicate with them, and finding the schema of the messages carrying those events can be challenging. This is where xRegistry can be used. xRegistry defines a core set of interoperable APIs for a generic "registry" that can be used to persist and query its contents to help discover resources and their metadata. On top of this extensible base registry model we are developing 3 domain-specific registries: Endpoint, Message, and Schema registries, specifically aimed at enabling the automation, tooling, and code generation often needed in distributed systems development. In this session you will learn about CloudEvents, xRegistry, and how we're trying to help users be more productive in an event-driven world.

CloudEvents通过标准化事件元数据在携带这些事件的消息中的位置,帮助传递事件,而无需了解每个事件的架构。但是,发现哪些端点支持这些事件,如何与它们通信,以及找到携带这些事件的消息的架构可能具有挑战性。这就是xRegistry可以使用的地方。xRegistry定义了一组用于通用“注册表”的可互操作API,可用于持久化和查询其内容,以帮助发现资源及其元数据。在这个可扩展的基础注册表模型之上,我们正在开发3个特定领域的注册表:端点、消息和架构注册表 - 专门旨在实现分布式系统开发中经常需要的自动化、工具和代码生成。在本场演讲中,您将了解CloudEvents、xRegistry以及我们如何努力帮助用户在事件驱动的世界中更加高效。
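For context on what CloudEvents standardizes: every event carries the same small set of metadata attributes regardless of its payload schema. A JSON-encoded example with made-up values (`type`, `source`, and the payload are hypothetical):

```json
{
  "specversion": "1.0",
  "type": "com.example.order.created",
  "source": "/orders",
  "id": "a2f6e1c4-0001",
  "time": "2024-08-21T08:00:00Z",
  "datacontenttype": "application/json",
  "data": { "orderId": 1234 }
}
```

xRegistry picks up where this leaves off: the envelope tells a consumer what an event is, while the Endpoint, Message, and Schema registries tell tooling where such events are available and what `data` looks like.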
Speakers
avatar for Leo Li

Leo Li

Software Engineer Intern, Red Hat
Leo is a passionate Knative Eventing Maintainer and the technical lead of Knative UX Working Group. He developed a comprehensive Knative sample app, co-created the “Intro to Open Source” learning path on KubeByExample, and implemented key features like HTTPS support for the Kafka... Read More →
Wednesday August 21, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 6

16:25 HKT

Staying Ahead of Fast-Moving Attackers | 保持领先于快速移动的攻击者 - Aizhamal Nurmamat kyzy, Sysdig
Wednesday August 21, 2024 16:25 - 17:00 HKT
How to find the right balance between convenience, operational efficiency, and a strong security policy in a world of ephemeral containers? And how can we ensure security at a time when Advanced Persistent Threats (APTs) are more prevalent? In this talk we will present the latest Cloud Native Security & Usage Report findings on critical vulnerabilities inherent in today’s container security practices. We will also demonstrate how a compromised, short-lived container can be an insidious security risk, and what we can do to detect and mitigate those risks in real time using cloud native open source tools.

在一个短暂容器世界中,如何在便利性、运营效率和强大安全政策之间找到合适的平衡?在APT(高级持续性威胁)更加普遍的时代,我们如何确保安全? 在这次演讲中,我们将介绍最新的云原生安全和使用报告发现,揭示当今容器安全实践中存在的关键漏洞。 我们还将演示一个被 compromise 的短暂容器如何成为一个隐蔽的安全风险,以及我们如何使用云原生开源工具实时检测和减轻这些风险。
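One open-source building block for the real-time detection described above is a runtime rule of the kind Falco evaluates. A sketch in Falco's rule syntax (rule name and process list are illustrative):

```yaml
# Alert when an interactive shell starts inside a running container --
# a common first step after a container compromise.
- rule: Shell Spawned in Container
  desc: A shell was started inside a running container
  condition: >
    spawned_process and container.id != host
    and proc.name in (bash, sh, zsh)
  output: >
    Shell in container (user=%user.name container=%container.name
    cmd=%proc.cmdline)
  priority: WARNING
```

Because the rule fires on the syscall stream in real time, it catches activity even in a container that lives for only a few seconds, which is exactly the ephemeral-workload gap the talk highlights.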
Speakers
avatar for Aizhamal Nurmamat kyzy

Aizhamal Nurmamat kyzy

Director, DevRel, Sysdig
Aizhamal is a Director of DevRel at Sysdig where she focuses on education around security and open source. Previously she worked at Google's OSPO where she helped build open source communities in cloud native and data analytics ecosystems.
Wednesday August 21, 2024 16:25 - 17:00 HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Security

16:25 HKT

Unleashing the Power of Cluster API: Extensibility and Customization | 释放Cluster API的力量:可扩展性和定制化 - Zain Malik, CityStorageSystems & Nibir Bora, Startup
Wednesday August 21, 2024 16:25 - 17:00 HKT
Cluster API, designed with extensibility at its core, has revolutionized Kubernetes cluster management. Its open and pluggable architecture empowers providers to implement custom solutions tailored to their unique requirements. In this session, we will explore how Cluster API's extension-by-design philosophy has opened new horizons for organizations seeking to create bespoke Kubernetes clusters. Managing Kubernetes clusters at scale presents unique operational challenges that cannot be tamed with manual operations. Through real-world examples and lessons learned, we will demonstrate how Cluster API's flexibility allows for the integration of diverse infrastructure providers and the implementation of organization-specific customizations. Attendees will gain insights into best practices for extending Cluster API, including developing custom controllers, integrating third-party tools, and creating bespoke workflows.

Cluster API是以可扩展性为核心设计的,已经彻底改变了Kubernetes集群管理。其开放和可插拔的架构赋予提供者实施定制解决方案的能力,以满足其独特需求。在本场演讲中,我们将探讨Cluster API的“通过设计进行扩展”的理念如何为寻求创建定制化Kubernetes集群的组织开辟了新的视野。 在规模化管理Kubernetes集群时,会面临无法通过手动操作解决的独特运营挑战。 通过现实世界的例子和经验教训,我们将演示Cluster API的灵活性如何允许集成各种基础设施提供者,并实施组织特定的定制化。与会者将获得有关扩展Cluster API的最佳实践的见解,包括开发自定义控制器、集成第三方工具和创建定制工作流程。
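The pluggable architecture mentioned above is visible in the API itself: a Cluster API `Cluster` is an ordinary Kubernetes object whose references point at provider-specific resources, and that is where custom infrastructure providers plug in. A sketch modeled on the CAPI quickstart (names are placeholders):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-control-plane
  infrastructureRef:
    # Swap this for your own provider's kind (AWSCluster, vSphereCluster,
    # or a bespoke in-house provider) without touching the rest of the spec.
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster
    name: demo-cluster
```

Because the core controllers only follow these references, organization-specific customization reduces to implementing the referenced resource types and their controllers.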
Speakers
avatar for Zain Malik

Zain Malik

Staff Software Engineer, CityStorageSystems
Zain Malik serves as a tech lead on the compute team of a startup, where he has contributed significantly to projects related to cost saving and reliability and helped mature cluster lifecycle management. Before this role, Zain was a product owner and staff software engineer in the... Read More →
avatar for Nibir Bora

Nibir Bora

Engineering Manager, Startup
Nibir is an Engineering Manager in charge of Core Infrastructure at a Stealth Startup, where he is responsible for the company's Kubernetes infrastructure running 100s of clusters globally.
Wednesday August 21, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Operations + Performance

16:25 HKT

Scaling Open Source Impact: FOSSASIA's Journey from Bootstrap to Educating 300,000 Developers | 扩大开源影响力:FOSSASIA从初创到教育30万开发者的旅程 - Hong Phuc Dang, FOSSASIA
Wednesday August 21, 2024 16:25 - 17:00 HKT
Hong Phuc Dang and Mario Behling will share FOSSASIA's journey from humble beginnings to educating over 300,000 developers in Asia. Learn how FOSSASIA scaled open-source education, engaged communities, & developed pioneering projects. Discover FOSSASIA's approach to automation and technological solutions, which streamlined operations long before the low-code movement. They'll spotlight projects like Eventyay & SUSI.AI, showcasing pioneering yet challenging endeavors. Learn about FOSSASIA's event organization best practices and their strategy of involving non-tech students in programs. Gain insights applicable beyond open source, impacting education and business operations. This session offers valuable knowledge for educators, open-source enthusiasts, developers, & business professionals. Whether you aim to expand projects or infuse startups with fresh ideas, join us to learn how a pragmatic open-source strategy can revolutionize organizations and empower tech pioneers and startups.

洪福·邓和马里奥·贝林将分享FOSSASIA从起步阶段到在亚洲教育超过30万开发人员的旅程。了解FOSSASIA如何扩展开源教育,参与社区,并开发开创性项目。 探索FOSSASIA的自动化和技术解决方案,这些解决方案在低代码运动之前就已经简化了运营。他们将重点介绍像Eventyay和SUSI.AI这样的项目,展示开创性而具有挑战性的努力。 了解FOSSASIA的活动组织最佳实践以及他们在项目中吸引非技术学生的策略。获得超越开源的见解,影响教育和业务运营。 这场演讲为教育工作者、开源爱好者、开发人员和商业专业人士提供宝贵的知识。无论您的目标是扩大项目还是为初创企业注入新思路,加入我们,了解一个务实的开源策略如何革新组织,赋予科技先驱和初创企业力量。
Speakers
avatar for Hong Phuc Dang

Hong Phuc Dang

Founder, FOSSASIA
Hong Phuc is the founder of FOSSASIA, an organization dedicated to leveraging open technologies to enhance societal well-being and foster sustainable production practices. She chairs the annual FOSSASIA Summit, one of the largest open source conferences in Asia. With over a decade... Read More →
Wednesday August 21, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 5

17:15 HKT

Unlocking Heterogeneous AI Infrastructure K8s Cluster: Leveraging the Power of HAMi | 解锁异构AI基础设施K8s集群:发挥HAMi的力量 - Xiao Zhang, DaoCloud & Mengxuan Li, The 4th Paradigm
Wednesday August 21, 2024 17:15 - 17:50 HKT
With AI's growing popularity, Kubernetes has become the de facto AI infrastructure. However, the increasing number of clusters with diverse AI devices (e.g., NVIDIA, Intel, Huawei Ascend) presents a major challenge. AI devices are expensive, so how can we improve resource utilization? How can we integrate better with K8s clusters? Managing heterogeneous AI devices consistently, supporting flexible scheduling policies, and providing observability all bring many challenges. The HAMi project was born for this purpose. This session includes: * How K8s manages heterogeneous AI devices (unified scheduling, observability) * How to improve device usage with GPU sharing * How to ensure the QoS of high-priority tasks under GPU sharing * Flexible scheduling strategies for GPUs (NUMA affinity/anti-affinity, binpack/spread, etc.) * Integration with other projects (such as Volcano, scheduler-plugins, etc.) * Real-world case studies from production-level users * Some remaining challenges and the roadmap

随着人工智能的日益普及,Kubernetes已成为事实上的人工智能基础设施。然而,不断增加的具有多样化人工智能设备(如NVIDIA、Intel、华为Ascend)的集群数量带来了重大挑战。人工智能设备价格昂贵,如何更好地提高资源利用率?如何更好地与K8s集群集成?如何一致地管理异构人工智能设备,支持灵活的调度策略和可观察性都带来了许多挑战。HAMi项目应运而生。本场演讲包括: * K8s如何管理异构人工智能设备(统一调度、可观察性) * 如何通过GPU共享提高设备使用率 * 如何确保GPU共享故事中高优先级任务的QOS * 为GPU支持灵活的调度策略(NUMA亲和性/反亲和性、binpack/spread等) * 与其他项目的集成(如volcano、scheduler-plugin等) * 来自生产级用户的实际案例研究。 * 仍然面临的一些其他挑战和路线图
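To show what GPU sharing looks like from the workload's point of view, here is a sketch of a pod requesting a memory-capped slice of a GPU. The resource names follow the HAMi README and should be treated as assumptions that may vary by version:

```yaml
# Two pods like this can be packed onto one physical GPU, each capped
# at 4 GiB of device memory.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-task
spec:
  containers:
  - name: worker
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1        # one virtual GPU slice
        nvidia.com/gpumem: 4096  # MiB of GPU memory granted to this pod
```

Because the request stays in the standard `resources.limits` shape, existing schedulers and quota tooling keep working while the device plugin enforces the per-pod memory cap.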
Speakers
avatar for xiaozhang

xiaozhang

Senior Technical Lead, DaoCloud
- Xiao Zhang is the leader of the Container team (focused on infra, AI, Multi-Cluster, Cluster LCM, OCI) - Kubernetes / Kubernetes-sigs active contributor and member - Karmada maintainer, kubean maintainer, HAMi maintainer - Cloud-Native Developer - CNCF Open Source Enthusiast. - GithubID: waw... Read More →
avatar for Mengxuan Li

Mengxuan Li

senior developer, The 4th Paradigm Co., Ltd
Reviewer in the Volcano community. Founder of HAMi, a CNCF Landscape project. Responsible for the development of the GPU virtualization mechanism in Volcano, which has been merged into Volcano's master branch and will be released in v1.8. Speaker at OpenAtom Global Open Source Commit #2023. Speaker... Read More →
Wednesday August 21, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 3

17:15 HKT

How to Manage Database Clusters Without a Dedicated Operator | 如何在没有专门Operator的情况下管理数据库集群 - Shanshan Ying, ApeCloud & Shun Ding, China Mobile Cloud
Wednesday August 21, 2024 17:15 - 17:50 HKT
As Kubernetes becomes integral to cloud-native environments, more organizations are deploying database services on K8S, facing significant challenges. Integrating new database engines typically requires developing a dedicated Kubernetes operator that manages not only resource provisioning but also essential maintenance tasks like high availability, backup & restore, and configuration management. This session introduces a universal operator framework that supports various database engines, enabling rapid, minimal-code integration. We will present a case study from China Mobile Cloud on integrating a new cloud-native database engine into K8S using this framework, achieved with minimal coding and reduced time investment, bypassing the extensive Golang coding usually required for developing a dedicated operator.

随着Kubernetes成为云原生环境中不可或缺的一部分,越来越多的组织在K8S上部署数据库服务,面临着重大挑战。集成新的数据库引擎通常需要开发一个专门的Kubernetes operator,管理资源提供以及高可用性、备份和恢复、配置管理等重要维护任务。 本场演讲将介绍一个支持各种数据库引擎的通用operator框架,实现快速、最小代码集成。我们将从中国移动云的一个案例研究中介绍如何使用这个框架将新的云原生数据库引擎集成到K8S中,通过最小的编码和减少时间投入来实现,避免通常需要开发专门operator所需的大量Golang编码。
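Illustrative only: with a universal operator framework such as KubeBlocks, integrating an engine becomes a matter of referencing an engine definition from a generic `Cluster` object rather than writing a dedicated operator. The API group and field names below are assumptions based on KubeBlocks documentation and may differ by release:

```yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster
spec:
  clusterDefinitionRef: mysql    # the pluggable engine definition
  terminationPolicy: Delete
  componentSpecs:
  - name: mysql
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
```

The framework's controllers then handle provisioning, HA, backup and restore, and configuration from the definition, which is the minimal-code path the session describes.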
Speakers
avatar for Shanshan Ying

Shanshan Ying

Maintainer, ApeCloud
Shanshan is currently a maintainer of KubeBlocks by ApeCloud. Before joining ApeCloud, she worked in Aliyun Database Group for years. She received her PhD degree from National University of Singapore.
avatar for Shun Ding

Shun Ding

Senior Systems Architect, China Mobile Cloud
Shun is a Senior Systems Architect at China Mobile Cloud, leading the design, development, and deployment of next-generation Kubernetes-based large-scale database managing service. With over a decade of experience in cloud computing and database technologies, Shun has extensive expertise... Read More →
Wednesday August 21, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Operations + Performance

17:15 HKT

Leveraging Wasm for Portable AI Inference Across GPUs, CPUs, OS & Cloud-Native Environments | 利用Wasm在GPU、CPU、操作系统和云原生环境中进行可移植的AI推理 - Miley Fu & Hung-Ying Tai, Second State
Wednesday August 21, 2024 17:15 - 17:50 HKT
This talk will focus on the advantages of using WebAssembly (Wasm) for running AI inference tasks in a cloud-native ecosystem. We will explore how Wasm empowers developers to develop on their own PCs and have their AI inference performed uniformly across different hardware (including GPUs and CPUs), operating systems, edge clouds, and more. We'll discuss how Wasm and Wasm runtimes facilitate seamless integration into cloud-native frameworks, enhancing the deployment and scalability of AI applications. This presentation will specifically highlight how Wasm provides a flexible, efficient solution suitable for diverse cloud-native architectures, including Kubernetes, allowing developers to fully tap the potential of LLMs, especially open source LLMs. The session offers insights into maximizing the potential of AI applications by leveraging the cross-platform capabilities of Wasm, ensuring consistency, low cost, and efficiency in AI inference across different computing environments.

本次演讲将重点介绍在云原生生态中运行AI推理任务时使用WebAssembly(Wasm)的优势。我们将探讨如何使用Wasm使开发者能够在自己的个人电脑上开发,并在不同硬件(包括GPU和CPU)、操作系统、边缘云等上统一执行他们的AI推理。 我们将讨论Wasm和Wasm运行时如何实现无缝集成到云原生框架中,增强AI应用程序的部署和可扩展性。本次演示将重点展示Wasm如何提供灵活、高效的解决方案,适用于各种云原生架构,包括Kubernetes,以帮助开发者充分发挥大语言模型的潜力,特别是开源大语言模型。 将深入探讨通过利用Wasm的跨平台能力来最大限度地发挥AI应用的潜力,确保在不同计算环境中实现AI推理的一致性、低成本和高效性。
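As one concrete path to the portability described above, the WasmEdge/LlamaEdge toolchain runs the same compiled Wasm binary against whichever accelerator backend is available on the host. A sketch of that workflow follows; the install URL pattern is from the WasmEdge project, while the plugin name, model file, and flags are illustrative and version-dependent:

```shell
# Install WasmEdge with the GGML (llama.cpp) plugin for WASI-NN
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh \
  | bash -s -- --plugins wasi_nn-ggml

# Run a chat app: the same .wasm file runs unchanged on CPU-only or
# GPU-equipped hosts; the plugin selects the accelerator (AUTO) at load time
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-3-8b-instruct.Q5_K_M.gguf \
  llama-chat.wasm
```

The point of the sketch is that hardware selection happens in the host runtime, not in the application binary, which is what makes the inference code portable.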
Speakers

Hung-Ying Tai

Software Engineer, Second State
Hung-Ying is a maintainer of the WasmEdge project and a pioneer in compiler optimization and virtual machine design. He is a prolific open-source contributor, participating in many open-source projects, including go-ethereum, solidity, SOLL, crun, and WasmEdge.

Miley Fu

CNCF Ambassador, Founding member at WasmEdge, Second State Inc
Miley is a Developer Advocate with a passion for empowering devs to build and contribute to open source. With over 5 years of experience working on WasmEdge runtime in CNCF sandbox as the founding member, she talked at KubeCon, KCD Shenzhen, CloudDay Italy, DevRelCon, Open Source... Read More →
Wednesday August 21, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML

17:15 HKT

Scorecard: Assessments Made Easy | Scorecard:让开源项目评估更轻松 - Ram Iyengar, Cloud Foundry Foundation
Wednesday August 21, 2024 17:15 - 17:50 HKT
Scorecard is a project of the OpenSSF, which makes it simple to assess the health of any repository. It is a fully open source project built with the aim of bringing transparency and standardization around security health metrics. Scorecard is a cross-industry collaboration between big and small names in OSS/security. Scorecard checks for vulnerabilities affecting different parts of the software supply chain including source code, build, dependencies, testing, and project maintenance.

Scorecard 是 OpenSSF 的一个项目,它简化了对任何代码仓库健康状况的评估。这是一个完全开源的项目,旨在为安全健康指标带来透明度和标准化。Scorecard 是开源软件/安全领域大大小小公司之间的跨行业合作。Scorecard 检查影响软件供应链不同部分的漏洞,包括源代码、构建、依赖关系、测试和项目维护。
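As a concrete illustration, running the Scorecard CLI against a public repository looks roughly like this (the repository is an arbitrary example, and a GitHub token is typically needed to avoid API rate limits; check `scorecard --help` for your version's flags):

```shell
# A GitHub token is required for API access
export GITHUB_AUTH_TOKEN=<your token>

# Score a repository across all default checks
scorecard --repo=github.com/ossf/scorecard

# Limit the run to specific supply-chain checks
scorecard --repo=github.com/ossf/scorecard \
  --checks=Branch-Protection,Pinned-Dependencies
```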
Speakers

Ram Iyengar

Chief Evangelist, Cloud Foundry Foundation
Ram Iyengar is an engineer by practice and an educator at heart. He was (cf) pushed into technology evangelism along his journey as a developer and hasn’t looked back since! He enjoys helping engineering teams around the world discover new and creative ways to work. He is a proponent... Read More →
Wednesday August 21, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security
 
Thursday, August 22
 

10:05 HKT

Keynote: Supporting Large-Scale and Reliability Testing in Kubernetes using KWOK | 主论坛演讲: 支持在Kubernetes中使用KWOK进行大规模和可靠性测试 - Yuan Chen, NVIDIA & Shiming Zhang, DaoCloud
Thursday August 22, 2024 10:05 - 10:20 HKT
Kubernetes is the de facto platform for running workloads at scale. This talk will present KWOK (https://kwok.sigs.k8s.io/), an open-source toolkit that enables the creation and testing of large-scale Kubernetes clusters with minimal resources, even on a laptop.
Shiming Zhang, the creator and maintainer of KWOK, and Yuan Chen, an engineer at NVIDIA GPU Cloud, will outline KWOK's capabilities to generate and manage a large number of virtual nodes that simulate Kubelet APIs and mimic real nodes, allowing for workload deployment and testing. They will discuss practical use cases of KWOK.

The talk will then introduce KWOK's recent enhancements for reliability and fault-tolerance testing, showcasing its ability to simulate failures by injecting targeted faults into nodes and pods. Through examples and demos, the talk will demonstrate how KWOK can be used for reliability testing and evaluating fault-tolerance mechanisms, ultimately improving workload resilience in Kubernetes.



Kubernetes是运行大规模工作负载的事实标准平台。本次演讲将介绍KWOK(https://kwok.sigs.k8s.io/),这是一个开源工具包,可以利用极少的资源(甚至在笔记本电脑上)创建和测试大规模Kubernetes集群。

KWOK的创始人和维护者张世明,以及NVIDIA GPU Cloud的工程师陈源,将详细阐述KWOK的功能,包括生成和管理大量模拟Kubelet API和真实节点的虚拟节点,从而支持工作负载的部署和测试。他们将讨论KWOK的实际使用案例。

演讲还将介绍KWOK最近针对可靠性和容错性测试的增强功能,展示其通过向节点和Pod注入有针对性的故障来模拟故障的能力。通过示例和演示,演讲将展示如何利用KWOK进行可靠性测试和评估容错机制,从而最终提升Kubernetes中工作负载的弹性能力。
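For those who want to try KWOK before the talk, the upstream documentation shows that a simulated node is just a regular Node object annotated for KWOK to manage. A manifest along these lines (the node name and label are illustrative):

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake   # tells KWOK to manage (simulate) this node
  labels:
    type: kwok
  name: kwok-node-0
spec:
  taints:                      # keeps real workloads off the simulated node
    - effect: NoSchedule
      key: kwok.x-k8s.io/node
      value: fake
```

Pods can then be steered onto simulated nodes with a matching toleration and a `type: kwok` node selector, allowing thousands of fake nodes to run on a laptop.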
Speakers

Yuan Chen

Principal Software Engineer, NVIDIA
Yuan Chen is a Principal Software Engineer at NVIDIA, working on building NVIDIA GPU Cloud. He served as a Staff Software Engineer at Apple from 2019 to 2024, where he contributed to the development of Apple's Kubernetes infrastructure. Yuan has been an active code contributor to... Read More →

Shiming Zhang

Software Engineer, DaoCloud
Shiming Zhang is a contributor to Kubernetes with a main focus on scalability, performance, reliability, and testing. He has contributed to many Kubernetes features and most of its components.
Thursday August 22, 2024 10:05 - 10:20 HKT
Level 2 | Grand Ballroom 1-2

11:00 HKT

Unlocking the Power of Kubernetes: AI-Driven Innovations for Next-Gen Infrastructure | 释放 Kubernetes 的力量:面向下一代基础设施的 AI 驱动创新 - Brandon Kang, Akamai Technologies
Thursday August 22, 2024 11:00 - 11:35 HKT
My session is about the dynamic synergy between Kubernetes and AI, unveiling a transformative paradigm shift in modern infrastructure management. The presentation shows how Kubernetes serves as an enabler for deploying and scaling AI workloads efficiently, optimizing resource utilization, and ensuring unparalleled scalability. Delving deeper, it explores the realm of AI-powered automation, showcasing how intelligent algorithms enhance auto-scaling, workload optimization, and predictive maintenance within Kubernetes clusters. Moreover, it sheds light on the crucial aspect of security, elucidating how AI-driven measures bolster threat detection and anomaly identification, fortifying Kubernetes environments against potential risks. This presentation beckons organizations to embrace the convergence of Kubernetes and AI, unlocking boundless possibilities to redefine infrastructure management and propel towards unprecedented efficiency and resilience.

我的演讲是关于 Kubernetes 和人工智能之间的动态协同作用,揭示了现代基础设施管理中的转变范式。演示展示了 Kubernetes 如何作为部署和扩展人工智能工作负载的促进者,有效优化资源利用率,并确保无与伦比的可扩展性。更深入地探讨了基于人工智能的自动化领域,展示了智能算法如何增强 Kubernetes 集群内的自动扩展、工作负载优化和预测性维护。此外,它还阐明了安全的关键方面,阐明了人工智能驱动的措施如何加强威胁检测和异常识别,加固 Kubernetes 环境抵御潜在风险。 这个演示呼吁组织拥抱 Kubernetes 和人工智能的融合,解锁无限可能性,重新定义基础设施管理,并朝着前所未有的效率和韧性迈进。
Speakers

Brandon Kang

Principal Technical Solutions Architect, Akamai Technologies
Brandon Kang is a Cloud Specialist at Akamai Technologies, where he oversees cloud computing projects across the APJ markets, including China. Before his tenure at Akamai, Brandon was a software engineer at Samsung, a program manager at Microsoft, and a service platform expert at... Read More →
Thursday August 22, 2024 11:00 - 11:35 HKT
Level 1 | Hung Hom Room 3

11:00 HKT

Dollars and PPM's - Carbon Emissions and Cloud Spend | 美元和PPM - 碳排放和云支出 - Bryan Oliver, Thoughtworks
Thursday August 22, 2024 11:00 - 11:35 HKT
Cloud Carbon emissions are unfortunately not the priority of most enterprises. Costs, however, are. In the Cloud Native space, there is an ever-growing list of spend tracking and reduction tools. In this talk, we'll discuss several strategies you can adopt to unify the prioritization of cloud costs and carbon impact. We want to show how you can align with your business goal of simultaneously reducing cloud spend and overall carbon emissions.

云计算的碳排放很可惜并不是大多数企业的首要任务。然而,成本却是。在云原生领域,有越来越多的支出跟踪和降低工具。在这次演讲中,我们将讨论几种您可以采用的策略,统一云成本和碳影响的优先级。我们希望展示如何与您同时降低云支出和整体碳排放的业务目标保持一致。
Speakers

Bryan Oliver

Principal, Thoughtworks
Bryan is an experienced engineer and leader who designs and builds complex distributed systems. He has spent his career developing mobile and back-end systems whilst building autonomous teams. More recently he has been focused on delivery and cloud native at Thoughtworks. In his free... Read More →
Thursday August 22, 2024 11:00 - 11:35 HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

11:00 HKT

The Journey of Next-Gen FinTech IDP at China Merchants Bank | 中国招商银行下一代金融科技IDP之旅 - Jiahang Xu, China Merchants Bank
Thursday August 22, 2024 11:00 - 11:35 HKT
Explore the transformative journey of China Merchants Bank (CMB), one of China's largest retail banks, through cloud migration, cloud-native transformation, and platform engineering over the past three years. Despite challenges such as the increased complexity of cloud technology and management, and potential risks to developer productivity and the continuous assurance of financial services, CMB successfully leveraged KubeVela, OpenFeature, Envoy, Cilium, and OpenTelemetry to build its next-gen FinTech IDP. This led to 70% of applications being managed on the platform within a year and an improved developer experience covering thousands of R&D engineers. We'll discuss the strategic thinking, 'Golden Path' implementation, struggles, trade-offs, and key success metrics measured against a platform engineering maturity model. This session provides a blueprint and reference architecture for financial organizations undergoing similar transformations.

探索中国招商银行(CMB)作为中国最大的零售银行之一,在过去三年中通过云迁移、云原生转型和平台工程的变革之旅。尽管面临诸如云技术和管理复杂性增加、开发人员生产力和金融服务持续保障的潜在风险等挑战,CMB成功利用KubeVela、OpenFeature、Envoy、Cilium和OpenTelemetry构建了下一代金融科技IDP。这使得一年内70%的应用程序纳入平台管理,并改善了开发人员体验,涵盖了数千名研发工程师。我们将讨论战略思维、“黄金路径”实施、挣扎、权衡和关键成功指标,以及平台工程成熟度模型。本场演讲提供了金融机构进行类似转型的蓝图和参考架构。
Speakers

Jiahang Xu

System Architect, China Merchants Bank
Jiahang Xu is a System Architect at China Merchants Bank. He has over 14 years of unique cross-domain experience working in telecom, automotive, financial industry, startup as a co-founder, and KubeVela maintainer. He's mainly focused on cloud-native application technology and platform... Read More →
Thursday August 22, 2024 11:00 - 11:35 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

11:50 HKT

VeScale: A PyTorch Native LLM Training Framework | veScale:一个PyTorch原生LLM训练框架 - Hongyu Zhu, ByteDance
Thursday August 22, 2024 11:50 - 12:25 HKT
The era of giant LLMs calls for distributed training. Despite the countless distributed training frameworks published in the past decade, few have excelled in real industry production, because the quality that matters most is often ease of use rather than pure performance. Ease of use rests on two essentials — PyTorch and automatic parallelism — because: i) the PyTorch ecosystem dominates, owning 92% of models on HuggingFace, and ii) giant models cannot be trained without complex nD parallelism. Currently, this ease of use is "broken" in industry-level frameworks, which are either not PyTorch-native (TensorFlow/JAX) or not fully automated (Megatron/DeepSpeed/torch). We propose a novel framework that combines PyTorch nativeness and automatic parallelism for scaling LLM training with ease of use. We expect developers only to write single-device torch code, which is automatically parallelized into nD parallelism with all the heavy lifting handled transparently.

当今巨型LLM时代呼唤分布式训练。尽管过去十年中已经发布了无数分布式训练框架,但很少有能够在真实产业生产中表现出色,因为最受青睐的质量往往是易用性而不是纯性能。易用性在于两个关键点--PyTorch和自动并行性,因为:i)PyTorch生态系统主导并拥有HuggingFace上92%的模型,ii)巨型模型无法在没有复杂的nD并行性的情况下进行训练。 目前,这种易用性对于产业级框架来说已经“破碎”,因为它们要么不是PyTorch原生的(TensorFlow/JAX),要么不是完全自动化的(Megatron/DeepSpeed/torch)。 我们提出了一个结合了PyTorch原生性和自动并行性的新型框架,以便通过易用性扩展LLM训练。我们只期望开发人员编写单设备torch代码,但自动将其并行化为nD并行性,所有繁重的工作都由框架透明地处理。
Speakers

Hongyu Zhu

Machine Learning System Software Engineer, ByteDance
Hongyu is a Machine Learning System Engineer in ByteDance AML group, working on systems and compilers for training workloads. He got his PhD degree from University of Toronto, where he worked with Professor Gennady Pekhimenko. He is generally interested in machine learning compilers... Read More →
Thursday August 22, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 3

11:50 HKT

Building a High-Performance Time Series Database from Scratch: Optimization Strategies | 从零开始构建高性能时序数据库:优化策略 - Aliaksandr Valialkin, VictoriaMetrics
Thursday August 22, 2024 11:50 - 12:25 HKT
Application Performance Monitoring and Kubernetes monitoring in their current state are pretty expensive. The average VictoriaMetrics installation is processing 2-4 million samples/s on the ingestion path, and 20-40 million samples/s on the read path. The biggest installations account for 100 million samples/s on the ingestion path. This requires being very clever with data pipelines to keep them efficient and scalable by adding more resources. In this session, we'll explore essential optimizations to maintain database speed such as string interning, caching results, goroutine management and utilizing sync.Pool for efficient resource management. These techniques help strike a balance between performance and resource consumption. This talk focuses on practical strategies for enhancing database speed.

在当前状态下,应用程序性能监控和Kubernetes监控非常昂贵。平均VictoriaMetrics安装在摄入路径上处理2-4百万样本/秒,在读取路径上处理20-40百万样本/秒。最大的安装在摄入路径上占据了1亿样本/秒。这需要通过对数据管道进行非常聪明的优化,通过增加更多资源来保持其高效和可扩展性。在本场演讲中,我们将探讨保持数据库速度的基本优化,如字符串内部化、缓存结果、goroutine管理和利用sync.Pool进行有效的资源管理。这些技术有助于在性能和资源消耗之间取得平衡。本次演讲侧重于增强数据库速度的实用策略。
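The optimizations named above — string interning and `sync.Pool` — can be sketched in a few lines of Go. This is a simplified illustration of the techniques, not VictoriaMetrics' actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// internPool caches one canonical copy of each string, so repeated
// label values ("GET", "200", ...) share a single allocation.
var internPool sync.Map

func intern(s string) string {
	if v, ok := internPool.Load(s); ok {
		return v.(string)
	}
	v, _ := internPool.LoadOrStore(s, s)
	return v.(string)
}

// bufPool recycles scratch buffers on the hot path instead of
// allocating a fresh buffer per request.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeSample formats a label pair using a pooled buffer and
// interned strings.
func encodeSample(name, value string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	buf.WriteString(intern(name))
	buf.WriteByte('=')
	buf.WriteString(intern(value))
	return buf.String() // String copies the bytes, so reusing buf is safe
}

func main() {
	fmt.Println(encodeSample("status", "200")) // prints "status=200"
}
```

Interning trades one map lookup for millions of duplicate allocations; the pool keeps buffer allocations off the per-request path — both reduce GC pressure at ingestion rates of millions of samples per second.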
Speakers

Hui Wang

Software Engineer, VictoriaMetrics
I'm working on monitoring at VictoriaMetrics. My passion is cloud-native technologies and opensource.

Aliaksandr Valialkin

CTO, VictoriaMetrics
Aliaksandr is a co-founder and the principal architect of VictoriaMetrics. He is also a well-known author of the popular performance-oriented libraries: fasthttp, fastcache and quicktemplate. He holds a Master’s Degree in Computer Software Engineering. He decided to found VictoriaMetrics... Read More →
Thursday August 22, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 2

11:50 HKT

Redefining Service Mesh: Leveraging EBPF to Optimize Istio Ambient Architecture and Performance | 重新定义服务网格:利用eBPF优化Istio环境架构和性能 - Yuxing Zeng, Alibaba Cloud
Thursday August 22, 2024 11:50 - 12:25 HKT
Istio Ambient separates the L4/L7 functions found in the traditional sidecar model and introduces the ztunnel component, which implements L4 network load balancing and zero-trust security. However, as ztunnel is deployed at the node level as a DaemonSet, any malfunction or anomaly in ztunnel may impact the traffic of all mesh-related pods on that node. Furthermore, performance tests of Ambient Mesh have not delivered the anticipated outcomes; ztunnel often becomes a performance bottleneck. These factors make it challenging to apply Ambient Mesh in production environments, and it appears that we require a more optimized and practical implementation. This session will share: 1. An introduction to the architecture of Istio Ambient Mesh, along with currently known issues in the existing implementation. 2. Using eBPF to implement zero-trust and L4 network traffic capabilities, enhancing the stability of the mesh network and significantly improving overall performance.

Istio Ambient将传统的边车模型中发现的L4/L7功能分离,并引入了ztunnel组件,实现了L4网络负载均衡和安全的零信任。然而,由于ztunnel部署在节点级别的DaemonSet上,ztunnel中的任何故障或异常可能会影响该节点下所有与网格相关的Pod的流量。此外,Ambient Mesh的性能测试并未达到预期的结果;ztunnel经常成为性能瓶颈。这些因素使得在生产环境中应用Ambient Mesh变得具有挑战性。看起来我们需要一个更优化和实用的实现解决方案。 本次会话将分享: 1. Istio Ambient Mesh架构的介绍,以及现有实现中已知的问题。 2. 使用eBPF实现零信任和L4网络流量功能,增强Mesh网络的稳定性,并显著提高整体性能。
Speakers

Jesse Zeng

Technical Expert, Alibaba Cloud
Yuxing Zeng is a technical expert on the Container Service Team at Alibaba Cloud. He is also an Istio member and an Envoy contributor. He has rich experience in cloud native fields such as Kubernetes, Istio, and Envoy.
Thursday August 22, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Connectivity

11:50 HKT

Unlocking Scalability and Simplifying Multi-Cloud Management with Karmada and PipeCD | 使用Karmada和PipeCD解锁可扩展性并简化多云管理 - Khanh Tran, CyberAgent, Inc. & Hongcai Ren, Huawei
Thursday August 22, 2024 11:50 - 12:25 HKT
In the coming age of AI, it has become inevitable for organizations to embrace a multi-cloud approach. Managing applications across multiple clouds presents various challenges, including resilience, performance, security, cost, and deployment management. How well have you prepared yourself and your services for this new age? This presentation will introduce Karmada and PipeCD, two powerful tools designed to help organizations effectively address these challenges and achieve seamless multi-cloud management. Karmada is a multi-cloud container orchestrator, while PipeCD is a multi-cloud continuous delivery solution. Both tools are built on extensive experience in managing applications at scale across multiple clouds. We will delve into the key features and benefits of Karmada and PipeCD and how they can simplify multi-cloud management. Together, we can unlock the true potential of multi-cloud systems and empower organizations to thrive in the era of AI.

在新的人工智能时代,任何组织都不可避免地需要采用多云方法。在多个云上管理应用程序可能会带来各种挑战,包括弹性、性能、安全性、成本和部署管理。您为新时代做好了多少准备?本次演讲将介绍Karmada和PipeCD,这两款强大的工具旨在支持组织有效应对这些挑战,实现无缝的多云管理。Karmada是一个多云容器编排工具,而PipeCD是一个多云持续交付解决方案。这两款工具都是基于在多个云上管理应用程序的丰富经验构建的。我们将深入探讨Karmada和PipeCD的关键特性和优势,以及它们如何简化多云管理。让我们一起释放多云系统的真正潜力,赋予组织在人工智能时代蓬勃发展的力量。
Speakers

Hongcai Ren

Senior Software Engineer, Huawei
Hongcai Ren (@RainbowMango) is a CNCF Ambassador who has been working on Kubernetes and other CNCF projects since 2019, and is a maintainer of the Kubernetes and Karmada projects.

Khanh Tran

Software Engineer, CyberAgent, Inc.
Khanh is a maintainer of the PipeCD project. He is currently employed at CyberAgent Inc, and responsible for the CI/CD system across the organization. As a member of the developer productivity team, his primary focus is on automation and anything that enhances the development process... Read More →
Thursday August 22, 2024 11:50 - 12:25 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

13:50 HKT

Choose Your Own Adventure: The Struggle for Security | 选择你的冒险:安全之战 - Whitney Lee, VMware Tanzu & Viktor Farcic, Upbound
Thursday August 22, 2024 13:50 - 14:25 HKT
Our hero, a running application in a Kubernetes production environment, knows they are destined for greater things! They are serving end users, but currently, they are also endangering those users, the system, and themselves! But the struggle for security is HARD, filled with system design choices concerning secrets management; cluster-level and runtime policies; and securing pod-to-pod communications. It is up to you, the audience, to guide our hero and help them grow from a vulnerable, unprotected application into their final form: an app that is more secure against invasion. In their third ‘Choose Your Own Adventure’-style talk, Whitney and Viktor will present choices that an anthropomorphized app must make as they try to protect themselves against every kind of exploit. Throughout the presentation, the audience (YOU!) will vote to decide our hero app's path! Can we navigate CNCF projects to safeguard our app, system, and users against attack before the session time elapses?

我们的英雄是一个在Kubernetes生产环境中运行的应用程序,他知道自己注定要成为更伟大的存在!他正在为最终用户提供服务,但目前却也在危及这些用户、系统和自己!但是安全的斗争是艰难的,充满了关于秘钥管理、集群级别和运行时策略以及保护Pod之间通信的系统设计选择。 观众们,你们将扮演引导我们英雄并帮助他们从一个脆弱、无保护的应用程序成长为更加安全抵御入侵的终极形态的角色。在这场第三场“选择你自己的冒险”风格的演讲中,Whitney和Viktor将呈现一个拟人化应用程序必须做出的选择,以试图保护自己免受各种利用。在整个演示过程中,观众(就是你!)将投票决定我们英雄应用程序的道路!在演讲结束之前,我们能否通过探索CNCF项目来保护我们的应用程序、系统和用户免受攻击呢?
Speakers

Viktor Farcic

Developer Advocate, Upbound
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.

Whitney Lee

Developer Advocate, VMware Tanzu
Whitney is a lovable goofball and a CNCF Ambassador who enjoys understanding and using tools in the cloud native landscape. Creative and driven, Whitney recently pivoted from an art-related career to one in tech. You can catch her lightboard streaming show ⚡️ Enlightning on Tanzu.TV... Read More →
Thursday August 22, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Cloud Native Novice

13:50 HKT

Testing and Release Patterns for Crossplane | 跨平面的测试和发布模式 - Yury Tsarev & Steven Borrelli, Upbound
Thursday August 22, 2024 13:50 - 14:25 HKT
Crossplane has become the foundation of many Internal Developer Platforms (IDPs). A requirement for any IDP in production is the ability to make changes and upgrades to the platform with confidence. This talk will cover testing and release patterns based on our experience building production-ready environments across a range of Crossplane users. We’ll cover the lifecycle of a Crossplane Composition upgrade, from local commit to pull request to target customer environment, end-to-end testing tools, handling API changes, and how to control updates to customer environments. For quite a while, testing Crossplane Compositions meant relying exclusively on costly end-to-end layers. In this talk, we're unveiling new unit testing capabilities that allow you to evaluate and test your Composition code in complete isolation.

Crossplane已成为许多内部开发者平台(IDPs)的基础。在生产中,任何IDP的要求都是能够有信心地对平台进行更改和升级。 本次演讲将涵盖基于我们在跨多个Crossplane用户构建生产就绪环境的经验,讨论测试和发布模式。 我们将介绍Crossplane Composition升级的生命周期,从本地提交到拉取请求再到目标客户环境,端到端测试工具,处理API更改以及如何控制对客户环境的更新。 相当长一段时间以来,测试Crossplane Compositions意味着完全依赖昂贵的端到端层。在本次演讲中,我们将揭示新的单元测试功能,使您能够在完全隔离的环境中评估和测试您的Composition代码。
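To ground this, the Crossplane CLI can already render and validate a Composition entirely offline, which is the kind of fast feedback loop the talk builds on. A sketch (file names are illustrative, and the `beta` subcommands depend on your Crossplane CLI version):

```shell
# Render the composed resources locally, with no cluster involved
crossplane beta render xr.yaml composition.yaml functions.yaml

# Pipe the rendered output into schema validation against provider CRDs
crossplane beta render xr.yaml composition.yaml functions.yaml \
  | crossplane beta validate extensions.yaml -
```

Running these in CI against every pull request catches most Composition regressions before any expensive end-to-end environment is touched.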
Speakers

Steven Borrelli

Principal Solutions Architect, Upbound
Steven is a Principal Solutions Architect for Upbound, where he helps customers adopt Crossplane.

Yury Tsarev

Principal Solutions Architect, Upbound
Yury is an experienced software engineer who strongly focuses on open-source, software quality and distributed systems. As the creator of k8gb (https://www.k8gb.io) and active contributor to the Crossplane ecosystem, he frequently speaks at conferences covering topics such as Control... Read More →
Thursday August 22, 2024 13:50 - 14:25 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

14:40 HKT

Find Your Own Personal Tutor for the Study of Kubernetes | 为学习Kubernetes找到适合您的个人导师 - Hoon Jo, Megazone
Thursday August 22, 2024 14:40 - 15:15 HKT
When Kubernetes novice users encounter a problem, they ask questions on Stack Overflow, in the community, or to friends :) However, this requires explaining their environment and background information, and even then an answer is not guaranteed. Thus I suggest using K8sGPT with Ollama to make up for the knowledge gap in the moment. Furthermore, K8sGPT provides an interactive mode that lets you keep asking follow-up questions until you receive enough answers. Plus, it can help those who are not familiar with English ask in other languages. (This is usually a big concern at the beginning.) I highly recommend K8sGPT to newcomers looking for a soft landing in the Kubernetes world.

Kubernetes新手用户在遇到问题时,通常会向Stack Overflow、社区或朋友提问 :) 然而,这需要解释自己的环境和背景信息,而且并不能保证得到答案。因此,我建议使用K8sGPT与Ollama来弥补当前知识的不足。此外,K8sGPT提供交互模式,可以持续追问直到得到足够的答案。另外,对于不熟悉英语的人,可以用其他语言提问(这在入门阶段往往是一大顾虑)。我强烈推荐新手使用K8sGPT,以便在Kubernetes世界顺利着陆。
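As a concrete starting point, the workflow described above maps to roughly the following K8sGPT CLI invocations (flag names and supported backends vary between K8sGPT releases, so treat this as a sketch and check `k8sgpt --help` for your version; the model name is illustrative):

```shell
# Point K8sGPT at a locally running Ollama model instead of a hosted LLM
k8sgpt auth add --backend ollama \
  --model llama3 \
  --baseurl http://localhost:11434

# Scan the cluster and ask the local model to explain each finding
k8sgpt analyze --explain --backend ollama

# Interactive mode: keep asking follow-up questions about a finding
k8sgpt analyze --explain --backend ollama --interactive

# Explanations in a language other than English
k8sgpt analyze --explain --backend ollama --language korean
```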
Speakers

Hoon Jo

Cloud Solutions Architect | Cloud Native Engineer, Megazone
Hoon Jo is a Cloud Solutions Architect as well as a Cloud Native Engineer at Megazone. He has spoken many times on cloud native technologies and works to spread cloud native ubiquity around the world. He wrote 『Python for System/Network Administrators』 (Wikibooks, 2017... Read More →
Thursday August 22, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Cloud Native Novice

14:40 HKT

Kelemetry: Global Control Plane Tracing for Kubernetes | Kelemetry:面向Kubernetes控制面的全局追踪系统 - Wei Shao & Jonathan Chan, ByteDance
Thursday August 22, 2024 14:40 - 15:15 HKT
Debugging Kubernetes system issues is complicated: different controllers manipulate objects independently, sometimes triggering changes in other controllers. Unlike traditional RPC-based services, the relationship between components is not explicit; identifying which component causes an issue could be like finding a needle in a haystack. Components expose their own fragmented data, often limited to the lifecycle of a single request and fail to illustrate the bigger picture of asynchronous causal events. This talk introduces Kelemetry, a global tracing system for the Kubernetes control plane using scattered data sources from audit log, events, informers and component traces. Through several demonstrations of troubleshooting online problems, we will see how Kelemetry reveals the state transition of related objects over a long timespan and reconstructs the causal hierarchy of events to provide intuitive insight into the What, When and Why of everything going on in a Kubernetes system.

调试Kubernetes系统问题是复杂的:不同的控制器独立地操作对象,有时会触发其他控制器的变化。与传统的基于RPC的服务不同,组件之间的关系并不明确;确定哪个组件引起了问题就像在一堆草堆中找针一样困难。组件展示它们自己的碎片化数据,通常仅限于单个请求的生命周期,并未展示异步因果事件的整体情况。 本次演讲介绍了Kelemetry,这是一个利用审计日志、事件、通知器和组件跟踪的分散数据源的Kubernetes控制平面全局跟踪系统。通过几次在线问题排查演示,我们将看到Kelemetry如何揭示相关对象在长时间跨度内的状态转换,并重建事件的因果层次结构,以提供对Kubernetes系统中发生的一切的直观洞察。
Speakers

Wei Shao

Senior Software Engineer, ByteDance
Wei Shao is a tech lead on the Orchestration & Scheduling team at ByteDance, and a maintainer of KubeWharf projects. Wei has 6+ years of experience in the cloud native area, focusing on resource management and performance-enhanced systems in K8s. Wei led the development of multiple... Read More →

Jonathan Chan

Software engineer, ByteDance
Jonathan is a software engineer at ByteDance working on Kubernetes related infrastructure such as observability systems and cluster federation. He is also a passionate contributor to a number of open source projects.
Thursday August 22, 2024 14:40 - 15:15 HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

14:40 HKT

NanoVisor: Revolutionizing FaaS Cold Start Performance with Secure, Lightweight Container Runtime | NanoVisor:通过安全、轻量级容器运行时改变FaaS冷启动性能 - Tianyu Zhou, Ant Group
Thursday August 22, 2024 14:40 - 15:15 HKT
Function as a Service (FaaS) is booming, but cold start time, the time it takes to create a new container for a function, remains a significant bottleneck. This not only impacts user experience with noticeable delays, but also incurs unnecessary costs due to wasted resources. NanoVisor, a groundbreaking container runtime built on gVisor, tackles the challenge of slow cold start times in FaaS. It achieves this through a series of optimizations specifically designed for FaaS: lightweight containerd interaction for faster setup, a read-only filesystem for enhanced efficiency, and a sandbox fork mechanism that replaces heavy container creation for significant performance gains. These empower NanoVisor to create secure, sandboxed containers ready for function execution within an astonishing 5ms, with less than 1MB of memory overhead per instance and 1.5K QPS per node. It has been successfully applied in Ant Group's ecosystem, including Alipay Cloud Base and SOFA Function, as well as CI/CD acceleration.

Function as a Service(FaaS)正在蓬勃发展,但冷启动时间,即为函数创建新容器所需的时间,仍然是一个重要的瓶颈。这不仅影响用户体验,导致明显的延迟,还因浪费资源而产生不必要的成本。NanoVisor是一种基于gVisor构建的开创性容器运行时,解决了FaaS中慢冷启动时间的挑战。它通过一系列专为FaaS设计的优化来实现:轻量级的containerd交互以加快设置速度,只读文件系统以提高效率,以及一个替代繁重容器创建的沙箱分叉机制,以获得显著的性能提升。这些优化使NanoVisor能够在惊人的5毫秒内创建安全的、沙箱化的容器,每个实例的内存开销不到1MB,每个节点的QPS为1.5K。它已成功应用于蚂蚁集团的生态系统,包括支付宝云基地和SOFA Function,以及CI/CD加速。
Speakers

Tianyu Zhou

System Engineer, Ant Group
Tianyu Zhou is a system engineer at Ant Group. He graduated from Zhejiang University with a master's degree in cyberspace security. His research interests include the kernel, system security, and container security.
Thursday August 22, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Emerging + Advanced

14:40 HKT

Open Sourcing the Future of Z: Unleashing Innovation on the Mainframe | 开源Z的未来:释放大型机上的创新 - Dong Ma & Chen ji, IBM; Mike Friesenegger, SUSE
Thursday August 22, 2024 14:40 - 15:15 HKT
The IBM Z platform, known for its security, reliability, and high-volume transaction processing, has long been a cornerstone of enterprise computing. However, the traditional closed-source approach to Z development has limited innovation and collaboration. This talk explores the growing movement towards open-source software for Z, examining the technical and strategic considerations. We will discuss the challenges of the closed-source model for Z, highlight successful open mainframe projects such as Feilong, examine the technical hurdles of developing open-source software for Z along with potential solutions and strategies to overcome them, and cover the benefits of open-source Z development for developers.

IBM Z平台以其安全性,可靠性和高交易处理量而闻名,长期以来一直是企业计算的支柱。然而,传统的封闭源码开发方式限制了创新和合作。本次演讲探讨了向Z开放源码软件的不断发展,审视了技术和战略考虑因素。讨论了Z封闭源码模型的挑战,重点介绍了像Feilong项目这样的开源主机项目的成功案例。讨论了为Z开发开源软件的技术挑战,提供了潜在解决方案和克服这些障碍的策略。讨论了开源Z开发对开发人员的好处。
Speakers

Dong Ma

Software Engineer, IBM
Dong Ma is a Software Engineer at IBM, Open Mainframe Project and CD Foundation Ambassador. He now works on IBM Cloud Infrastructure Center, offering on-premises cloud deployments on the IBM Z and IBM LinuxONE platforms. He’s been an active technical contributor to OpenStack since... Read More →

Ji Chen

Senior Technical Staff Member, IBM
Ji Chen is a software architect working on the zSystem and LinuxONE platform at IBM, contributing to various CNCF projects such as Kepler, Cluster API, and Cloud Provider.

Mike Friesenegger

Solutions Architect, SUSE
Mike is a solutions architect in the SUSE Integrated Solutions team. He works closely with a number of key hardware partners to identify, test and document joint solutions that help SUSE create unique value in the marketplace. One of his specialties include Linux on IBM Z and LinuxONE... Read More →
Thursday August 22, 2024 14:40 - 15:15 HKT
Level 1 | Hung Hom Room 5

15:35 HKT

OpAMP: Scaling OpenTelemetry with Flexibility | OpAMP:灵活扩展OpenTelemetry - Husni Alhamdani, Censhare & Herbert Sianturi, Krom Bank
Thursday August 22, 2024 15:35 - 16:10 HKT
In this session, we will delve into how OpAMP (Open Agent Management Protocol) revolutionizes the management of large fleets of data collection Agents and its pivotal role in scaling OpenTelemetry deployments with unparalleled flexibility. Discover how OpAMP empowers organizations to remotely manage diverse Agents, irrespective of vendor, through its vendor-agnostic protocol. Learn how OpAMP facilitates status reporting, telemetry reporting, centralized management, allowing for tailored configurations and efficient monitoring of individual Agents or types of Agents, management of downloadable Agent-specific packages, and robust connection credentials management. Join us to unleash the potential of OpAMP and revolutionize your OpenTelemetry scalability strategy.

在这场演讲中,我们将深入探讨OpAMP(开放式代理管理协议)如何革新大规模数据收集代理的管理,并在扩展OpenTelemetry部署中发挥关键作用,具有无与伦比的灵活性。 发现OpAMP如何赋予组织远程管理各种代理的能力,无论供应商如何,通过其供应商无关的协议。了解OpAMP如何促进状态报告、遥测报告、集中管理,允许定制配置和有效监控单个代理或代理类型,管理可下载的特定代理软件包,以及强大的连接凭证管理。 加入我们,释放OpAMP的潜力,革新您的OpenTelemetry可扩展性策略。
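The remote status reporting described above can be sketched as a minimal AgentToServer message. The field names below mirror the OpAMP protobuf messages (instance_uid, sequence_num, agent_description, capabilities), but this plain-dict JSON encoding is illustrative only; a real OpAMP client exchanges protobuf over WebSocket or plain HTTP.

```python
import json
import uuid

def build_agent_to_server(sequence_num: int, service_name: str) -> dict:
    """Sketch of an OpAMP AgentToServer status report as a plain dict.

    A managed Agent periodically sends such a message so the server
    can track its identity, capabilities, and configuration status.
    """
    return {
        "instance_uid": str(uuid.uuid4()),  # stable per Agent in practice
        "sequence_num": sequence_num,
        "agent_description": {
            "identifying_attributes": [
                {"key": "service.name", "value": service_name},
            ]
        },
        # Bit flags in the real protocol; spelled out here for readability.
        "capabilities": ["ReportsStatus", "AcceptsRemoteConfig"],
    }

msg = build_agent_to_server(1, "otel-collector")
print(json.dumps(msg, indent=2))
```

In a vendor-agnostic fleet, every Agent type emits the same message shape, which is what lets one OpAMP server manage heterogeneous collectors.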
Speakers
avatar for Husni Alhamdani

Husni Alhamdani

Senior Site Reliability Engineer, Censhare
Husni is a CNCF Ambassador, and a Site Reliability Engineer at Censhare, where he is responsible for building and maintaining infrastructure platforms. In addition to these responsibilities, he primarily focuses on architecting Cloud-Native solutions. He also graduated from the LFX... Read More →
avatar for Herbert Sianturi

Herbert Sianturi

Senior DevOps Engineer, Krom Bank
Herbert Sianturi serves as a Senior DevOps Engineer at Krom Bank Indonesia, where he spearheads efforts to enhance the quality of the end-to-end application lifecycle, using open source platforms as a base. With years of expertise in container orchestration and cloud computing... Read More →
Thursday August 22, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 6
  KubeCon + CloudNativeCon Sessions, Observability

15:35 HKT

Optimize and Accelerate Cloud AI Infrastructure with Autoscaling | 通过自动缩放优化和加速云AI基础设施 - Yuan Mo, Alibaba Cloud
Thursday August 22, 2024 15:35 - 16:10 HKT
With the rise of generative AI technology, more and more applications are starting to integrate with the capabilities of generative AI. However, the high costs of training and inference can be daunting for developers. In this talk, we will discuss the issues and solutions that need additional consideration when using elastic scaling in generative AI scenarios, including:
● How to enhance the elastic startup efficiency of generative AI
● How to address the efficiency of inference when separating compute and storage in generative AI
● How to reduce the costs of training and inference
● How to solve the interruption problem in AI training scenarios using Spot instances
● How to address the issue of capacity elasticity in LLM scenarios
Finally, we will introduce the practical experience of the world's leading generative AI service provider, HaiYi (seaart.ai), allowing more developers to understand the architectural methods of elastic cloud AI infrastructure.

随着生成式人工智能技术的兴起,越来越多的应用程序开始与生成式人工智能的能力集成。然而,训练和推理的高成本可能会让开发人员望而却步。在这次演讲中,我们将讨论在生成式人工智能场景中使用弹性扩展时需要额外考虑的问题和解决方案,包括: ● 如何提高生成式人工智能的弹性启动效率 ● 如何在生成式人工智能中分离计算和存储时解决推理效率的问题 ● 如何降低训练和推理的成本 ● 如何使用Spot实例解决AI训练场景中的中断问题 ● 如何解决LLM场景中的容量弹性问题 最后,我们将介绍世界领先的生成式人工智能服务提供商海艺(seaart.ai)的实际经验,让更多开发人员了解弹性云AI基础设施的架构方法。
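The Spot-interruption point above comes down to never letting interruptible capacity carry the whole service. A toy capacity planner under that assumption (the function name, thresholds, and replica math are illustrative, not the speaker's actual autoscaler):

```python
import math

def plan_replicas(queue_len: int, reqs_per_replica: int,
                  min_on_demand: int, max_total: int) -> dict:
    """Toy capacity plan for an inference service.

    Keeps a floor of on-demand replicas so Spot interruptions cannot
    take the service to zero, and serves burst traffic from Spot.
    """
    desired = max(min_on_demand, math.ceil(queue_len / reqs_per_replica))
    desired = min(desired, max_total)
    return {
        "on_demand": min(desired, min_on_demand),
        "spot": max(0, desired - min_on_demand),
    }

print(plan_replicas(queue_len=90, reqs_per_replica=10,
                    min_on_demand=2, max_total=20))
# → {'on_demand': 2, 'spot': 7}
```

A real cluster autoscaler would also weigh startup latency (model pull and load time) when sizing the on-demand floor.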
Speakers
avatar for Yuan Mo

Yuan Mo

Staff Engineer, Alibaba Cloud
Senior technical expert at Alibaba Cloud, the maintainer of the Kubernetes elastic component autoscaler, the founder of the cloud-native gaming community and OpenKruiseGame, and has given several talks at kubecon before. Focus on the cloud-native transformation of the gaming industry... Read More →
Thursday August 22, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Platform Engineering

15:35 HKT

Revolutionizing Scientific Simulations with Argo Workflows | 用Argo工作流彻底改变科学模拟 - ShaungKun Tian, Alibaba Cloud & 建翔 孙, 北京深势科技有限公司
Thursday August 22, 2024 15:35 - 16:10 HKT
DP Technology provides scientific simulation platforms for research in biomedicine, energy, materials and other industries. Science simulation workflows are inherently complex and resource-intensive, and manual deployment is often prone to errors. After adopting Argo Workflows to orchestrate science simulations, we saw a 100% productivity improvement. In this talk, we will introduce why we chose Argo Workflows, how to orchestrate large-scale science simulation tasks, and how to make the whole system scalable and reliable. Specifically, we will share best practices for managing super-large workflows (thousands of tasks), doing reasonable workflow retries, using memoization to reduce runtime and compute cost, and interacting with HPC systems. We also made contributions to the Argo community to enhance functionality and improve reliability. Additionally, we'll introduce DFlow, our open-source Python SDK designed for the seamless orchestration of scientific simulations with Argo Workflows.

DP Technology为生物医药、能源、材料等行业的研究提供科学模拟平台。科学模拟工作流程本质上复杂且资源密集,手动部署往往容易出错。采用Argo工作流程来编排科学模拟后,我们的生产力提高了100%。在本次演讲中,我们将介绍为什么选择Argo工作流程,如何编排大规模科学模拟任务,如何实现整个系统的可扩展性和可靠性。特别是,我们将分享如何管理超大型工作流程(数千个任务),如何合理重试工作流程,如何使用记忆化来减少运行时间和计算成本,如何与HPC系统交互。我们还为Argo社区做出了贡献,以增强功能性和提高可靠性。此外,我们还将介绍DFlow,我们的开源Python SDK,旨在与Argo工作流程无缝协同编排科学模拟。
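The memoization technique mentioned above maps to the `memoize` block of an Argo Workflows template: a cache key derived from the step's inputs, backed by a ConfigMap, so repeated simulation tasks with identical inputs skip re-execution. A sketch of such a template as a Python dict (the image and parameter names are made up for illustration):

```python
def memoized_step(name: str, image: str, key_param: str,
                  cache_configmap: str) -> dict:
    """Build an Argo Workflows template dict that memoizes its result.

    Mirrors the `memoize` field from the Argo spec: the result is keyed
    on an input parameter and cached in a ConfigMap.
    """
    key_expr = "{{inputs.parameters.%s}}" % key_param
    return {
        "name": name,
        "inputs": {"parameters": [{"name": key_param}]},
        "memoize": {
            "key": key_expr,                      # cache key from inputs
            "maxAge": "24h",                      # invalidate stale entries
            "cache": {"configMap": {"name": cache_configmap}},
        },
        "container": {"image": image, "args": [key_expr]},
    }

tmpl = memoized_step("simulate", "dp/sim:latest", "molecule", "sim-cache")
print(tmpl["memoize"]["key"])
# → {{inputs.parameters.molecule}}
```

In a workflow with thousands of fan-out tasks, a cache hit on even a fraction of steps translates directly into saved compute.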
Speakers
avatar for 建翔 孙

建翔 孙

软件工程师, 北京深势科技有限公司
I once built a machine learning platform at Kuaishou, and currently, I am involved in scheduling scientific computing tasks at DP Technology, as well as constructing workflow platforms. I specialize in the field of cloud-native development.
Thursday August 22, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

15:35 HKT

Phippy’s Field Guide to Wasm | Phippy的Wasm指南 - Karen Chu, Fermyon & Matt Butcher, Fermyon Technologies
Thursday August 22, 2024 15:35 - 16:10 HKT
The creators of the original Illustrated Children’s Guide to Kubernetes have written a fourth book, this time focused on the emerging technology that is WebAssembly, one of the fastest growing cloud native trends. As with previous books, we broach a complex technical topic with a fun and friendly format designed for all skill levels. On their camping trip with Blossom the Wasm Possum, Phippy and Zee’s adventures illustrate the basics of Wasm, introduce key terminology, and frame how it complements existing cloud technologies like containers and Kubernetes. In the first half of the talk, we will do a reading of the book in Mandarin. We will then follow up (in English) with a technical overview of Wasm, latest updates to the ecosystem, and details on where to find the community.

原《插图儿童 Kubernetes 指南》的创作者们已经写了第四本书,这次的焦点是新兴技术 WebAssembly,这是增长最快的云原生趋势之一。与之前的书籍一样,我们以有趣友好的格式涉及复杂的技术主题,适合各种技能水平的读者。在他们与 Wasm 负鼠 Blossom 一起露营的旅行中,Phippy 和 Zee 的冒险展示了 Wasm 的基础知识,介绍了关键术语,并阐述了它如何与容器和 Kubernetes 等现有云技术相辅相成。在讲座的前半部分,我们将用普通话朗读这本书。然后我们将用英语进行技术概述,介绍 Wasm 生态系统的最新更新,并详细介绍社区的位置。
Speakers
avatar for Karen Chu

Karen Chu

Head of Community, Fermyon
Karen Chu is the Head of Community at Fermyon Technologies. Having participated in the cloud native community since 2015, she is a CNCF Ambassador, Helm community manager/maintainer, emeritus Kubernetes Code of Conduct Committee member, meet-up organizer, and conference organizer... Read More →
avatar for Matt Butcher

Matt Butcher

CEO, Fermyon Technologies
Matt Butcher (CEO) is a founder of Fermyon. He is one of the original creators of Helm, Brigade, CNAB, OAM, Glide, and Krustlet. He has written or co-written many books, including "Learning Helm" and "Go in Practice." He is a co-creator of the "Illustrated Children’s Guide to Kubernetes... Read More →
Thursday August 22, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 5

16:25 HKT

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes | 无缝扩展性:使用Kubernetes编排大型语言模型推理 - Joinal Ahmed & Nirav Kumar, Navatech Group
Thursday August 22, 2024 16:25 - 17:00 HKT
In the dynamic landscape of AI/ML, deploying and orchestrating large open-source inference models on Kubernetes has become paramount. This talk delves into the intricacies of automating the deployment of heavyweight models like Falcon and Llama 2, leveraging Kubernetes Custom Resource Definitions (CRDs) to manage large model files seamlessly through container images. The deployment is streamlined with an HTTP server facilitating inference calls using the model library. This session will explore eliminating manual tuning of deployment parameters to fit GPU hardware by providing preset configurations. Learn how to auto-provision GPU nodes based on specific model requirements, ensuring optimal utilization of resources. We'll discuss empowering users to deploy their containerized models effortlessly by allowing them to provide a pod template in the workspace custom resource's inference field. The controller, in turn, dynamically creates deployment workloads utilizing all GPU nodes.

在AI/ML不断发展的领域中,在Kubernetes上部署和编排大型开源推理模型变得至关重要。本次演讲将深入探讨自动化部署像Falcon和Llama 2这样的重型模型的复杂性,利用Kubernetes自定义资源定义(CRDs)通过容器镜像无缝管理大型模型文件。部署通过HTTP服务器简化,以便使用模型库进行推理调用。 本场演讲将探讨通过提供预设配置来消除手动调整部署参数以适应GPU硬件的需求。了解如何根据特定模型要求自动配置GPU节点,确保资源的最佳利用。我们将讨论如何赋予用户轻松部署其容器化模型的能力,允许他们在工作区自定义资源推理字段中提供一个pod模板。控制器动态地创建部署工作负载,利用所有GPU节点。
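The workspace-custom-resource idea described above can be sketched as a manifest builder. The API group, kind, and field names below are hypothetical stand-ins for the pattern (a preset selecting tuned deployment parameters, plus a GPU instance type the controller uses to auto-provision nodes), not the actual CRD from the talk:

```python
def workspace_manifest(name: str, preset: str, instance_type: str) -> dict:
    """Illustrative 'Workspace' custom resource for LLM inference.

    A preset picks pre-tuned deployment parameters for a known model,
    so users never hand-tune GPU settings; the instance type tells the
    controller which GPU nodes to provision.
    """
    return {
        "apiVersion": "example.io/v1alpha1",   # hypothetical group/version
        "kind": "Workspace",
        "metadata": {"name": name},
        "resource": {"instanceType": instance_type, "count": 1},
        "inference": {"preset": {"name": preset}},
    }

ws = workspace_manifest("llama-2-7b", "llama-2-7b-chat", "g5.12xlarge")
print(ws["inference"]["preset"]["name"])
# → llama-2-7b-chat
```

A user who needs full control could instead supply a pod template under the inference field, which the controller would expand into deployment workloads.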
Speakers
avatar for Joinal Ahmed

Joinal Ahmed

AI Architect, Navatech Group
Joinal is a seasoned Data Science expert passionate about rapid prototyping, community involvement, and driving technology adoption. With a robust technical background, he excels in leading diverse teams through ML projects, recruiting and mentoring talent, optimizing workflows, and... Read More →
avatar for Nirav Kumar

Nirav Kumar

Head of AI and Engineering, Navatech Group
Nirav Kumar is a leader in the field of Artificial Intelligence with over 13 years of experience in data science and machine learning. As Head of AI and Engineering at Navatech Group, he spearheads cutting-edge research and development initiatives aimed at pushing the boundaries of... Read More →
Thursday August 22, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 3

16:25 HKT

Observability Supercharger: Build the Traffic Topology Map for Millions of Containers with Zero Code | 可观测性超级增强器:使用零代码为数百万个容器构建流量拓扑图 - Sheng Wei & Teck Chuan Lim, Shopee
Thursday August 22, 2024 16:25 - 17:00 HKT
Kubernetes makes container orchestration and management simple and easy. However, with the surge of applications and middleware onboard Kubernetes, it is difficult to analyze and identify the relationships and dependencies between huge amounts of services and middleware. The most general way requires the business side to make code changes to expose more information, which is impossible to cover for all applications. In this session, we will share:
* How does Shopee leverage eBPF to build a universal map for a million containers in production environments?
* How do we implement distributed tracing for arbitrary third-party middleware with different protocols and usage patterns?
* How do we optimize eBPF code and the Linux kernel to minimize the impacts on injected containers?
* How did we integrate with BigData and AI Stack to fully utilize the data for abnormal detection and incident troubleshooting?

Kubernetes使容器编排和管理变得简单易行。然而,随着应用程序和中间件在Kubernetes上的激增,分析和识别大量服务和中间件之间的关系和依赖关系变得困难。最常见的方法需要业务方进行代码更改以公开更多信息,这对所有应用程序来说是不可能覆盖的。 在本场演讲中,我们将分享: *Shopee如何利用eBPF在生产环境中为百万个容器构建通用映射? *我们如何为具有不同协议和使用模式的任意第三方中间件实现分布式跟踪? *我们如何优化eBPF代码和Linux内核以最小化对注入容器的影响? *我们如何与大数据和人工智能堆栈集成,充分利用数据进行异常检测和故障排除?
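Once an eBPF socket probe emits per-connection flow records, building the topology map is an aggregation over (source, destination) pairs. A minimal sketch of that aggregation step, with made-up service names:

```python
from collections import Counter

def build_topology(flows):
    """Aggregate flow records into a weighted service-to-service edge list.

    Each record is a (src_service, dst_service, bytes) tuple, the kind
    of data an eBPF probe can export without any application changes.
    """
    edges = Counter()
    for src, dst, nbytes in flows:
        edges[(src, dst)] += nbytes  # merge repeated connections into one edge
    return dict(edges)

flows = [("checkout", "payment", 512),
         ("checkout", "payment", 256),
         ("payment", "db", 1024)]
print(build_topology(flows))
# → {('checkout', 'payment'): 768, ('payment', 'db'): 1024}
```

At a million containers the hard parts are, of course, the kernel-side capture and the mapping from sockets back to pods; this sketch only shows why zero-code capture is enough to recover the dependency graph.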
Speakers
avatar for Teck Chuan Lim

Teck Chuan Lim

Engineer, Shopee
I have been working at Shopee since graduating in 2018. I am a long-standing core member of the engineering infrastructure team and took charge of driving Shopee's engineering infrastructure ecosystem from DevOps to DataOps. At the moment, I am taking charge of driving forward towards... Read More →
Thursday August 22, 2024 16:25 - 17:00 HKT
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

16:25 HKT

The Two Sides of the Kubernetes Enhancement Proposals (KEPs) | Kubernetes Enhancement Proposals(KEPs)的两面性 - Rayan Das, OneTrust LLC & Sreeram Venkitesh, BigBinary
Thursday August 22, 2024 16:25 - 17:00 HKT
Kubernetes Enhancement Proposals (KEPs) are pivotal in proposing, communicating, and coordinating new efforts within the Kubernetes project. As members of the Release Team (the team responsible for releasing the next version of Kubernetes) especially Enhancements Team under SIG-Release, we play a vital role in maintaining the active status of enhancements and facilitating communication between stakeholders, be it a deprecation or a feature update. In this talk, we look at the KEP lifecycle from the perspective of the release team, exploring the process (enhancements freeze, code freeze, and the exception process), major themes, and more. Additionally, we will discuss the developer's viewpoint on KEPs, highlighting the process, deadlines, and best practices for proposing, reviewing, and implementing KEPs effectively. Join us to know how KEPs drive innovation and collaboration within the Kubernetes community, empowering contributors to shape the future of Kubernetes development.

Kubernetes Enhancement Proposals(KEPs)在Kubernetes项目中提出、沟通和协调新工作方面起着关键作用。 作为发布团队的成员(负责发布下一个版本的Kubernetes的团队),特别是在SIG-Release下的Enhancements团队,我们在维护增强功能的活跃状态和促进利益相关者之间的沟通方面发挥着重要作用,无论是废弃还是功能更新。 在这次演讲中,我们将从发布团队的角度看待KEP的生命周期,探讨过程(增强功能冻结、代码冻结和异常处理过程)、主要主题等。此外,我们还将讨论开发人员对KEP的观点,重点介绍提出、审查和有效实施KEP的过程、截止日期和最佳实践。 加入我们,了解KEP如何推动Kubernetes社区内的创新和协作,赋予贡献者塑造Kubernetes开发未来的能力。
Speakers
avatar for Rayan Das

Rayan Das

Senior Site Reliability Engineer, OneTrust LLC
As a Senior Site Reliability Engineer, I devote my expertise to the infrastructure of OneTrust Privacy Software. Within the Kubernetes community, I've served as a SIG Release Enhancements shadow for Kubernetes v1.29, and I applied to be a release shadow for v1.31 as well. Beyond... Read More →
avatar for Sreeram Venkitesh

Sreeram Venkitesh

Software Engineer, BigBinary
Sreeram Venkitesh is a Software Engineer at BigBinary and is an active contributor to Kubernetes. He is active in the Kubernetes release team, where he served as a shadow in the enhancements team from v1.29-v1.30 and is the enhancements sub-team lead for v1.31. He also helps write... Read More →
Thursday August 22, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Cloud Native Experience

16:25 HKT

Enforceable Supply Chain Security Policy with OPA Gatekeeper and Ratify | 通过OPA Gatekeeper和Ratify执行可强制执行的供应链安全策略 - Feynman Zhou, Microsoft & Dahu Kuang, Alibaba Cloud
Thursday August 22, 2024 16:25 - 17:00 HKT
Container supply chain threats are on the rise; to mitigate these threats, enterprises and open-source project maintainers are exploring new safeguards. Signing and verifying images, enforcing policies to block untrusted deployment, generating SBOM, provenance attestation, and vulnerability scanning are ways to keep attackers from compromising software. To safeguard the software supply chain with Gatekeeper policy, we built Ratify for Gatekeeper which acts as an external data provider and returns verification data that can be processed by Gatekeeper. Ratify as a verification engine enables users to enforce security policies through the verification of image signature, vulnerability reports and SBOM. We’ll demonstrate how you can establish trust for container images by enforcing security policies with Gatekeeper and Ratify. You can admit for deployment only the images that comply with your admission control policy, resulting in a more trustworthy container supply chain.

容器供应链威胁正在上升;为了减轻这些威胁,企业和开源项目维护者正在探索新的保障措施。签名和验证图像、强制执行政策以阻止不受信任的部署、生成SBOM、来源验证和漏洞扫描是防止攻击者损害软件的方法。 为了通过Gatekeeper策略保护软件供应链,我们为Gatekeeper构建了Ratify,它作为外部数据提供者返回验证数据,Gatekeeper可以处理这些数据。 Ratify作为验证引擎,使用户能够通过验证图像签名、漏洞报告和SBOM来执行安全策略。 我们将演示如何通过Gatekeeper和Ratify强制执行安全策略来建立对容器图像的信任。您可以仅允许符合入场控制策略的图像进行部署,从而实现更可信赖的容器供应链。
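The admission decision described above boils down to: admit only if the signature verified and the vulnerability report is within policy. Real Gatekeeper policies are written in Rego and consume Ratify's verification results as external data; the Python port below is a toy restatement of that rule, with made-up report fields:

```python
def admit(image_report: dict, max_critical: int = 0) -> bool:
    """Toy admission rule in the spirit of a Gatekeeper + Ratify policy.

    Admit an image only when its signature verified and its
    vulnerability report stays at or under the critical threshold.
    """
    return (image_report.get("signatureVerified", False)
            and image_report.get("criticalVulns", 0) <= max_critical)

print(admit({"signatureVerified": True, "criticalVulns": 0}))   # → True
print(admit({"signatureVerified": True, "criticalVulns": 2}))   # → False
print(admit({"signatureVerified": False, "criticalVulns": 0}))  # → False
```

Defaulting missing fields to "fail closed" (unverified, rather than assumed-safe) is the important design choice: an image with no verification data is rejected, not waved through.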
Speakers
avatar for Feynman Zhou

Feynman Zhou

Product Manager, Microsoft
Feynman is a product manager for Microsoft Azure. He is also a maintainer of the CNCF Notary Project, ORAS, and Ratify. Feynman has been contributing to multiple CNCF projects for six years and now focusing on the software supply chain security area. Feynman is also a writer, a public... Read More →
Thursday August 22, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security

17:15 HKT

Navigating the Ethical Horizon: Pioneering Responsible AI with the Generative AI Commons | 穿越伦理地平线:与生成式AI共同开创负责任的AI - Anni Lai, Futurewei
Thursday August 22, 2024 17:15 - 17:50 HKT
Join me to explore Responsible AI's vital role in shaping technology ethically. We'll navigate ethical dilemmas and societal impacts, emphasizing the urgency for frameworks prioritizing human well-being. At the core is the Responsible AI Framework by Generative AI Commons, guiding developers, researchers, and policymakers. Through transparency, fairness, accountability, and inclusivity, it empowers stakeholders to uphold ethical standards across the AI lifecycle. Let's journey towards an AI-powered future that's not just innovative but also ethically responsible.

加入我,探索负责任人工智能在塑造技术道德方面的重要作用。我们将探讨伦理困境和社会影响,强调制定以人类福祉为重点的框架的紧迫性。核心是生成式人工智能共同体的负责任人工智能框架,指导开发人员、研究人员和政策制定者。通过透明度、公平性、问责制和包容性,它赋予利益相关者在整个人工智能生命周期中维护伦理标准的能力。让我们一起走向一个不仅创新而且道德负责的人工智能驱动的未来。
Speakers
avatar for Anni Lai

Anni Lai

Head of Open Source Operations, Chair of Generative AI Commons, LF AI & Data, Futurewei
Anni drives Futurewei’s open source (O.S.) governance, process, compliance, training, project alignment, and ecosystem building. Anni has a long history of serving on various O.S. boards such as OpenStack Foundation, LF CNCF, LF OCI, LF Edge, and is on the LF OMF board and LF Europe... Read More →
Thursday August 22, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 3

17:15 HKT

KubeEdge DeepDive: Extending Kubernetes to the Edge with Real-World Industry Use Case | KubeEdge深入探讨:将Kubernetes扩展到边缘,实现真实行业用例 - Yue Bao, Huawei Cloud Computing Technology Co., Ltd. & Hongbing Zhang, DaoCloud
Thursday August 22, 2024 17:15 - 17:50 HKT
In this session, KubeEdge project maintainers will provide an overview of KubeEdge's architecture and explore how KubeEdge addresses industry-specific use cases. The session will kick off with a brief introduction to edge computing and its growing importance in IoT and distributed systems. The maintainers will then delve into the core components and architecture of KubeEdge, showcasing how it extends the capabilities of Kubernetes to manage edge computing workloads efficiently. Drawing on a range of industry use cases, including smart cities, industrial IoT, edge AI, robotics, and retail, the maintainers will share success stories and insights from organizations that have deployed KubeEdge in their edge environments, highlighting the tangible benefits and transformational possibilities it offers. The session will provide a detailed introduction to the certified KubeEdge conformance test. The maintainers will also share advancements in technology and community governance in KubeEdge.

在这场演讲中,KubeEdge项目的维护者将介绍KubeEdge的架构,探讨KubeEdge与行业特定用例的关系。会议将以简要介绍边缘计算及其在物联网和分布式系统中日益重要的作用开始。维护者将深入探讨KubeEdge的核心组件和架构,展示它如何扩展Kubernetes的能力,以有效管理边缘计算工作负载。维护者将借助一系列行业用例,包括智慧城市、工业物联网、边缘人工智能、机器人和零售,分享已在其边缘环境中部署KubeEdge的组织的成功故事和见解,突出其提供的切实利益和变革可能性。会议将详细介绍认证的KubeEdge一致性测试。维护者还将分享KubeEdge技术和社区治理方面的进展。
Speakers
avatar for Yue Bao

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.
Yue Bao serves as a software engineer at Huawei Cloud. She now works 100% on open source and is a member of the KubeEdge maintainers, focusing on lightweight edge and the edge api-server for KubeEdge. Before that, Yue worked on Huawei Cloud Intelligent EdgeFabric Service and participated... Read More →
avatar for Hongbing Zhang

Hongbing Zhang

Chief Operating Officer, DaoCloud
Hongbing Zhang is Chief Operating Officer of DaoCloud. He is a veteran in open source; he founded the IBM China Linux team in 2011 and organized the team to make significant contributions to the Linux kernel, OpenStack, and Hadoop projects. Now he is focusing on the cloud native domain and leading... Read More →
Thursday August 22, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 2

17:15 HKT

Addressing the #1 Threat to the Web: Authorization | 应对网络的头号威胁:授权 - Jimmy Zelinskie, authzed
Thursday August 22, 2024 17:15 - 17:50 HKT
As more folks deploy cloud-native architectures and technologies, store ever larger amounts of data, and build ever more complex software suites, the complexity required to correctly and securely authorize requests only becomes exponentially more difficult. Broken authorization now tops OWASP's Top 10 Security Risks for Web Apps. Their recommendation? Adopt an ABAC or ReBAC authorization model. This talk establishes the problems with the status quo, explains the core concepts behind ReBAC, and introduces SpiceDB, a widely adopted open source system inspired by the system internally powering Google: Zanzibar.

随着越来越多的人部署云原生架构和技术,存储越来越多的数据,并构建越来越复杂的软件套件,正确和安全地授权请求所需的复杂性变得指数级增加。 破解授权现在已经成为OWASP Web应用程序安全风险前十名之首。他们的建议是采用ABAC或ReBAC授权模型。本次演讲将阐明现状存在的问题,解释ReBAC背后的核心概念,并介绍SpiceDB,这是一个广泛采用的开源系统,受到Google内部系统Zanzibar的启发。
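The core ReBAC idea behind Zanzibar and SpiceDB is that permissions are answered by walking a graph of relationship tuples of the form (resource, relation, subject), where a subject may itself be a userset like "members of group:eng". A minimal sketch of that check, reduced to a recursive graph walk over in-memory tuples (SpiceDB's schema language, caching, and consistency machinery are all elided):

```python
def check(tuples, resource, relation, subject, depth=10):
    """Toy ReBAC permission check over relationship tuples.

    Each tuple is (resource, relation, subject); a subject given as a
    (resource, relation) pair is a userset that expands recursively.
    `depth` guards against cyclic relationship graphs.
    """
    if depth == 0:
        return False
    for res, rel, subj in tuples:
        if res == resource and rel == relation:
            if subj == subject:
                return True
            if isinstance(subj, tuple):  # userset: expand recursively
                if check(tuples, subj[0], subj[1], subject, depth - 1):
                    return True
    return False

rels = [
    ("doc:readme", "viewer", ("group:eng", "member")),  # viewers = eng members
    ("group:eng", "member", "user:alice"),
]
print(check(rels, "doc:readme", "viewer", "user:alice"))  # → True
print(check(rels, "doc:readme", "viewer", "user:bob"))    # → False
```

The appeal over role checks scattered through application code is that granting access is a data write (add a tuple), not a code change.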
Speakers
avatar for Jimmy Zelinskie

Jimmy Zelinskie

cofounder, authzed
Jimmy Zelinskie is a software engineer and product leader with a goal of democratizing software via open source development. He's currently CPO of authzed where he's focused on bringing hyperscaler best-practices in authorization to the industry at large. At CoreOS, he helped pioneer... Read More →
Thursday August 22, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Security

17:15 HKT

OpenTelemetry Amplified: Full Observability with EBPF-Enabled Distributed Tracing | OpenTelemetry放大:使用eBPF启用的分布式跟踪实现全面的可观测性 - Kai Liu, Alibaba Cloud & Wanqi Yang, Sun Yat
Thursday August 22, 2024 17:15 - 17:50 HKT
Within the cloud-native ecosystem, OpenTelemetry (otel) has established itself as the de facto standard for cross-language and cross-platform observability. By providing comprehensive tracing, metrics, and logging solutions for various programming languages, otel has empowered developers and operators with deep insights into complex systems. In recent years, otel has further expanded its observability frontiers by introducing innovative capabilities in the Linux kernel space using eBPF. However, this innovative journey has encountered new challenges, particularly in reducing the invasiveness in certain programming languages and correlating observability data between kernel and user spaces. This session chronicles Alibaba Cloud’s journey through these challenges. By leveraging eBPF technology, we've pioneered innovative solutions that redefine the landscape of system observability, presenting an integrated, less invasive approach for real-time insights into distributed systems.

在云原生生态系统中,OpenTelemetry(otel)已经成为跨语言和跨平台可观测性的事实标准。通过为各种编程语言提供全面的跟踪、度量和日志解决方案,otel为开发人员和运维人员提供了对复杂系统的深入洞察。近年来,otel通过在Linux内核空间引入eBPF的创新能力,进一步拓展了其可观测性边界。 然而,这种创新之旅遇到了新的挑战,特别是在减少某些编程语言中的侵入性和在内核和用户空间之间相关联可观测性数据方面。 本场演讲将记录阿里云在这些挑战中的旅程。通过利用eBPF技术,我们开创了重新定义系统可观测性景观的创新解决方案,提供了一种集成的、不那么侵入性的方法,实时洞察分布式系统。
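One simplified way to picture the kernel/user-space correlation problem above: attach each kernel-side eBPF event to the user-space span running on the same thread whose time window contains the event. The join below is a toy stand-in for that idea (real implementations propagate context far more carefully), with made-up span and event records:

```python
def correlate(spans, kernel_events, slack_ns=1_000_000):
    """Toy correlation of user-space spans with kernel-side eBPF events.

    An event is attached to a span when both ran on the same thread and
    the event timestamp falls inside the span's [start, end] window
    (widened by `slack_ns` to absorb clock skew).
    """
    out = {s["span_id"]: [] for s in spans}
    for ev in kernel_events:
        for s in spans:
            if (s["tid"] == ev["tid"]
                    and s["start"] - slack_ns <= ev["ts"] <= s["end"] + slack_ns):
                out[s["span_id"]].append(ev["name"])
                break  # attach each event to at most one span
    return out

spans = [{"span_id": "a", "tid": 7, "start": 100, "end": 200}]
events = [{"tid": 7, "ts": 150, "name": "tcp_sendmsg"},
          {"tid": 9, "ts": 150, "name": "tcp_recvmsg"}]
print(correlate(spans, events, slack_ns=0))
# → {'a': ['tcp_sendmsg']}
```

The fragility of time/TID joins (thread pools, async runtimes) is exactly why less invasive context propagation is the hard research problem the talk addresses.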
Speakers
avatar for Kai Liu

Kai Liu

Senior Software Developer, Alibaba Cloud
Liu Kai, a senior software development engineer in the Cloud Native Observability team of Alibaba Cloud. With years of practical experience and insights in the field of monitoring and observability, Liu Kai continuously delves into the realm of observability solutions, including architectural... Read More →
avatar for Wanqi Yang

Wanqi Yang

Student, Sun Yat-sen University
Wanqi Yang received the B.S. degree in Computer Science and Technology from Sun Yat-Sen University, Guangzhou, China. She is currently working toward the PhD degree in Computer Science and Technology at School of Computer Science and Engineering, Sun Yat-Sen University. Her research... Read More →
Thursday August 22, 2024 17:15 - 17:50 HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Observability

17:15 HKT

Working with Raw Disk Drives in Kubernetes — YDB's Experience | 在Kubernetes中使用原始磁盘驱动器——YDB的经验 - Ivan Blinkov, YDB
Thursday August 22, 2024 17:15 - 17:50 HKT
YDB is an open-source distributed database management system that, for performance reasons, uses raw disk drives (block devices) to store all data, without any filesystem. It was relatively straightforward to manage such a setup in the bare-metal world of the past, but the dynamic nature of cloud-native environments introduced new challenges to keeping this performance benefit. In this talk, we'll explore how to leverage Kubernetes and the Operator design pattern to modernize how stateful distributed database clusters are managed, without changing the primary approach to how the data is physically stored.

YDB是一个开源的分布式数据库管理系统,为了性能考虑,使用原始磁盘驱动器(块设备)存储所有数据,而不使用任何文件系统。在过去的裸金属世界中管理这样的设置相对比较简单,但云原生环境的动态特性引入了新的挑战,以保持这种性能优势。在这次演讲中,我们将探讨如何利用Kubernetes和运算符设计模式来现代化管理有状态的分布式数据库集群,而不改变数据物理存储的主要方法。
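The Operator design pattern mentioned above is, at its core, a reconcile loop: diff the declared cluster state against what is actually running and emit corrective actions. A skeleton of that step over a toy state model (node name to list of raw block devices); the real YDB operator of course manages Kubernetes objects such as StatefulSets, not this dict:

```python
def reconcile(desired_nodes, actual_nodes):
    """Skeleton of an Operator's reconcile step.

    Both arguments map node name -> list of raw block device paths.
    Returns the (action, node, disks) steps a controller would take to
    converge the actual state onto the desired state.
    """
    actions = []
    for node, disks in desired_nodes.items():
        if node not in actual_nodes:
            actions.append(("create", node, disks))
        elif set(actual_nodes[node]) != set(disks):
            actions.append(("update", node, disks))
    for node in actual_nodes:
        if node not in desired_nodes:
            actions.append(("delete", node, []))
    return actions

print(reconcile({"ydb-0": ["/dev/nvme0n1"]}, {}))
# → [('create', 'ydb-0', ['/dev/nvme0n1'])]
```

Running this loop on every change event is what lets a dynamic cloud environment keep the filesystem-free storage layout that was easy to hand-manage on bare metal.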
Speakers
avatar for Ivan Blinkov

Ivan Blinkov

VP, Product and Open-Source, YDB
Ivan Blinkov is a seasoned technical leader specializing in data storage and processing. Over the last decade, he was involved in the development of several database management systems, two of which are open-source: ClickHouse in the past and, more recently, YDB.
Thursday August 22, 2024 17:15 - 17:50 HKT
Level 2 | Grand Ballroom 1-2
 
Friday, August 23
 

10:35 HKT

Breaking Boundaries: TACC as an Unified Cloud-Native Infra for AI + HPC | 打破界限:TACC作为AI + HPC统一云原生基础设施 - Peter Pan, DaoCloud & Kaiqiang Xu, Hong Kong University of Science and Technology
Friday August 23, 2024 10:35 - 11:10 HKT
Large AI models are driving significant investment in GPU clusters. Yet, managing these clusters is hard: Slurm-based HPC setups lack management granularity and stability, while Kubernetes poses usability challenges for AI users. This talk introduces TACC, an AI infra management solution that bridges the advantages of both K8s and Slurm setups. It is joint work between computer system researchers at HKUST and leading CNCF contributors at DaoCloud. TACC has managed a large-scale cluster at HKUST that has supported over 500 active researchers since 2020. In this talk, we share our five-year journey with TACC, covering:
* [User Experience] A seamless UI for job submission and management, supporting both container and Slurm formats, all on the same backbone
* [Resource Management] Multi-tenant allocation with configurable strategies, using CNCF HAMi and Kueue
* [Performance and Scalability] A robust distributed infrastructure with networked storage and RDMA, via CNCF SpiderPool, Fluid...

大型AI模型正在推动GPU集群的重大投资。然而,管理这些集群很困难:基于Slurm的HPC设置缺乏管理粒度和稳定性,而Kubernetes对AI用户存在可用性挑战。 本次演讲介绍了TACC,这是一种AI基础设施管理解决方案,可以结合K8S和Slurm设置的优势。这是香港科技大学的计算机系统研究人员与DaoCloud领先的CNCF贡献者共同合作的成果。 TACC自2020年以来管理着香港科技大学支持超过500名活跃研究人员的大规模集群。在本次演讲中,我们分享了与TACC一起的五年历程,涵盖以下内容: * [用户体验] 无缝的UI界面用于作业提交和管理,支持容器和Slurm格式,均在同一基础上 * [资源管理] 多租户分配与可配置策略,使用CNCF HAMi和Kueue * [性能和可扩展性] 强大的分布式基础设施,具有网络存储和RDMA,通过CNCF SpiderPool,Fluid...
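One simple point in the multi-tenant allocation policy space mentioned above: give each tenant its guaranteed share first, then distribute leftover GPUs round-robin among tenants that still want more. This toy allocator only illustrates the shape of such strategies; TACC's actual configurable policies (built on HAMi and Kueue) are richer.

```python
def allocate_gpus(total, requests, guarantees):
    """Toy multi-tenant GPU allocation: guarantees first, then
    round-robin distribution of the leftover capacity."""
    alloc = {t: min(requests[t], guarantees.get(t, 0)) for t in requests}
    left = total - sum(alloc.values())
    while left > 0:
        hungry = [t for t in requests if alloc[t] < requests[t]]
        if not hungry:
            break  # every request satisfied; capacity remains idle
        for t in hungry:
            if left == 0:
                break
            alloc[t] += 1
            left -= 1
    return alloc

print(allocate_gpus(8, {"lab-a": 6, "lab-b": 4}, {"lab-a": 2, "lab-b": 2}))
# → {'lab-a': 4, 'lab-b': 4}
```

Swapping this loop for weighted fair share, preemption, or borrowing is exactly the kind of strategy configuration a multi-tenant scheduler exposes.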
Speakers
avatar for Peter Pan

Peter Pan

VP of R&D Engineering, DaoCloud
├ DaoCloud R&D Engineering VP├ CNCF wg-AI (AI Working-Group) member├ Maintainer of a few CNCF projects (GithubID: panpan0000): CloudTTY, KuBean, HwameiStor├ Public Tech Events:└─ 2023 KubeCon SH Speaker (https://sched.co/1PTFI)└─ 2023 KubeCon EU Program Committee... Read More →
avatar for Kaiqiang Xu

Kaiqiang Xu

Researcher, Hong Kong University of Science and Technology
Hong Kong University of Science and Technology
Friday August 23, 2024 10:35 - 11:10 HKT
Level 1 | Hung Hom Room 3

10:35 HKT

Containerd: Project Update and Deep Dive | Containerd:项目更新和深入探讨 - Akhil Mohan, VMware & Iceber Gu, DaoCloud
Friday August 23, 2024 10:35 - 11:10 HKT
Containerd, a mature seven-year-old project, is moving beyond eight major releases into a new era: containerd 2.0. We'll dive into all the exciting new features in 2.0, like the Sandbox API, Transfer Service and Node Resource Interface, and help users understand what these new features enable for their use cases. We'll also provide an upgrade checklist and highlight changes users need to make before upgrading to the 2.0 release, since the 2.0 release will remove features marked as deprecated in past releases. We'll also cover new updates to the API Go module and the refactoring that went in to make the containerd Go client stable. Guidance will be provided on using any supported release to support new Kubernetes releases. We're excited to share the progress of the containerd project. Come join us and ask your containerd questions with the handful of on-site containerd maintainers.

Containerd作为一个成熟的、已有7年历史的项目,正在从八个主要版本迈向一个新时代:containerd 2.0。我们将深入探讨2.0中所有新的令人兴奋的功能,如沙盒API、传输服务和节点资源接口,并帮助用户了解这些新功能为他们的使用案例带来了什么。我们还将提供升级检查表,并强调用户在升级到2.0版本之前需要做出的更改,因为2.0版本将删除在过去版本中标记为弃用的功能。我们还将介绍API go模块的新更新以及为使containerd Go客户端稳定而进行的重构。我们将提供指导,以使用任何支持的版本来支持新的Kubernetes版本。我们很高兴分享containerd项目的进展。快来加入我们,与现场的containerd维护人员一起提出你的containerd问题。
Speakers
avatar for Wei Cai(Iceber Gu)

Wei Cai(Iceber Gu)

Software Engineer, DaoCloud
Senior open source enthusiast, focused on cloud runtimes, multi-cloud and WASM. I am a CNCF Ambassador; I founded Clusterpedia and promoted it to a CNCF Sandbox project. I also created KasmCloud to promote the integration of WASM with Kubernetes and contributed it to the WasmCloud... Read More →
avatar for Akhil Mohan

Akhil Mohan

Software Engineer, VMware by Broadcom
Akhil works as a Senior Member of Technical Staff at VMware by Broadcom. An active contributor to projects in cloud native and container ecosystem. Akhil is a reviewer for containerd and a maintainer of kubernetes publishing-bot. He works mostly on container runtimes and kubernetes... Read More →
Friday August 23, 2024 10:35 - 11:10 HKT
Level 1 | Hung Hom Room 6

10:35 HKT

A Year in the Life of a Developer in the Era of Developer Portals: Navigating Backstage | 开发者在开发者门户时代的一年生活:导航Backstage - Helen Greul, Spotify
Friday August 23, 2024 10:35 - 11:10 HKT
In today's rapidly evolving landscape of software development, the role of developer portals has become indispensable. This presentation delves into the experiences of developers over the course of a year, exploring the transformative impact of Backstage developer portal on their workflows, collaboration, and overall productivity based on case studies from existing adopters of Backstage. Through a comprehensive exploration of real-world scenarios, this talk offers insights into the daily challenges faced by developers and how Backstage empowers them to overcome these hurdles. From streamlined onboarding processes to simplified access to internal services and documentation, attendees will gain a deeper understanding of the multifaceted benefits that Backstage brings to developer teams. Moreover, we'll discuss best practices for leveraging Backstage to foster a culture of innovation, collaboration, and continuous improvement.

在当今快速发展的软件开发领域,开发者门户的作用变得不可或缺。本次演讲将深入探讨开发者在一年时间内的经验,通过现有Backstage采用者的案例研究,探讨Backstage开发者门户对他们的工作流程、协作和整体生产力的转变影响。 通过对现实场景的全面探讨,本次演讲将为参与者提供洞察开发者面临的日常挑战,以及Backstage如何赋予他们克服这些障碍的能力。从简化入职流程到简化访问内部服务和文档,参与者将更深入地了解Backstage为开发团队带来的多方面好处。此外,我们还将讨论利用Backstage促进创新、协作和持续改进文化的最佳实践。
Speakers
avatar for Helen Greul

Helen Greul

Head of Engineering for Backstage, Spotify
Helen is an engineering leader, speaker and a strong advocate for creating developer ecosystems that empower teams to thrive. Her journey has taken her from hands-on coding to steering engineering and platform teams, providing her with a holistic perspective on the challenges and... Read More →
Friday August 23, 2024 10:35 - 11:10 HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

10:35 HKT

Deep Dive Into Windows CSI Driver HostProcess Containers | 深入探讨Windows CSI驱动程序HostProcess容器 - Andy Zhang (OSTC) & Weizhi Chen, Microsoft
Friday August 23, 2024 10:35 - 11:10 HKT
Currently, most Windows CSI drivers depend on the Windows csi-proxy because various privileged operations cannot be performed from a containerized application running on a Windows node. Beginning in Kubernetes 1.23, HostProcess containers are supported, and they can run directly on the host as regular processes. Switching to HostProcess container deployment makes Windows CSI driver development and deployment easier. This session will cover the history and implementation details of the Windows csi-proxy project, why csi-proxy has been needed by Windows CSI drivers since Kubernetes 1.18, and why we removed the csi-proxy dependency in Kubernetes 1.26. We will explore the key learnings and gotchas we resolved while migrating Windows CSI driver development from csi-proxy-dependent deployment to HostProcess container deployment. After attending this session, you will understand why and how to migrate your Windows applications to gain the benefits of using HostProcess containers.

Speakers
Andy Zhang (OSTC)
Principal Software Engineer, Microsoft
Andy Zhang is the storage lead in Azure Kubernetes Service team at Microsoft, maintainer of multiple Kubernetes projects, including Windows csi-proxy project, Azure CSI drivers, SMB, NFS, iSCSI CSI drivers, etc. Andy focuses on improving the experience of using storage in Kuberne... Read More →
Weizhi Chen
Senior Software Engineer, Microsoft
Works on the Microsoft AKS team on Kubernetes, focusing on k8s storage drivers on Azure.
Friday August 23, 2024 10:35 - 11:10 HKT
Level 2 | Grand Ballroom 1-2

10:35 HKT

Empower WebAssembly and Container Both on RISC-V | 在RISC-V上加强WebAssembly和容器 - Tiejun Chen, VMware
Friday August 23, 2024 10:35 - 11:10 HKT
RISC-V has clearly attracted attention from many areas. But in the real world there are still challenges to running workloads on RISC-V-based targets. From cloud to edge, you can see the trend of deploying workloads on sandboxed microservice platforms - containers, k8s, etc. The underlying sandbox technologies are also evolving, with newcomers like WebAssembly, which has been considered the future of computing. In the real world we have started running WebAssembly as an alternative lightweight runtime side by side with containers and VMs. Here we'd like to review whether and how we can build this multi-runtime platform on RISC-V, where WebAssembly and containers coexist. We will enable deployment of {WebAssembly, Docker} to RISC-V Linux running on a real RISC-V target, and further enable other open source utilities on the RISC-V Linux distribution, in order to help fit workloads into WebAssembly and containers on RISC-V and to explore accelerating the open software ecosystem on RISC-V.

Speakers
Tiejun Chen
Sr. Technical Lead, VMware
Tiejun Chen is a senior technical lead. He has worked at several tech companies, including VMware, Intel, and Wind River Systems, in areas such as cloud native, edge computing, ML/AI, RISC-V, and WebAssembly. He has given many presentations, including at AI.Dev NA 2023, KubeCon China 2021, Kube... Read More →
Friday August 23, 2024 10:35 - 11:10 HKT
Level 1 | Hung Hom Room 5

11:25 HKT

LLM's Anywhere: Browser Deployment with Wasm & WebGPU | LLM随处可用:使用Wasm和WebGPU进行浏览器部署 - Joinal Ahmed, Navatech Group & Nikhil Rana, Google Cloud
Friday August 23, 2024 11:25 - 12:00 HKT
In today's interconnected world, deploying and accessing machine learning (ML) models efficiently poses significant challenges. Traditional methods rely on cloud GPU clusters and constant internet connectivity. However, WebAssembly (Wasm) and WebGPU technologies are revolutionizing this landscape. This talk explores leveraging Wasm and WebGPU for deploying small language models (SLMs) directly within web browsers, eliminating the need for extensive cloud GPU clusters and reducing reliance on constant internet access. We showcase practical examples and discuss how Wasm enables efficient cross-platform ML model execution, while WebGPU optimizes parallel computation within browsers. Join us to discover how this fusion empowers developers and users alike with unprecedented ease and efficiency in browser-based ML, while reducing dependence on centralized cloud infrastructure and internet connectivity constraints.

Speakers
Joinal Ahmed
AI Architect, Navatech Group
Joinal is a seasoned Data Science expert passionate about rapid prototyping, community involvement, and driving technology adoption. With a robust technical background, he excels in leading diverse teams through ML projects, recruiting and mentoring talent, optimizing workflows, and... Read More →
Nikhil Rana
AI Consultant, Google Cloud
Nikhil is an applied data science professional with over a decade of experience in developing and implementing Machine learning, Deep Learning, and NLP-based solutions for a variety of industries like Finance, FMCG, etc. He is a passionate advocate for the use of data science to solve... Read More →
Friday August 23, 2024 11:25 - 12:00 HKT
Level 1 | Hung Hom Room 3
  KubeCon + CloudNativeCon Sessions, AI + ML

11:25 HKT

New Advances for Cross-Platform AI Applications in Docker | Docker中跨平台AI应用程序的新进展 - Michael Yuan, Second State
Friday August 23, 2024 11:25 - 12:00 HKT
The talk proposes to delve into novel methods for enhancing cross-platform GPU/AI workloads within container ecosystems, with a specific emphasis on Docker's incorporation of the WebGPU standard. This standard empowers containerized applications to utilize host GPUs and additional AI accelerators via a flexible API. Consequently, there's no longer a necessity to construct Docker images tailored to individual GPU vendors and their proprietary drivers. The presentation will feature a demonstration highlighting how the WasmEdge project capitalizes on the WebGPU standard to craft portable LLM inference applications in Rust. Additionally, Docker's seamless management and orchestration of these applications will be showcased.

Speakers
Michael Yuan
Product Manager, Second State
Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →
Friday August 23, 2024 11:25 - 12:00 HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, AI + ML

13:20 HKT

Constructing the 10x Efficiency of Cloud-Native AI Infrastructure | 如何让你的 AI 底座效能提升 10 倍? - Peter Pan, DaoCloud & Qiuping Dai, DaoCloud
Friday August 23, 2024 13:20 - 13:55 HKT
Enterprises keep investing in AI. But once GPUs are installed in a data center, a challenge arises: how to construct an "AI cloud" atop the bare metal. Even though K8S is recognized as the foundational infrastructure for AI, K8S alone is merely the initial step. Organizations may face challenges such as: maximizing GPU utilization; unifying multi-arch accelerators/GPUs (k8s DRA); organization quotas and cost management; resource isolation among organizations; smarter scheduling, tiered GPU allocation, and task prioritization; sharing GPU clusters between VMs and containers; and harnessing the full potential of high-speed networks, storage optimization, and dataset orchestration. Leveraging open source stacks from the Linux Foundation and CNCF, we have experience building AI clouds for IDC or internal usage. We will share that experience to empower the community's journey towards constructing 10x-efficiency cloud-native AI. Refer to the `Additional resources` chapter for more details.
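As one concrete example of the organization-quota point above, GPU consumption per team can be capped with a standard Kubernetes ResourceQuota (the namespace and limit values here are illustrative, not from the talk):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a              # illustrative team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "8" # cap total GPU requests for this namespace
```

Pods in `team-a` that would push aggregate GPU requests past 8 are then rejected at admission time.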

Speakers
Peter Pan
VP of R&D Engineering, DaoCloud
DaoCloud R&D Engineering VP; CNCF wg-AI (AI Working Group) member; maintainer of a few CNCF projects (GitHub ID: panpan0000): CloudTTY, KuBean, HwameiStor. Public tech events: 2023 KubeCon SH speaker (https://sched.co/1PTFI), 2023 KubeCon EU Program Committee... Read More →
Qiuping Dai (秋萍 戴)
Product Manager, DaoCloud
Qiuping Dai has been a senior Technology Product Manager at DaoCloud for 5 years, involved in cloud computing (including Kubernetes compute, storage, and network) development work. Before that, she worked at IBM on cloud computing. She is interested in storage, network, scheduling... Read More →
Friday August 23, 2024 13:20 - 13:55 HKT
Level 1 | Hung Hom Room 2

13:20 HKT

Write Once Run Anywhere, but for GPUs | GPU 时代的“一次编写,到处运行” - Michael Yuan, Second State
Friday August 23, 2024 13:20 - 13:55 HKT
With the popularity of LLM apps, there is an increasing demand for running and scaling AI workloads in the cloud and on edge devices. Rust and Wasm offer a solution by providing a portable bytecode that abstracts hardware complexities. LlamaEdge is a lightweight, high-performance and cross-platform LLM inference runtime. Written in Rust and built on WasmEdge, LlamaEdge provides a standard WASI-NN API to developers. Developers only need to write against the API and compile to Wasm. The Wasm file can run on any device, where WasmEdge translates and routes Wasm calls to the underlying native libraries such as llama.cpp. This talk will discuss the design and implementation of LlamaEdge and show how it enables cross-platform LLM app development and deployment. We will also walk through several code examples from a basic sentence completion app, to a chat bot, to an RAG agent app with external knowledge in vector databases, to a Kubernetes managed app across a heterogeneous cluster.

Speakers
Michael Yuan
Product Manager, Second State
Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →
Friday August 23, 2024 13:20 - 13:55 HKT
Level 1 | Hung Hom Room 3

13:20 HKT

Inplace-Update: The Past, Present and Future | 原地更新:过去、现在和未来 - Zhang Zhen & Mingshan Zhao & Yuxing Yuan, Alibaba Cloud
Friday August 23, 2024 13:20 - 13:55 HKT
In-place update is a controversial technique that is considered bad practice by many cloud native fans. Nevertheless, in-place update is used by many practitioners to containerize stateful apps and to greatly speed up the rolling progress of stateless apps. In recent versions of k8s, additional support for in-place update has emerged, e.g. volume resizing and vertical pod scaling. It is a pity that these features are not integrated with in-tree workloads. OpenKruise integrates in-place update as one of the core features of its advanced workloads and has accumulated many real use cases. We will share the challenges of implementing in-place update in k8s, such as enhancing pod lifecycle hooks and supporting in-place updates triggered by metadata changes. In addition, we will share the in-progress effort to integrate recent k8s enhancements into Kruise workloads so as to enable more cases for in-place update. Finally, we will discuss the possibility of bringing the in-place update feature into in-tree workloads.
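For illustration, in-place update in OpenKruise is enabled through the update strategy of an advanced workload such as a CloneSet. A minimal sketch based on the OpenKruise API (the workload name, labels, and image are illustrative):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample
spec:
  replicas: 3
  updateStrategy:
    type: InPlaceIfPossible   # restart containers in place when only the image changes
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: app
          image: nginx:1.25   # changing only this field triggers an in-place update
```

With `InPlaceIfPossible`, pods keep their names, IPs, and node placement during the rollout instead of being recreated.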

Speakers
Zhen Zhang
staff engineer, Alibaba Cloud
Zhen Zhang has been working on the cluster management of software applications. he is driving the new cloud native innovation in Alibaba and focus mainly on the application management domain. He is one of main maintainer in OpenKruise project.
Mingshan Zhao
Senior R&D Engineer, Alibaba Cloud
Senior R&D Engineer at Alibaba Cloud and a maintainer of the OpenKruise community, he has long been engaged in research and development in cloud native, containers, scheduling, and other fields; he is a core R&D member of Alibaba's million-container scheduling system, with many years of experience in... Read More →
Yuxing Yuan
Senior Development Engineer, Alibaba Cloud
Cloud-native focus with an AI interest.
Friday August 23, 2024 13:20 - 13:55 HKT
Level 1 | Hung Hom Room 6

13:20 HKT

What if Your System Experiences an Outage? Let's Build a Resilient Systems with Chaos Engineering | 如果您的系统遇到故障怎么办?让我们通过混沌工程构建弹性系统 - NamKyu Park, LitmusChaos
Friday August 23, 2024 13:20 - 13:55 HKT
This session explores how LitmusChaos improves the resilience of cloud-native applications by injecting chaos. It also showcases the streamlined management of chaos engineering software through Backstage. Cloud-native applications can be complex to navigate and secure. Our session will present strategies to identify vulnerabilities using GitOps and monitoring, integrated seamlessly into your system. Learn how Backstage and LitmusChaos can enhance your application's resilience with ease! The session starts with chaos orchestration and analysis using LitmusChaos, followed by a live demo highlighting the utilization of LitmusChaos' Backstage plugin and others like Prometheus and ArgoCD. Learn how these plugins, when integrated with Backstage, effectively manage all components necessary for executing chaos engineering.
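As a taste of the chaos orchestration discussed above, a LitmusChaos experiment is declared through a ChaosEngine resource. A sketch using the Litmus v1alpha1 API (the target labels and service account are illustrative):

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  appinfo:
    appns: default
    applabel: app=nginx        # illustrative target application
    appkind: deployment
  engineState: active
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete         # a common built-in fault: randomly delete target pods
```

Results land in a ChaosResult resource, which is what plugins such as the Backstage one surface for analysis.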

Speakers
Namkyu Park
Maintainer, LitmusChaos
Namkyu Park is a CNCF Ambassador and a software developer. He has worked at several startups in South Korea. He completed the Linux Foundation Mentorship Program (LitmusChaos) as a mentee and is currently a mentor and maintainer of LitmusChaos. He has previously spoken at GopherCon Korea... Read More →
Friday August 23, 2024 13:20 - 13:55 HKT
Level 1 | Hung Hom Room 7

13:20 HKT

Java Me Smarter: Unleashing AI Power with Quarkus | Java让我更聪明:用Quarkus释放人工智能的力量 - Daniel Oh, Red Hat
Friday August 23, 2024 13:20 - 13:55 HKT
Feeling stuck in a rut with traditional Java development? This session injects a shot of AI innovation to supercharge your Java skills! Daniel will dive into Quarkus, a modern Java framework perfectly suited for building microservices, and explore how it seamlessly integrates with cutting-edge AI functionalities. Get ready to: - Boost Your Java IQ: Learn how Quarkus streamlines development and empowers you to build scalable, high-performance microservices. - Unleash the AI Powerhouse: Discover how to leverage AI capabilities within your Java applications. Daniel will explore real-world use cases, from intelligent data analysis and machine learning to chatbots and recommendation engines. - AI Made Easy: See how Quarkus simplifies the integration of AI models and services into your Java codebase, making AI development more accessible than ever. - Witness the Future: Uncover the exciting possibilities that emerge when you combine the power of Java with AI.

Speakers
Daniel Oh
Senior Principal Developer Advocate, Red Hat
Daniel Oh is a Java Champion and Senior Principal Developer Advocate at Red Hat, passionately promoting the development of cloud-native microservices and serverless functions using cloud-native runtimes. As a CNCF ambassador, he actively contributes to various open-source cloud projects... Read More →
Friday August 23, 2024 13:20 - 13:55 HKT
Level 1 | Hung Hom Room 5

14:10 HKT

Unveiling the Future: Nurturing Openness in AI Development | 揭示未来:培育人工智能开放性发展 - Anni Lai, Futurewei & Mer Joyce, Do Big Good LLC
Friday August 23, 2024 14:10 - 14:45 HKT
In the rapidly evolving landscape of AI, the concept of openness emerges as a cornerstone for ethical, accountable, and sustainable development. This talk delves into the significance of fostering openness in AI endeavors, exploring two groundbreaking efforts: the Open Source AI Definition led by the Open Source Initiative (OSI) and Model Openness Framework (MOF) introduced by LF AI & Data Generative AI Commons. Through the lens of the OSI's definition co-design process, we'll navigate the evolving landscape of Open Source AI, deciphering its potential to democratize access to cutting-edge technology while fortifying principles of inclusivity and collaboration. We'll unravel the transformative potential of the MOF to foster transparency and trust in AI models. By elucidating the core tenets of the framework and the definition, we'll illuminate pathways for advancing responsible AI development.

Speakers
Anni Lai
Head of Open Source Operations, Chair of Generative AI Commons, LF AI & Data, Futurewei
Anni drives Futurewei’s open source (O.S.) governance, process, compliance, training, project alignment, and ecosystem building. Anni has a long history of serving on various O.S. boards such as OpenStack Foundation, LF CNCF, LF OCI, LF Edge, and is on the LF OMF board and LF Europe... Read More →
Mer Joyce
Founder, Do Big Good LLC
Mer Joyce (she/her) is the founder of the co-design firm Do Big Good and is the facilitator of the Open Source Initiative's consultative process to co-design the Open Source AI Definition (OSAID). She has over a decade of international experience at the intersection of research, tech... Read More →
Friday August 23, 2024 14:10 - 14:45 HKT
Level 1 | Hung Hom Room 3

14:10 HKT

Developing a Standard Multi-Cluster Inventory API | 开发标准的多集群Inventory API - Zhiying Lin & Chen Yu, Microsoft; Hongcai Ren, Huawei; Di Xu, Xiaohongshu; Jian Qiu, Redhat
Friday August 23, 2024 14:10 - 14:45 HKT
With one year's effort, the Kubernetes community has made great progress toward final approval of the cluster inventory API project. The project has gained a lot of attention and interest from different companies and open source projects, with many new use cases being explored. This panel discussion brings together maintainers from the different multicluster management projects who bootstrapped this project. We will share what the cluster inventory API is and how we got there. We will also introduce the ongoing work and emerging use cases on this project, and our vision for the future plan. During the panel discussion, attendees will gain a comprehensive understanding of the use cases, e.g., how to support multi-cluster AI workload scheduling using the inventory API, and the challenges, e.g., how to migrate from one cluster manager tool to another seamlessly. We will shed light on the collaborative effort to standardize cluster inventory APIs and how it evolved from a small group discussion into a community effort.

Speakers
Di Xu
Principle Software Engineer, Xiaohongshu
Currently, he serves as a Tech Lead at Xiaohongshu, where he leads a team focused on building a highly reliable and scalable container platform. He is the founder of the CNCF Sandbox project Clusternet and a top-50 code contributor in the Kubernetes community. He has spoken at many... Read More →
Chen Yu
Senior Software Engineer, Microsoft
Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.
Zhiying Lin
Principal Software Engineer, Microsoft
I'm a Principal Software Engineer at Microsoft, and my main contribution is the Azure Kubernetes Fleet Manager product. I'm one of the main maintainers of the open source projects Azure/fleet and Azure/fleet-networking.
Friday August 23, 2024 14:10 - 14:45 HKT
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Platform Engineering

14:10 HKT

Opportunities and Challenges of Cloud Native Technology in US Healthtech | 美国健康科技中云原生技术的机遇与挑战 - Katerina Arzhayev, SUSE
Friday August 23, 2024 14:10 - 14:45 HKT
In this session I will share the strategic roadmap for Cloud Native Technology companies eyeing expansion into the intricate US healthcare market. Delving into the multifaceted landscape of American healthcare, the session navigates through its complexities, from the dichotomy of public and private sectors to the nuanced regulatory framework dominated by HIPAA and FDA regulations. By illuminating Cloud Native Technology's transformative potential, particularly in fostering interoperability, enhancing telehealth capabilities, and empowering data analytics, the session showcases how innovation can meet the industry's pressing needs. Moreover, it sheds light on the indispensable considerations for market entry, emphasizing regulatory compliance, trust-building with healthcare stakeholders, and the imperative of market localization. Attendees will be equipped with a strategic playbook to navigate the intricate terrain of US healthtech.

Speakers
Katerina Arzhayev
Director of Product Management, Healthcare Edge, SUSE
Katerina Arzhayev is experienced in cross-cultural collaboration and technology strategy. She has a proven track record of driving business results through effective communication and strategic planning. Katerina's expertise lies in making highly complicated topics accessible to non-technical... Read More →
Friday August 23, 2024 14:10 - 14:45 HKT
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, Cloud Native Experience

15:15 HKT

Detecting and Overcoming GPU Failures During ML Training | 在ML训练过程中检测和克服GPU故障 - Ganeshkumar Ashokavardhanan, Microsoft & Sarah Belghiti, Wayve
Friday August 23, 2024 15:15 - 15:50 HKT
Scaling ML training demands powerful GPU infrastructure, and as model sizes and training scale increases, GPU failures become an expensive risk. From outright hardware faults to subtle performance degradation, undetected GPU problems can sabotage training jobs, inflating costs and slowing development. This talk dives into GPU failure challenges in the context of ML training, particularly distributed training. We will explore the spectrum of GPU issues, and why even minor performance drops can cripple large jobs. Learn how observability (leveraging tools like NVIDIA DCGM) enables proactive problem detection through GPU health checks. Understand principles of fault-tolerant distributed training to mitigate GPU failure fallout. Drawing on cloud provider and autonomous vehicle company experience, we will share best practices for efficient identification, remediation, and prevention of GPU failures. We will also explore cutting-edge ideas like CRIU and task pre-emption for GPU workloads.

Speakers
Ganeshkumar Ashokavardhanan
Software Engineer, Microsoft
Ganesh is a Software Engineer on the Azure Kubernetes Service team at Microsoft, working on node lifecycle, and is the lead for the GPU workload experience on this kubernetes platform. He collaborates with partners in the ecosystem like NVIDIA to support operator models for machine... Read More →
Sarah Belghiti
ML Platform Engineer, Wayve
Sarah Belghiti is an ML Platform Engineer at Wayve, a leading developer of embodied intelligence for autonomous vehicles. She works on the infrastructure, scheduling and monitoring of ML workloads. With GPUs becoming an increasingly scarce resource, her focus has been on building... Read More →
Friday August 23, 2024 15:15 - 15:50 HKT
Level 1 | Hung Hom Room 3

15:15 HKT

No More Runtime Setup! Let's Bundle, Distribute, Deploy, Scale LLMs Seamlessly with Ollama Operator | 无需运行时设置!让我们使用Ollama Operator轻松捆绑、分发、部署、扩展LLMs - Fanshi Zhang, DaoCloud
Friday August 23, 2024 15:15 - 15:50 HKT
Seeking a way to ship LLMs more seamlessly? Is it too complicated to manage, compose, and set up a runtime with Python, C++, CUDA, and GPUs when deploying LLMs? Tired of fighting against dependencies, model sizes, and syncing deliverable model images across nodes? It's true that people often find it hard to bundle, distribute, deploy, and scale their own LLM workloads, but no worries: here is Ollama Operator, a scheduler and utilizer for LLM models powered by the Modelfile introduced by Ollama. You can now enjoy a unified, bundled runtime powered by llama.cpp with a few lines of CRD definition, or with a single command using the natively included kollama CLI; bundling, distributing, deploying, and scaling LLMs has never been so easily and seamlessly accomplished across OSes and environments. Let's dive in and find out what Ollama Operator with Ollama can do to deploy our own large language models, what we can do to combine these features with the Modelfile, and then bring them into the Kubernetes world!
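For a sense of the CRD-based flow described above, deploying a model with Ollama Operator can be sketched roughly like this (the API group/version and field names are recalled from the project's docs and may differ between releases; treat this as illustrative only):

```yaml
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: phi
spec:
  image: phi   # any model name Ollama's Modelfile ecosystem can pull
```

Equivalently, the bundled CLI aims to reduce this to a single command along the lines of `kollama deploy phi`.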

Speakers
Neko Ayaka
Software Engineer, DaoCloud
Cloud native developer, AI researcher, Gopher with 5 years of experience in loads of development fields across AI, data science, backend, frontend. Co-founder of https://github.com/nolebase
Friday August 23, 2024 15:15 - 15:50 HKT
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, AI + ML

15:15 HKT

The Bang! - When Bad Things Happen to Your Data | 爆炸!- 当数据出问题时 - Kelvin Mun, Veeam Software
Friday August 23, 2024 15:15 - 15:50 HKT
Imagine the inevitable has already happened—you’ve had a security breach—and you’re now dealing with the aftermath. Organisations must act fast to ensure business returns to operations quickly while also figuring out how to prevent similar incidents in the future. By adopting new use cases, engineering teams are simultaneously accelerating the deployment of sensitive data across multi-cloud architectures and tapping into new risk factors. In this talk, we will use the “Data Security Bang” analogy and learnings from resilience engineering to answer questions such as: How could we do more left of bang (prevention) to help with the speed of right of bang (remediation)? The audience will be guided through a set of example scenarios in a 90s-style game, using Kanister, OPA, and Prometheus, in which they can make decisions on data security to guide the way towards a more robust infrastructure.

Friday August 23, 2024 15:15 - 15:50 HKT
Level 1 | Hung Hom Room 5
  Open Source Summit Sessions, Supply Chain Security

16:05 HKT

Boosting LLM Development and Training Efficiency: Automated Parallelization with MindSpore | 提升LLM开发和培训效率:MindSpore自动并行化 - Yufeng Lyu, Huawei Technologies Co., Ltd
Friday August 23, 2024 16:05 - 16:40 HKT
With the popularity of LLMs, large-scale pre-training has become an indispensable step in AI research and implementation. However, large-scale distributed parallel training requires developers to consider various factors affecting the efficiency of model development and training, such as partitioning and communication, and then modify the model accordingly. In this presentation, we will demonstrate an automatic parallelization approach that allows developers to focus on algorithm research without the need for intrusive model modifications. Distributed training on a large-scale cluster can be achieved simply by configuring strategies. Developers can also utilize MindSpore's hyperparameter search model to automatically find the best parallelization strategy. The parallel strategy obtained through search can achieve 90%-110% of expert-tuned performance, significantly reducing the time required for model modifications while efficiently accelerating LLM training.

Speakers
Yufeng Lyu
Senior Engineer, Huawei Technologies Co., Ltd
Lyu Yufeng, a technical architect at MindSpore and maintainer of the MindNLP framework, focuses his research on natural language processing and distributed parallelism for LLM. He possesses extensive experience in the development and implementation of LLM solutions.
Friday August 23, 2024 16:05 - 16:40 HKT
Level 1 | Hung Hom Room 3
 
