21-23 August, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Hong Kong Standard Time (UTC +8). The schedule is subject to change and session seating is available on a first-come, first-served basis.

KubeCon + CloudNativeCon Sessions
Thursday, August 22
 

11:00 HKT

Dollars and PPM's - Carbon Emissions and Cloud Spend | 美元和PPM - 碳排放和云支出 - Bryan Oliver, Thoughtworks
Thursday August 22, 2024 11:00 - 11:35 HKT
Cloud carbon emissions are unfortunately not a priority for most enterprises. Costs, however, are. In the cloud native space, there is an ever-growing list of spend tracking and reduction tools. In this talk, we'll discuss several strategies you can adopt to unify the prioritization of cloud costs and carbon impact. We want to show how you can align with your business goal of simultaneously reducing cloud spend and overall carbon emissions.

云计算的碳排放很可惜并不是大多数企业的首要任务。成本,然而,是。在云原生领域,有越来越多的支出跟踪和降低工具。 在这次讨论中,我们将讨论几种您可以采用的策略,统一云成本和碳影响的优先级。我们希望展示如何与您同时降低云支出和整体碳排放的业务目标保持一致。
Speakers
Bryan Oliver

Principal, Thoughtworks
Bryan is an experienced engineer and leader who designs and builds complex distributed systems. He has spent his career developing mobile and back-end systems whilst building autonomous teams. More recently he has been focused on delivery and cloud native at Thoughtworks. In his free...
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

11:50 HKT

Beyond the Basics: Towards Making Thanos Production-Ready | 超越基础:朝着使Thanos达到生产就绪状态的方向前进 - Benjamin Huo & Junhao Zhang, QingCloud Technologies
Thursday August 22, 2024 11:50 - 12:25 HKT
As one of the most popular and powerful Prometheus long-term storage projects, Thanos is widely adopted by the community. But to use Thanos in production, there are still a lot of day-2 operations that need to be automated. In this talk, KubeSphere maintainers will share their experiences in using and maintaining Thanos in production, including:
- Kubernetes native definition of all Thanos components
- Tenant isolation of ingestion, rule evaluation, compaction
- Tenant-based autoscaling mechanism of Thanos Ingester, Ruler, and Compactor
- The time-based partition of Thanos store
- Tenant-based data lifetime management
- The sharding mechanism of the global ruler to handle massive recording rules and alerting rules evaluation workload
- The gateway & agent proxy mechanism for read/write with tenant access control
- The basic_auth, built-in query UI, and external remote write and query support of the gateway
- The TLS support between Thanos components
- The 3-tier config management

作为最受欢迎和强大的Prometheus长期存储项目之一,Thanos被社区广泛采用。但要在生产环境中使用Thanos,仍然需要自动化许多第二天的运维工作。在这次演讲中,KubeSphere的维护者将分享他们在生产环境中使用和维护Thanos的经验,包括: - 所有Thanos组件的Kubernetes本地定义 - 数据摄入、规则评估、压缩的租户隔离 - 基于租户的Thanos Ingester、Ruler和Compactor的自动扩展机制 - Thanos存储的基于时间的分区 - 基于租户的数据生命周期管理 - 全局规则分片机制,用于处理大量录制规则和警报规则评估工作负载 - 用于读写的网关和代理机制,带有租户访问控制 - 网关的basic_auth、内置查询UI以及外部远程写入和查询支持 - Thanos组件之间的tls支持 - 三层配置管理
Speakers
Benjamin Huo

Manager of the Architect and Observability Team, QingCloud Technologies
Benjamin Huo leads QingCloud Technologies' Architect team and Observability Team. He is a founding member of KubeSphere and the co-author of Fluent Operator, Kube-Events, Notification Manager, OpenFunction, and most recently eBPFConductor. He loves cloud-native technologies especially...
Junhao Zhang

Senior Software Engineer, QingCloud Technologies
Junhao Zhang, Senior Development Engineer at QingCloud Technologies, is responsible for the research and development of container platform monitoring, alerting, and other cloud-native services. With many years of industry experience, he has previously held positions at companies such...
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

13:50 HKT

Implement Auto Instrumentation Under GraalVM Static Compilation on OTel Java Agent | GraalVM 静态编译下 OTel Java Agent 的自动增强方案与实现 - Zihao Rao & Ziyi Lin, Alibaba Cloud
Thursday August 22, 2024 13:50 - 14:25 HKT
GraalVM static compilation significantly improves Java application startup speed and runtime memory usage, which makes it very valuable for Java to flourish in the cloud native ecosystem. However, the automatic instrumentation originally provided by the Java Agent becomes invalid after static compilation. We designed a static instrumentation solution in GraalVM to solve this problem. This talk will introduce the overall design of the solution and the related test results in the OTel Java Agent.

GraalVM静态编译对于提升Java应用的启动速度和运行时内存占用有着显著的效果,对于Java在云生态中的蓬勃发展有着十分宝贵的价值。然而,原本基于Java Agent提供的自动插桩功能在静态编译之后将会失效。针对上述问题我们在GraalVM中设计了静态插桩方案,本演讲将介绍该方案的整体设计思路以及在OTel Java Agent中的相关测试结果。
Speakers
Zihao Rao

Software Engineer, Alibaba Cloud
Zihao is a software engineer at Alibaba Cloud. Over the past few years, he has participated in several well-known open source projects. He is a steering committee member of the Spring Cloud Alibaba project and is now a triager for OpenTelemetry Java Instrumentation.
Ziyi Lin

Senior Software Engineer, Alibaba Cloud
Author of the book "Static compilation for Java in GraalVM: the principles and practice". ACM SIGSOFT distinguished paper award winner (ICSE'23). Committer of Apache incubating Teaclave Java TEE SDK (https://github.com/apache/incubator-teaclave-java-tee-sdk). Active contributor of GraalVM (https://github.com/pulls?q=is%3Apr+org%3Aoracle+author%3Aziyilin...
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

14:40 HKT

Kelemetry: Global Control Plane Tracing for Kubernetes | Kelemetry:面向Kubernetes控制面的全局追踪系统 - Wei Shao & Jonathan Chan, ByteDance
Thursday August 22, 2024 14:40 - 15:15 HKT
Debugging Kubernetes system issues is complicated: different controllers manipulate objects independently, sometimes triggering changes in other controllers. Unlike traditional RPC-based services, the relationship between components is not explicit; identifying which component causes an issue can be like finding a needle in a haystack. Components expose their own fragmented data, which is often limited to the lifecycle of a single request and fails to illustrate the bigger picture of asynchronous causal events. This talk introduces Kelemetry, a global tracing system for the Kubernetes control plane that uses scattered data sources from audit logs, events, informers and component traces. Through several demonstrations of troubleshooting online problems, we will see how Kelemetry reveals the state transition of related objects over a long timespan and reconstructs the causal hierarchy of events to provide intuitive insight into the What, When and Why of everything going on in a Kubernetes system.

调试Kubernetes系统问题是复杂的:不同的控制器独立地操作对象,有时会触发其他控制器的变化。与传统的基于RPC的服务不同,组件之间的关系并不明确;确定哪个组件引起了问题就像在一堆草堆中找针一样困难。组件展示它们自己的碎片化数据,通常仅限于单个请求的生命周期,并未展示异步因果事件的整体情况。 本次演讲介绍了Kelemetry,这是一个利用审计日志、事件、通知器和组件跟踪的分散数据源的Kubernetes控制平面全局跟踪系统。通过几次在线问题排查演示,我们将看到Kelemetry如何揭示相关对象在长时间跨度内的状态转换,并重建事件的因果层次结构,以提供对Kubernetes系统中发生的一切的直观洞察。
Speakers
Wei Shao

Senior Software Engineer, ByteDance
Wei Shao is a tech lead on the Orchestration & Scheduling team at ByteDance, and a maintainer of KubeWharf projects. Wei has 6+ years of experience in the cloud native area, focusing on resource management and performance-enhanced systems in K8s. Wei led the development of multiple...
Jonathan Chan

Software engineer, ByteDance
Jonathan is a software engineer at ByteDance working on Kubernetes related infrastructure such as observability systems and cluster federation. He is also a passionate contributor to a number of open source projects.
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

15:35 HKT

KubeSkoop: Deal with the Complexity of Network Issues and Monitoring with eBPF | KubeSkoop:使用eBPF处理网络问题和监控的复杂性 - Yutong Li & Bingshen Wang, Alibaba Cloud
Thursday August 22, 2024 15:35 - 16:10 HKT
Troubleshooting network issues has always been one of the most difficult tasks, especially on Kubernetes. Containerization and microservices result in a denser network topology and more dependencies on various layers of network stack modules, and the new network technologies and architectures introduced by AI also pose a significant challenge for observability and diagnosis. We developed KubeSkoop, a network monitoring and diagnosis suite for Kubernetes. Using eBPF, it provides deep monitoring and tracing of the Kubernetes network to help users quickly locate network jitter problems that occur in the cluster. It also provides a network connectivity check, which can help users solve network connectivity issues with one click. This talk covers:
● What makes Kubernetes networking complex.
● Introduction to KubeSkoop.
● How we use eBPF to monitor container networking.
● The practices of KubeSkoop in large-scale production environments.

网络问题的故障排除一直是最困难的部分之一,尤其是在Kubernetes上。容器化和微服务导致了更密集的网络拓扑结构,以及对各个网络堆栈模块的更多依赖,人工智能引入的新网络技术和架构也在可观察性和诊断方面提出了重大挑战。 我们开发了KubeSkoop,这是专为Kubernetes设计的网络监控和诊断套件。利用eBPF技术,它提供了对Kubernetes网络的深度监控和跟踪,帮助用户快速定位集群中发生的网络抖动问题。它还提供了网络连接性检查功能,可以帮助用户通过一键解决网络连接问题。 本主题将介绍以下内容: ● 什么使Kubernetes网络变得复杂。 ● KubeSkoop的介绍。 ● 我们如何使用eBPF来监控容器网络。 ● KubeSkoop在大规模生产环境中的实践。
Speakers
wang bingshen

Senior Engineer, Alibaba Cloud
Bingshen Wang is a Senior Engineer at Alibaba Cloud, a maintainer of KubeSkoop/Terway/OpenYurt, and a contributor to Kubernetes/Containerd. He mainly focuses on container networking and runtime, and has many years of experience managing Alibaba Cloud Kubernetes clusters. He...
Tony Li

Software Engineer, Alibaba Cloud
Yutong Li is a Software Engineer at Alibaba Cloud. He works on designing and maintaining the container network for Alibaba Cloud Container Service, and on the open source Kubernetes network diagnosis tool KubeSkoop.
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

15:35 HKT

OpAMP: Scaling OpenTelemetry with Flexibility | OpAMP:灵活扩展OpenTelemetry - Husni Alhamdani, Censhare & Herbert Sianturi, Krom Bank
Thursday August 22, 2024 15:35 - 16:10 HKT
In this session, we will delve into how OpAMP (Open Agent Management Protocol) revolutionizes the management of large fleets of data collection Agents and its pivotal role in scaling OpenTelemetry deployments with unparalleled flexibility. Discover how OpAMP empowers organizations to remotely manage diverse Agents, irrespective of vendor, through its vendor-agnostic protocol. Learn how OpAMP facilitates status reporting, telemetry reporting, centralized management, allowing for tailored configurations and efficient monitoring of individual Agents or types of Agents, management of downloadable Agent-specific packages, and robust connection credentials management. Join us to unleash the potential of OpAMP and revolutionize your OpenTelemetry scalability strategy.

在这场演讲中,我们将深入探讨OpAMP(开放式代理管理协议)如何革新大规模数据收集代理的管理,并在扩展OpenTelemetry部署中发挥关键作用,具有无与伦比的灵活性。 发现OpAMP如何赋予组织远程管理各种代理的能力,无论供应商如何,通过其供应商无关的协议。了解OpAMP如何促进状态报告、遥测报告、集中管理,允许定制配置和有效监控单个代理或代理类型,管理可下载的特定代理软件包,以及强大的连接凭证管理。 加入我们,释放OpAMP的潜力,革新您的OpenTelemetry可扩展性策略。
Speakers
Husni Alhamdani

Senior Site Reliability Engineer, Censhare
Husni is a CNCF Ambassador, and a Site Reliability Engineer at Censhare, where he is responsible for building and maintaining infrastructure platforms. In addition to these responsibilities, he primarily focuses on architecting Cloud-Native solutions. He also graduated from the LFX...
Herbert Sianturi

Senior DevOps Engineer, Krom Bank
Herbert Sianturi serves as a Senior DevOps Engineer at Krom Bank Indonesia, where he spearheads efforts to enhance the quality of the end-to-end application lifecycle and to apply open source platforms as a base. With years of expertise in container orchestration and cloud computing...
Level 1 | Hung Hom Room 6
  KubeCon + CloudNativeCon Sessions, Observability

16:25 HKT

Observability Supercharger: Build the Traffic Topology Map for Millions of Containers with Zero Code | 可观测性超级增强器:使用零代码为数百万个容器构建流量拓扑图 - Sheng Wei & Teck Chuan Lim, Shopee
Thursday August 22, 2024 16:25 - 17:00 HKT
Kubernetes makes container orchestration and management simple and easy. However, with the surge of applications and middleware running on Kubernetes, it is difficult to analyze and identify the relationships and dependencies between huge numbers of services and middleware. The most common approach requires the business side to make code changes to expose more information, which cannot cover all applications. In this session, we will share:
* How does Shopee leverage eBPF to build a universal map for a million containers in production environments?
* How do we implement distributed tracing for arbitrary third-party middleware with different protocols and usage patterns?
* How do we optimize eBPF code and the Linux kernel to minimize the impact on injected containers?
* How did we integrate with the big data and AI stack to fully utilize the data for anomaly detection and incident troubleshooting?

Kubernetes使容器编排和管理变得简单易行。然而,随着应用程序和中间件在Kubernetes上的激增,分析和识别大量服务和中间件之间的关系和依赖关系变得困难。最常见的方法需要业务方进行代码更改以公开更多信息,这对所有应用程序来说是不可能覆盖的。 在本场演讲中,我们将分享: *Shopee如何利用eBPF在生产环境中为百万个容器构建通用映射? *我们如何为具有不同协议和使用模式的任意第三方中间件实现分布式跟踪? *我们如何优化eBPF代码和Linux内核以最小化对注入容器的影响? *我们如何与大数据和人工智能堆栈集成,充分利用数据进行异常检测和故障排除?
Speakers
Teck Chuan Lim

Engineer, Shopee
Been working with Shopee since graduation in 2018. I am a long standing core team member of the engineering infrastructure team and took charge to drive Shopee's engineering infrastructure ecosystem from DevOps to DataOps. As of the moment, I am taking charge to drive forward towards...
Level 2 | Grand Ballroom 1-2
  KubeCon + CloudNativeCon Sessions, Observability

16:25 HKT

Uniting Sustainability and Edge Computing: Kepler & Open Horizon on RISC-V and Heterogeneous System | 团结可持续性和边缘计算:Kepler和Open Horizon在RISC-V和异构系统上 - Peng Hui Jiang & David Yao, IBM
Thursday August 22, 2024 16:25 - 17:00 HKT
The dynamic landscape of cloud-edge computing demands solutions to mitigate energy consumption and promote sustainability. Our proposal advocates for the integration of Kepler and Open Horizon with the CNCF and LF Edge ecosystems to address diverse hardware requirements in cloud and edge deployments, including x86, arm, s390, and the emerging RISC-V architectures. Notably, the Chinese market, characterized by edge devices in the manufacturing, retail and surveillance domains, stands to benefit significantly from this initiative. By using Kepler’s energy estimation capabilities and Open Horizon’s autonomous workload management features, this proposal aims to optimize energy efficiency across heterogeneous edge environments. In the session, we will demonstrate one use case that builds and integrates Kepler and Open Horizon on the RISC-V platform, and monitors and optimizes distributed and heterogeneous systems to build a greener and more resilient cloud-edge computing paradigm.

云边计算的动态景观需要解决能源消耗问题并促进可持续发展。我们的提案主张将Kepler和Open Horizon与CNCF和LF Edge生态系统整合,以解决云和边缘部署中多样化的硬件需求,包括x86、arm、s390和新兴的RISC-V架构。值得注意的是,中国市场以制造、零售和监控领域的边缘设备为特征,这一举措将使其受益匪浅。通过利用Kepler的先进能源估算能力和Open Horizon的自主工作负载管理功能,本提案旨在优化异构边缘环境的能源效率。 在本场演讲中,我们将演示一个使用案例,展示如何构建和整合Kepler和Open Horizon在RISC-V平台上运行,并监控和优化分布式和异构系统,以构建更环保、更具弹性的云边计算范式。
Speakers
Peng Hui Jiang

Architect, IBM
Peng Hui Jiang works for IBM as a Senior Software Engineer, building and operating Public Cloud services. He has rich experience in Cloud, Database, and Security. He is a CNCF Kepler maintainer, an Apache CouchDB committer, and a Master Inventor at IBM, holding more than 200 patents or...
勇 姚

Program Director, IBM Cloud Platform, IBM
David Yao is the Program Director of IBM Cloud Platform in IBM China Development Lab, developing and managing the entire product development lifecycle and team for the dynamic cloud and edge environment. Passionate about learning open technology, building and transforming an open and...
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Observability

17:15 HKT

OpenTelemetry Amplified: Full Observability with EBPF-Enabled Distributed Tracing | OpenTelemetry放大:使用eBPF启用的分布式跟踪实现全面的可观测性 - Kai Liu, Alibaba Cloud & Wanqi Yang, Sun Yat
Thursday August 22, 2024 17:15 - 17:50 HKT
Within the cloud-native ecosystem, OpenTelemetry (otel) has established itself as the de facto standard for cross-language and cross-platform observability. By providing comprehensive tracing, metrics, and logging solutions for various programming languages, otel has empowered developers and operators with deep insights into complex systems. In recent years, otel has further expanded its observability frontiers by introducing innovative capabilities in the Linux kernel space using eBPF. However, this innovative journey has encountered new challenges, particularly in reducing the invasiveness in certain programming languages and correlating observability data between kernel and user spaces. This session chronicles Alibaba Cloud’s journey through these challenges. By leveraging eBPF technology, we've pioneered innovative solutions that redefine the landscape of system observability, presenting an integrated, less invasive approach for real-time insights into distributed systems.

在云原生生态系统中,OpenTelemetry(otel)已经成为跨语言和跨平台可观测性的事实标准。通过为各种编程语言提供全面的跟踪、度量和日志解决方案,otel为开发人员和运维人员提供了对复杂系统的深入洞察。近年来,otel通过在Linux内核空间引入eBPF的创新能力,进一步拓展了其可观测性边界。 然而,这种创新之旅遇到了新的挑战,特别是在减少某些编程语言中的侵入性和在内核和用户空间之间相关联可观测性数据方面。 本场演讲将记录阿里云在这些挑战中的旅程。通过利用eBPF技术,我们开创了重新定义系统可观测性景观的创新解决方案,提供了一种集成的、不那么侵入性的方法,实时洞察分布式系统。
Speakers
Kai Liu

Senior Software Developer, Alibaba Cloud
Liu Kai is a senior software development engineer in the Cloud Native Observability team of Alibaba Cloud. With years of practical experience and insights in the field of monitoring and observability, Liu Kai continuously delves into the realm of observability solutions, including architectural...
Wanqi Yang

Student, Sun Yat-sen University
Wanqi Yang received the B.S. degree in Computer Science and Technology from Sun Yat-Sen University, Guangzhou, China. She is currently working toward the PhD degree in Computer Science and Technology at School of Computer Science and Engineering, Sun Yat-Sen University. Her research...
Level 1 | Hung Hom Room 1
  KubeCon + CloudNativeCon Sessions, Observability
 
Friday, August 23
 

16:05 HKT

Unlocking LLM Performance with EBPF: Optimizing Training and Inference Pipelines | 通过eBPF解锁LLM性能:优化训练和推理管道 - Yang Xiang, Yunshan Networks, Inc.
Friday August 23, 2024 16:05 - 16:40 HKT
The training and inference processes of Large Language Models (LLMs) involve handling vast amounts of model data and training data, and consume significant GPU compute resources. However, enhancing GPU utilization becomes extremely challenging in the absence of observability. This presentation will introduce how to achieve observability in LLM training and inference processes with zero disruption using eBPF. This includes utilizing Memory Profiling to understand the loading performance of models and training data, Network Profiling to comprehend the data exchange performance, and GPU Profiling to analyze GPU's MFU (Model FLOPs Utilization) and performance bottlenecks. Additionally, we will share the practical effects of implementing observability in a PyTorch LLM application and the llm.c project using eBPF, aiming to enhance training and inference performance.

大型语言模型(LLMs)的训练和推断过程涉及处理大量的模型数据和训练数据,并消耗大量的GPU计算资源。然而,在缺乏可观察性的情况下,提高GPU利用率变得极具挑战性。 本次演讲将介绍如何利用eBPF在LLM训练和推理过程中实现零中断的可观察性。这包括利用内存分析来了解模型和训练数据的加载性能,网络分析来理解数据交换性能,以及GPU分析来分析GPU的MFU(模型FLOPs利用率)和性能瓶颈。 此外,我们将分享在PyTorch LLM应用程序和llm.c项目中使用eBPF实现可观察性的实际效果,旨在提高训练和推理性能。
Speakers
Yang Xiang

VP of Engineering, Yunshan Networks, Inc.
Received a Ph.D. from Tsinghua University, and currently serving as VP of Engineering at Yunshan Networks and the head of the DeepFlow open-source community. He has presented academic papers on topics such as application observability and network measurement at top international academic...
Level 1 | Hung Hom Room 2
  KubeCon + CloudNativeCon Sessions, Observability
 
