In-person
21-23 August, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

Wednesday August 21, 2024 11:50am - 12:25pm HKT
As AI rapidly evolves and embraces cloud-native technologies, inference performance has become crucial for application value. GPU selection, serving framework configuration, and model/data loading significantly impact inference efficiency. We'll focus on cloud-native solutions to storage performance issues and tools for evaluating inference performance across configurations, offering optimal deployment setups integrated into cloud-native workflows. We'll discuss the impact of inference performance on user experience and how optimization can reduce costs and improve efficiency. Using technologies like Fluid and model optimization, we'll share strategies to enhance inference performance. Based on performance and cost analysis of various GPUs, we'll guide AI engineers in hardware selection. Additionally, we'll introduce a performance testing tool to evaluate and recommend the best model, hardware, and acceleration scheme combinations, so that deployment workflows can be configured from the test results.
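To make the Fluid-based model/data loading concrete, here is a minimal sketch, not the speakers' actual setup, of caching model weights near GPU nodes by creating a Fluid Dataset and AlluxioRuntime through the Kubernetes Python client. It assumes Fluid's data.fluid.io/v1alpha1 CRDs are installed in the cluster; the S3 path, cache sizes, and object names are hypothetical placeholders.

```python
# Minimal sketch, NOT the speakers' deployment: cache model weights with Fluid
# by creating a Dataset and an AlluxioRuntime custom resource.
# Assumes Fluid's data.fluid.io/v1alpha1 CRDs are installed; the S3 path,
# cache sizes, and names below are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()          # use load_incluster_config() inside a pod
api = client.CustomObjectsApi()
namespace = "default"

dataset = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "Dataset",
    "metadata": {"name": "llm-weights", "namespace": namespace},
    "spec": {
        # Mount the (hypothetical) bucket that holds the model weights.
        "mounts": [{"mountPoint": "s3://models/llm-7b/", "name": "llm-weights"}],
    },
}

runtime = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "AlluxioRuntime",
    "metadata": {"name": "llm-weights", "namespace": namespace},
    "spec": {
        "replicas": 2,
        # Keep hot data in memory on the cache workers.
        "tieredstore": {
            "levels": [
                {
                    "mediumtype": "MEM",
                    "path": "/dev/shm",
                    "quota": "20Gi",
                    "high": "0.95",
                    "low": "0.7",
                }
            ]
        },
    },
}

for plural, body in (("datasets", dataset), ("alluxioruntimes", runtime)):
    api.create_namespaced_custom_object(
        group="data.fluid.io", version="v1alpha1",
        namespace=namespace, plural=plural, body=body,
    )
```

Inference pods that mount the volume Fluid provisions for the dataset can then read weights from the in-memory cache rather than pulling them from remote storage on every cold start, which is the storage-performance angle the abstract refers to.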

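As a rough illustration of the cross-configuration evaluation described above, not the testing tool from the talk, the sketch below sends identical requests to two hypothetical inference endpoints (one per GPU/serving-framework combination) and reports latency percentiles and approximate throughput. The endpoint URLs and model name are placeholders.

```python
# Minimal benchmarking sketch, NOT the testing tool described in the talk.
# Assumes each GPU/serving-framework combination is already exposed as an
# OpenAI-style HTTP completion endpoint; URLs and model name are placeholders.
import statistics
import time

import requests

ENDPOINTS = {
    "a10-vllm": "http://a10-vllm.default.svc:8000/v1/completions",
    "a100-vllm": "http://a100-vllm.default.svc:8000/v1/completions",
}
PROMPT = "Summarize what a Kubernetes Deployment does in one sentence."


def measure(url: str, runs: int = 20) -> dict:
    """Send identical requests sequentially and record end-to-end latency."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        resp = requests.post(
            url,
            json={"model": "placeholder-model", "prompt": PROMPT, "max_tokens": 128},
            timeout=60,
        )
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return {
        "p50_s": round(statistics.median(latencies), 3),
        "p95_s": round(statistics.quantiles(latencies, n=20)[18], 3),
        "throughput_rps": round(len(latencies) / sum(latencies), 2),
    }


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(name, measure(url))
```

Dividing each endpoint's GPU cost per hour by its measured throughput gives a rough cost-per-request figure, which is one simple way to compare hardware options on both performance and cost.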
Speakers

Yifei Zhang

Software Engineer, Bytedance
Yifei Zhang, Software Engineer at Volcengine, focuses on technical research and product development in Kubernetes and AI, has rich experience with public cloud, and now works full-time on VKE (Volcengine Kubernetes Engine), the managed Kubernetes product on Volcengine...

Lei Qian (钱磊)

Software Engineer, Bytedance
A Kubernetes developer at Bytedance, focused on building a stable Kubernetes engine on public cloud.
Level 1 | Hung Hom Room 7
  KubeCon + CloudNativeCon Sessions, AI + ML
