Name: How Fast Can Your Model Composition Run in Serverless Inference? - Fog Dong, BentoML & Wenbo Qi, Ant Group
Start: 2024-08-21T15:35:00+0800
End: 2024-08-21T16:10:00+0800

In-person
21-23 August, 2024

Please note: All times are shown in Hong Kong Standard Time (UTC +8). The schedule is subject to change, and session seating is available on a first-come, first-served basis.

Wednesday August 21, 2024 3:35pm - 4:10pm HKT

Level 1 | Hung Hom Room 7

Are you struggling with slow deployments, high operational costs, or scalability issues when serving your ML models? Now imagine the added complexity when a typical AI app requires not just one model but an interconnected suite of them. In this session, discover how integrating BentoML with Dragonfly addresses these challenges for multi-model composition and inference in serverless Kubernetes environments. Join this co-presentation by the BentoML and Dragonfly communities to explore a case study: a RAG app that combines three models (an LLM, an embedding model, and an OCR model). Learn how our framework packages these diverse models efficiently and uses Dragonfly's P2P network to distribute them quickly. We'll also cover how other open-source technologies such as JuiceFS and vLLM helped us achieve deployment times of just 40 seconds and establish a scalable blueprint for multi-model composition deployments.

Speakers

Wenbo Qi

Senior Software Engineer, Ant Group

Wenbo Qi is a software engineer at Ant Group working on Dragonfly, where he is a maintainer. He hopes to make positive contributions to open source software and believes that fear springs from ignorance.

Fog Dong

Senior Software Engineer, BentoML

Fog Dong, a Senior Engineer at BentoML, KubeVela maintainer, CNCF Ambassador, and LFAPAC Evangelist, has a rich background in cloud native. Previously instrumental in developing Alibaba's large-scale Serverless Application Engine workflows and ByteDance's cloud-native CI/CD platform...

KubeCon + CloudNativeCon Sessions, AI + ML

Experience Level: Advanced
Language: Chinese

KubeCon + CloudNativeCon + Open Source Summit + AI_dev China 2024