In-person
21-23 August, 2024

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

Thursday August 22, 2024 3:35pm - 4:10pm HKT
LLMs have raised public expectations of generative models. However, as noted in the Gartner report, running AI applications in production poses significant challenges. To tackle these challenges, we have redesigned and optimized the software capabilities of cloud native AI technologies. By extending KServe to handle OpenAI streaming requests, it can accommodate LLM inference workloads. With Fluid and Vineyard, we reduced Llama-30B model loading time from 10 minutes to under 25 seconds. The optimizations do not stop there: since LLM loading is not a high-frequency operation, it is crucial to use cronHPA for scheduled auto-scaling, balancing cost against performance and evaluating the cost-effectiveness of the scaling process. As reviewers and maintainers of KServe and Fluid, we share our insights into these challenges in this session, showcase effective use of cloud native AI, and share our experiences in production.
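The scheduled auto-scaling described above can be sketched with the open source kubernetes-cronhpa-controller CRD. This is a minimal illustration, not the configuration from the talk: the target name `llama-30b-predictor`, the schedules, and the replica counts are assumptions chosen for the example.

```yaml
# Sketch: scale an LLM inference Deployment up before peak hours and back
# down afterwards, so capacity is paid for only when it is needed.
# kubernetes-cronhpa-controller uses 6-field cron expressions (with seconds).
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: llama-30b-cronhpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llama-30b-predictor   # hypothetical inference workload name
  jobs:
    - name: scale-up-for-peak
      schedule: "0 0 8 * * *"   # 08:00 daily: warm up replicas ahead of traffic
      targetSize: 10
    - name: scale-down-off-peak
      schedule: "0 0 22 * * *"  # 22:00 daily: shrink to a small baseline
      targetSize: 2
```

Because model loading is slow and infrequent, time-based scaling like this avoids the cold-start penalty that a purely metric-driven HPA would incur on every burst.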

Speakers

Yang Che

senior engineer, Alibaba Cloud Intelligence
Yang Che is a senior engineer at Alibaba Cloud. He works on the Alibaba Cloud container service team and focuses on Kubernetes and container-related product development. Yang also works on building an elastic machine learning platform on those technologies. He is an active contributor...

Lize Cai

Senior Software Engineer, SAP
Lize is a senior software engineer at SAP, based in Singapore. With a strong product mindset, Lize has extensive experience in building enterprise-grade machine learning platforms. A passionate advocate for open source technology, Lize actively contributes to various projects, including...
Level 1 | Hung Hom Room 3
