The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.
亲临现场
2024年8月21-23日
了解更多并注册参加
Sched应用程序允许您创建自己的日程安排,但不能替代您的活动注册。您必须注册参加KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024,才能参加会议。如果您尚未注册但希望加入我们,请访问活动注册页面购买注册。
With the popularity of LLM apps, there is an increasing demand for running and scaling AI workloads in the cloud and on edge devices. Rust and Wasm offer a solution by providing a portable bytecode that abstracts hardware complexities. LlamaEdge is a lightweight, high-performance and cross-platform LLM inference runtime. Written in Rust and built on WasmEdge, LlamaEdge provides a standard WASI-NN API to developers. Developers only need to write against the API and compile to Wasm. The Wasm file can run on any device, where WasmEdge translates and routes Wasm calls to the underlying native libraries such as llama.cpp. This talk will discuss the design and implementation of LlamaEdge and show how it enables cross-platform LLM app development and deployment. We will also walk through several code examples from a basic sentence completion app, to a chat bot, to an RAG agent app with external knowledge in vector databases, to a Kubernetes managed app across a heterogeneous cluster.