Attending this event?
In-person
21-23 August, 2024
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

Thursday August 22, 2024 4:25pm - 5:00pm HKT | Level 1 | Hung Hom Room 3
In the dynamic landscape of AI/ML, deploying and orchestrating large open-source inference models on Kubernetes has become paramount. This talk delves into the intricacies of automating the deployment of heavyweight models like Falcon and Llama 2, leveraging Kubernetes Custom Resource Definitions (CRDs) to manage large model files seamlessly through container images. Deployment is streamlined by an HTTP server that handles inference calls using the model library. This session will explore eliminating manual tuning of deployment parameters to fit GPU hardware by providing preset configurations. Learn how to auto-provision GPU nodes based on specific model requirements, ensuring optimal utilization of resources. We'll discuss empowering users to deploy their own containerized models effortlessly by allowing them to provide a pod template in the inference field of the workspace custom resource; the controller, in turn, dynamically creates deployment workloads that utilize all GPU nodes.
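To make the workflow concrete, here is a minimal sketch of what such a workspace custom resource could look like. This is an illustrative assumption, not the speakers' actual API: the API group (example.io/v1alpha1), field names, instance type, and preset name are all hypothetical stand-ins for whatever the real CRD defines.

    # Hypothetical Workspace custom resource; all names below are illustrative.
    apiVersion: example.io/v1alpha1
    kind: Workspace
    metadata:
      name: workspace-falcon-7b
    resource:
      # GPU node requirements; the controller auto-provisions matching nodes.
      instanceType: Standard_NC12s_v3
      labelSelector:
        matchLabels:
          apps: falcon-7b
    inference:
      # Preset configuration: no manual tuning of GPU deployment parameters.
      preset:
        name: falcon-7b
      # Alternatively, bring your own containerized model via a pod template:
      # template:
      #   spec:
      #     containers:
      #       - name: model-server
      #         image: registry.example.com/custom-model:latest

On reconciliation, a controller of this kind would provision GPU nodes for the requested instance type, pull the model files packaged in the preset's container image, and create a deployment whose HTTP server answers inference calls.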

Speakers
Joinal Ahmed

AI Architect, Navatech Group
Joinal is a seasoned Data Science expert passionate about rapid prototyping, community involvement, and driving technology adoption. With a robust technical background, he excels in leading diverse teams through ML projects, recruiting and mentoring talent, optimizing workflows, and…
Nirav Kumar

Head of AI and Engineering, Navatech Group
Nirav Kumar is a leader in the field of Artificial Intelligence with over 13 years of experience in data science and machine learning. As Head of AI and Engineering at Navatech Group, he spearheads cutting-edge research and development initiatives aimed at pushing the boundaries of…