Loading…
Attending this event?
In-person
21-23 August, 2024
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 

亲临现场
2024年8月21-23日
了解更多并注册参加

Sched应用程序允许您创建自己的日程安排,但不能替代您的活动注册。您必须注册参加KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024,才能参加会议。如果您尚未注册但希望加入我们,请访问活动注册页面购买注册。

请注意:本日程自动显示为香港标准时间(UTC +8)。要查看您偏好的时区的日程,请从右侧“按日期筛选”上方的下拉菜单中选择。日程可能会有变动,会议席位先到先得。
AI_dev: Open Source GenAI & ML Summit Sessions clear filter
Thursday, August 22
 

15:35 HKT

Empower Large Language Models (LLMs) Serving in Production with Cloud Native AI Technologies | 利用云原生人工智能技术在生产环境中赋能大型语言模型(LLMs) - Lize Cai, SAP & Yang Che, Alibaba Cloud Intelligence
Thursday August 22, 2024 15:35 - 16:10 HKT
LLMs have heightened public expectations of generative models. However, as noted in the Gartner report, running AI applications in production poses significant challenges. To tackle the challenges, we have redesigned and optimized the software capabilities of Cloud Native AI Technologies. By extending KServe to handle OpenAI's streaming requests, it can accommodate the inference load of LLM. With Fluid and Vineyard, It shows a result of reducing Llama-30B model loading time from 10 minutes to under 25 seconds. However, the above optimizations do not stop there. Since LLM loading is not a high-frequency operation,It is crucial to utilize cronHPA for timed auto-scaling in order to achieve a balance between cost and performance, and to evaluate the cost-effectiveness of the scaling process. As KServe and Fluid's reviewer and maintainer, we share our insights on the challenges in the session. We will showcase effective use of Cloud Native AI and share our experiences in production.

LLM让公众对生成式大模型的期望提高。然而,正如Gartner报告所指出的,将AI应用程序投入生产中存在重大挑战。为了解决这些挑战,我们重新设计和优化了云原生AI技术的软件能力。通过扩展KServe以处理OpenAI的流式请求,它可以容纳LLM的推理负载。通过Fluid和Vineyard,我们成功将Llama-30B模型的加载时间从10分钟缩短到不到25秒。然而,上述优化并不止于此。由于LLM加载不是高频操作,利用cronHPA进行定时自动扩展至关重要,以实现成本和性能之间的平衡,并评估扩展过程的成本效益。作为KServe和Fluid的审阅者和维护者,我们在本场演讲中分享了对挑战的见解。我们将展示云原生AI的有效使用,并分享我们在生产中的经验。
Speakers
avatar for Yang Che

Yang Che

senior engineer, Alibaba Cloud Intelligence
Yang Che, is a senior engineer of Alibaba Cloud. He works in Alibaba cloud container service team, and focuses on Kubernetes and container related product development. Yang also works on building elastic machine learning platform on those technologies. He is an active contributor... Read More →
avatar for Lize Cai

Lize Cai

Senior Software Engineer, SAP
Lize is a senior software engineer at SAP, based in Singapore. With a strong product mindset, Lize has extensive experience in building enterprise-grade machine learning platforms. A passionate advocate for open source technology, Lize actively contributes to various projects, including... Read More →
Thursday August 22, 2024 15:35 - 16:10 HKT
Level 1 | Hung Hom Room 3

16:25 HKT

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes | 无缝扩展性:使用Kubernetes编排大型语言模型推理 - Joinal Ahmed & Nirav Kumar, Navatech Group
Thursday August 22, 2024 16:25 - 17:00 HKT
In the dynamic landscape of AI/ML, deploying and orchestrating large open-source inference models on Kubernetes has become paramount. This talk delves into the intricacies of automating the deployment of heavyweight models like Falcon and Llama 2, leveraging Kubernetes Custom Resource Definitions (CRDs) to manage large model files seamlessly through container images. The deployment is streamlined with an HTTP server facilitating inference calls using the model library. This session will explore eliminating manual tuning of deployment parameters to fit GPU hardware by providing preset configurations. Learn how to auto-provision GPU nodes based on specific model requirements, ensuring optimal utilization of resources. We'll discuss empowering users to deploy their containerized models effortlessly by allowing them to provide a pod template in the workspace custom resource inference field. The controller dynamically, in turn, creates deployment workloads utilizing all GPU nodes.

在AI/ML不断发展的领域中,在Kubernetes上部署和编排大型开源推理模型变得至关重要。本次演讲将深入探讨自动化部署像Falcon和Llama 2这样的重型模型的复杂性,利用Kubernetes自定义资源定义(CRDs)通过容器镜像无缝管理大型模型文件。部署通过HTTP服务器简化,以便使用模型库进行推理调用。 本场演讲将探讨通过提供预设配置来消除手动调整部署参数以适应GPU硬件的需求。了解如何根据特定模型要求自动配置GPU节点,确保资源的最佳利用。我们将讨论如何赋予用户轻松部署其容器化模型的能力,允许他们在工作区自定义资源推理字段中提供一个pod模板。控制器动态地创建部署工作负载,利用所有GPU节点。
Speakers
avatar for Joinal Ahmed

Joinal Ahmed

AI Architect, Navatech Group
Joinal is a seasoned Data Science expert passionate about rapid prototyping, community involvement, and driving technology adoption. With a robust technical background, he excels in leading diverse teams through ML projects, recruiting and mentoring talent, optimizing workflows, and... Read More →
avatar for Nirav Kumar

Nirav Kumar

Head of AI and Engineering, Navatech Group
Nirav Kumar is a leader in the field of Artificial Intelligence with over 13 years of experience in data science and machine learning. As Head of AI and Engineering at Navatech Group, he spearheads cutting-edge research and development initiatives aimed at pushing the boundaries of... Read More →
Thursday August 22, 2024 16:25 - 17:00 HKT
Level 1 | Hung Hom Room 3
 
Friday, August 23
 

15:15 HKT

Detecting and Overcoming GPU Failures During ML Training | 在ML训练过程中检测和克服GPU故障 - Ganeshkumar Ashokavardhanan, Microsoft & Sarah Belghiti, Wayve
Friday August 23, 2024 15:15 - 15:50 HKT
Scaling ML training demands powerful GPU infrastructure, and as model sizes and training scale increases, GPU failures become an expensive risk. From outright hardware faults to subtle performance degradation, undetected GPU problems can sabotage training jobs, inflating costs and slowing development. This talk dives into GPU failure challenges in the context of ML training, particularly distributed training. We will explore the spectrum of GPU issues, and why even minor performance drops can cripple large jobs. Learn how observability (leveraging tools like NVIDIA DCGM) enables proactive problem detection through GPU health checks. Understand principles of fault-tolerant distributed training to mitigate GPU failure fallout. Drawing on cloud provider and autonomous vehicle company experience, we will share best practices for efficient identification, remediation, and prevention of GPU failures. We will also explore cutting-edge ideas like CRIU and task pre-emption for GPU workloads.

随着模型规模和训练规模的增加,机器学习训练需要强大的GPU基础设施,而GPU故障成为一种昂贵的风险。从硬件故障到性能逐渐下降,未被发现的GPU问题可能会破坏训练任务,增加成本并减缓开发速度。本次演讲将深入探讨在机器学习训练中GPU故障所带来的挑战,特别是在分布式训练中。我们将探讨各种GPU问题的范围,以及为什么即使是轻微的性能下降也可能瘫痪大型任务。 了解如何通过观测性(利用诸如NVIDIA DCGM之类的工具)通过GPU健康检查实现问题的主动检测。了解容错分布式训练的原则,以减轻GPU故障的后果。借鉴云服务提供商和自动驾驶汽车公司的经验,我们将分享高效识别、纠正和预防GPU故障的最佳实践。我们还将探讨像CRIU和任务抢占等尖端想法,以应对GPU工作负载。
Speakers
avatar for Ganeshkumar Ashokavardhanan

Ganeshkumar Ashokavardhanan

Software Engineer, Microsoft
Ganesh is a Software Engineer on the Azure Kubernetes Service team at Microsoft, working on node lifecycle, and is the lead for the GPU workload experience on this kubernetes platform. He collaborates with partners in the ecosystem like NVIDIA to support operator models for machine... Read More →
avatar for Sarah Belghiti

Sarah Belghiti

ML Platform Engineer, Wayve
Sarah Belghiti is an ML Platform Engineer at Wayve, a leading developer of embodied intelligence for autonomous vehicles. She works on the infrastructure, scheduling and monitoring of ML workloads. With GPUs becoming an increasingly scarce resource, her focus has been on building... Read More →
Friday August 23, 2024 15:15 - 15:50 HKT
Level 1 | Hung Hom Room 3
 

Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.