Workshop: Triton Language & vLLM

vLLM Meets Qwen: What Have We Done in Alibaba?

Speakers
Date / Time: 2024-10-17 15:00

This talk shares insights into the optimizations made at Alibaba's Tongyi Lab to serve Qwen models efficiently on vLLM. These optimizations span model quantization, framework performance enhancements, and algorithmic improvements. The presentation also highlights the team's contributions to the vLLM open-source community and concludes with a discussion of new challenges that the Qwen series of models may pose to vLLM in the future.