Workshop: Triton Language & vLLM

vLLM Meets Qwen: What Have We Done in Alibaba?

Speakers

Tao He

Date / Time

2024-10-17

15:00

Presentation Slides

Presentation Video

YouTube

This talk will share insights into the optimizations made at Alibaba's Tongyi Lab to serve Qwen models efficiently on vLLM. These optimizations span model quantization, framework performance enhancements, and algorithmic improvements. The presentation will also highlight the team's contributions to the vLLM open-source community and conclude with a discussion of the new challenges that the Qwen series of models may pose to vLLM in the future.