Workshop: Triton Language & vLLM
vLLM Meets Qwen: What Have We Done in Alibaba?
This talk shares the optimizations made at Alibaba's Tongyi Lab to serve Qwen models efficiently on vLLM. These optimizations span model quantization, framework performance enhancements, and algorithmic improvements. The presentation also highlights the team's contributions to the vLLM open-source community and concludes with a discussion of new challenges that the Qwen series of models may pose to vLLM in the future.