Practical Exploration of the XVERSE Large Language Model Series
This presentation showcases our hands-on exploration in developing the XVERSE large language model series, covering the journey from ideation to realization and the evolution from dense models to Mixture-of-Experts (MoE) models. On the data side, we will outline the datasets needed for effective training, describe the preprocessing pipeline, and discuss approaches to continual model improvement. On the modeling side, we will focus on adapting standard dense layers into MoE structures, highlighting the criteria for choosing expert capacity and for assigning importance (routing weights) among experts. Finally, on the architecture side, we will present methods for maximizing training efficiency and ensuring stable learning performance.
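To make the dense-to-MoE adaptation mentioned above concrete, below is a minimal, hypothetical sketch of a top-k routed MoE feed-forward layer in PyTorch. The class name, expert count, and routing scheme are illustrative assumptions for exposition only, not XVERSE's actual implementation; the real models involve additional concerns such as load balancing and parallelism that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative top-k routed mixture-of-experts feed-forward layer."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward block replacing the single dense FFN.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); batch and sequence dims are assumed flattened.
        gate_logits = self.router(x)                              # (tokens, num_experts)
        weights, expert_ids = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                      # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e
                if mask.any():
                    # Weight each selected token's expert output by its routing score.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16 tokens of width 512 through 8 experts, 2 active per token.
tokens = torch.randn(16, 512)
layer = MoELayer(d_model=512, d_ff=2048)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

In this sketch, expert capacity corresponds to the number and width of experts, while the router's softmax over the top-k logits is one simple way of assigning importance among the selected experts.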