AI Models & Infra

Practical Exploration of XVERSE Large Language Model Series

Speakers
Date / Time
2024-10-17
17:10

This presentation shares hands-on experience from developing the XVERSE large language model series, covering the journey from initial concept to release and the evolution from dense models to Mixture-of-Experts (MoE) models. On the data side, we outline the datasets needed for effective training, describe our preprocessing pipeline, and discuss approaches for continual model improvement. On the model side, we focus on moving from standard dense feed-forward layers to MoE structures, including the criteria for choosing the number and capacity of experts and for weighting each expert's contribution. On the infrastructure side, we present methods for maximizing training efficiency and keeping training stable.
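
For readers unfamiliar with the dense-to-MoE transition discussed above, the following is a minimal illustrative sketch (in PyTorch) of a top-k gated MoE feed-forward block, showing where the design choices mentioned in the abstract appear: the number of experts, how many experts each token is routed to, and how each expert's output is weighted. All class names and hyperparameters here are assumptions for illustration only and do not reflect XVERSE's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoEFeedForward(nn.Module):
        """Illustrative top-k gated MoE layer replacing a dense feed-forward block."""

        def __init__(self, d_model=1024, d_ff=4096, num_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            # Router: scores every token against every expert.
            self.router = nn.Linear(d_model, num_experts, bias=False)
            # Each expert is an ordinary dense feed-forward sub-layer.
            self.experts = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
                )
                for _ in range(num_experts)
            )

        def forward(self, x):
            # x: (batch, seq, d_model) -> flatten tokens for routing.
            tokens = x.reshape(-1, x.size(-1))
            logits = self.router(tokens)                        # (tokens, experts)
            probs = F.softmax(logits, dim=-1)
            topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
            # Renormalize so each token's expert weights sum to 1.
            topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

            out = torch.zeros_like(tokens)
            # Dispatch each token to its selected experts and combine the
            # expert outputs weighted by the (renormalized) router scores.
            for e, expert in enumerate(self.experts):
                for slot in range(self.top_k):
                    mask = topk_idx[:, slot] == e
                    if mask.any():
                        out[mask] += topk_probs[mask, slot, None] * expert(tokens[mask])
            return out.reshape(x.shape)

In practice, production MoE systems add expert-capacity limits, load-balancing losses, and parallelism-aware dispatch on top of this basic routing pattern; those are the kinds of decisions the talk discusses rather than details shown in this sketch.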