Next Generation Media & Device

OminiX: Towards Unified Library and Acceleration Framework for Generative AI Models on Different Hardware Platforms

Speakers

Yanzhi Wang

Date / Time

2024-10-18

10:10

Presentation Slides

Presentation Video

YouTube

In the generative AI era, general users need to apply different base models, fine-tuned checkpoints, and LoRAs. In addition, data privacy and real-time requirements favor on-device, local deployment of large-scale generative AI models. It is desirable to develop a "plug-and-play" framework such that users can download any generative AI model, then click and run it on their own device. This poses a significant challenge to current AI deployment frameworks, which are typically time-consuming and require human expertise in hardware and code generation. We present our work on OminiX, a first step towards a unified library and acceleration framework for generative AI models across various hardware platforms. Integrating our unique front-end library and back-end instantaneous acceleration techniques, which will be open-sourced soon, we demonstrate plug-and-play deployment and state-of-the-art acceleration of various generative AI models, including image generation, large language models, multimodal language models, speech generation and voice cloning, real-time chat engines, real-time translation, video generation, and real-time avatars, to name a few. All of this can be achieved without a server, on each user's own device.