OminiX: Towards a Unified Library and Acceleration Framework
for Generative AI Models on Different Hardware Platforms
In the generative AI era, general users
need to apply different base models, fine-tuned checkpoints, and LoRAs. Moreover, data
privacy and real-time requirements favor on-device, local deployment of
large-scale generative AI models. It is therefore desirable to develop a
"plug-and-play" framework in which users can download any generative AI
model and click and run it on their own device. This poses a significant challenge to
current AI deployment frameworks, which are typically time-consuming and require
human expertise in hardware and code generation. We present OminiX,
which is a first step towards a unified library and acceleration framework for generative AI
models across diverse hardware platforms. Integrating our front-end library
with back-end instantaneous acceleration techniques, both of which will be open-sourced soon,
we demonstrate plug-and-play deployment and state-of-the-art acceleration of
a wide range of generative AI models, including image generation, large language models,
multimodal language models, speech generation and voice cloning, a real-time chatting
engine, real-time translation, video generation, and real-time avatars, to name a few.
All of this can be achieved without a server, directly on everyone's own platform.