A minimal, cross-platform LLM chat app built on BELLE, using quantized on-device offline models and a Flutter UI. It runs on macOS (done), with Windows, Android, iOS (see Known Issues) and more to follow.
To download the app, please refer to Releases.
Downloading and usage for different platforms: Usage.
Only macOS is supported for now. More platforms coming soon!
You can download the quantized model from the Hugging Face repo ChatBELLE-int4.
You first need to run the ChatBELLE app, which creates the folder ~/Library/Containers/com.barius.chatbelle. Then rename the downloaded model and move it to the path displayed in the app. The default is ~/Library/Containers/com.barius.chatbelle/Data/belle-model.bin.
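The rename-and-move step can also be scripted. The following is a minimal sketch, not part of the app: the function name `install_model` is hypothetical, and the default path is simply the one quoted above (the app displays the actual path, which takes precedence).

```python
import shutil
from pathlib import Path

# Default model path quoted in this README; the app shows the real one.
DEFAULT_MODEL_PATH = (
    Path.home() / "Library/Containers/com.barius.chatbelle/Data/belle-model.bin"
)


def install_model(downloaded: Path, target: Path = DEFAULT_MODEL_PATH) -> Path:
    """Move and rename a downloaded model file to the path the app expects."""
    if not downloaded.is_file():
        raise FileNotFoundError(f"model file not found: {downloaded}")
    if not target.parent.is_dir():
        # The container folder is created by the app on its first launch.
        raise FileNotFoundError(
            f"{target.parent} is missing; launch the app once first"
        )
    shutil.move(str(downloaded), str(target))
    return target
```

For example, `install_model(Path.home() / "Downloads" / "model.bin")` would move a file named `model.bin` (a placeholder name) into place, and fails early if the app has not yet created its container folder.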
Utilizes llama.cpp's 4-bit quantization to reduce on-device inference time and RAM usage. Quantization causes accuracy loss and degrades model performance: 4-bit quantization trades accuracy for model size. Our current 4-bit model shows a significant performance gap compared with the fp32 or fp16 ones and is provided just for users to try out. With better algorithms being developed and more powerful chips landing on mobile devices, we believe on-device model performance will improve, and we will keep a close watch on this.
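To illustrate where the accuracy loss comes from, here is a minimal sketch of block-wise 4-bit quantization. It is a simple symmetric absmax scheme written for illustration only; it is not llama.cpp's actual storage format, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Map one block of floats to 4-bit signed codes in [-8, 7] plus one scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude in the block maps to code 7 (or -7).
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    """Recover approximate floats from the codes and the block scale."""
    return [scale * c for c in codes]


def max_abs_error(values: List[float]) -> float:
    """Largest per-weight difference introduced by a quantize/dequantize round trip."""
    scale, codes = quantize_block(values)
    restored = dequantize_block(scale, codes)
    return max(abs(a - b) for a, b in zip(values, restored))
```

Each weight is stored as one of only 16 levels, so the round trip can be off by up to half a scale step per weight. That rounding error is the accuracy cost being traded for the smaller model.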
GPTQ employs one-shot quantization to achieve lower accuracy loss or a higher model compression rate. We will keep track of this line of work.
We recommend an M1/M2 series chip with 16GB RAM for the best experience. If you encounter slow inference, try closing other apps to free up memory. Inference with 8GB RAM will be very slow. Intel CPUs may work as well (not tested) but could be very slow.
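A rough back-of-the-envelope calculation shows why precision matters for RAM. This sketch counts weight storage only and assumes a hypothetical 7-billion-parameter model; the parameter count is an assumption for illustration, not a figure from this README.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: hypothetical parameter count, used only for illustration.
PARAMS = 7_000_000_000

for name, bits in [("fp32", 32), ("fp16", 16), ("int4", 4)]:
    print(f"{name}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under that assumption the script prints about 26.1 GiB for fp32, 13.0 GiB for fp16 and 3.3 GiB for int4. Real quantized files are somewhat larger because they also store per-block scales, and inference needs extra memory beyond the weights.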
Move Chat Belle.dmg into the Applications folder.
Open the Chat Belle app in the Applications folder by right-clicking (or Ctrl-clicking) it and choosing Open, then click Open.
Place the model at the path shown by the app, by default ~/Library/Containers/com.barius.chatbelle/Data/belle-model.bin.

This program is for learning and research purposes only. The developers take no responsibility for any damage caused by using or distributing this program.