How to run on Mac or without cuda?
Hi Team,
First of all, thank you to the team for creating such an amazing model for Vietnamese users. I would also like to ask: is it possible to run this model on servers without a GPU, or on a MacBook? How can we work around the FlashAttention requirement?
Wishing the team good health and success in producing more great models for Vietnamese users.
Hi,
Thank you so much for your kind words and encouragement! The next version of our model, planned for this month, will be released with better performance and support for running on CPU with PyTorch. However, CPU inference can be quite slow. To enable faster inference without CUDA, we plan to convert the model to Apple's MLX framework or to GGUF format after the improved version is complete.
Happy new year!