FantasiaFoundry committed: Update README.md

Thanks, @Virt-io!

README.md CHANGED
@@ -9,11 +9,11 @@ tags:
---

> [!TIP]
- > **Credits:**
- >
- > Made with love by [**@Lewdiculous**](https://huggingface.co/Lewdiculous) with the handy contributions by [**@SolidSnacke**](https://huggingface.co/SolidSnacke). <br>
+ > **Credits:** <br>
+ > Made with love by [**@Lewdiculous**](https://huggingface.co/Lewdiculous) with the handy contributions by [**@SolidSnacke**](https://huggingface.co/SolidSnacke) and [**@Virt-io**](https://huggingface.co/Virt-io). <br>
> If this proves useful for you, feel free to credit and share the repository and authors.

+ <!--
> [!WARNING]
> **[Important] Llama-3:**
>

@@ -22,11 +22,19 @@ tags:
>
> Basically, make sure you're on the latest llama.cpp commit, then run the new `convert-hf-to-gguf-update.py` script inside the repo (you will need to provide a Hugging Face read token, and you need access to the Meta-Llama-3 repositories – [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) – so fill out the access request forms right away to be able to fetch the necessary files; you might also need to refresh the token if it stops working after some time). Afterwards, manually copy the config files from `llama.cpp\models\tokenizers\llama-bpe` into your downloaded **model** folder, replacing the existing ones. <br>
> Try again and the conversion process should work as expected.
+ -->

+ <!--
> [!WARNING]
> **Experimental:** <br>
> There is a new experimental script, `gguf-imat-lossless-for-BF16.py`, which quantizes directly from a BF16 GGUF to generate lossless (or as close to lossless as currently possible) Llama-3 quantizations, avoiding the recently discussed issues on that topic. It is more resource intensive and writes more data to the drive, as there's a whole additional conversion step that the previous version doesn't perform. This should only be necessary until there is GPU support for running BF16 directly, without conversion.
+ -->

+ > [!NOTE]
+ > **Linux support (experimental):** <br>
+ > There's an experimental script for Linux, `gguf-imat-lossless-for-BF16-linux.py`. <br>
+ > While I personally can't attest to it, it's worth trying, and you can report how well it worked, or didn't, in your case. <br>
+ > Improvements are very welcome!

Pull Requests with your own features and improvements to this script are always welcome.
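
The (now commented-out) Llama-3 warning above boils down to two manual steps: regenerate the tokenizer configs with llama.cpp's `convert-hf-to-gguf-update.py`, then overwrite the ones shipped with your downloaded model. Below is a minimal sketch of those steps only; the paths, the token value, and the assumption that the update script takes the read token as a plain command-line argument are illustrative, not part of this repository's scripts.

```python
# Hedged sketch of the Llama-3 tokenizer fix described above.
# Paths and the token are placeholders; the exact CLI of
# convert-hf-to-gguf-update.py can differ between llama.cpp commits.
import shutil
import subprocess
from pathlib import Path

LLAMA_CPP_DIR = Path("llama.cpp")             # latest llama.cpp checkout (assumption)
MODEL_DIR = Path("Meta-Llama-3-8B-Instruct")  # your downloaded model folder (assumption)
HF_READ_TOKEN = "hf_..."                      # read token with Meta-Llama-3 access (placeholder)

# 1) Regenerate the tokenizer configs; the token is assumed to be passed as an argument.
subprocess.run(
    ["python", "convert-hf-to-gguf-update.py", HF_READ_TOKEN],
    cwd=LLAMA_CPP_DIR,
    check=True,
)

# 2) Copy the regenerated llama-bpe config files over the ones shipped with the model.
src = LLAMA_CPP_DIR / "models" / "tokenizers" / "llama-bpe"
for f in src.iterdir():
    if f.is_file():
        shutil.copy2(f, MODEL_DIR / f.name)
        print(f"replaced {MODEL_DIR / f.name}")

# 3) Re-run the conversion/quantization script; it should now pick up the correct tokenizer.
```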
|
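Likewise for the (also commented-out) BF16 note: the "whole additional conversion step" is producing an intermediate BF16 GGUF and then computing the importance matrix and quantizing from that file. A hypothetical sketch of that chain using generic llama.cpp tooling (`convert-hf-to-gguf.py`, the `imatrix` and `quantize` binaries) and placeholder file names follows; this is not the actual `gguf-imat-lossless-for-BF16.py` code, whose details may differ.

```python
# Hypothetical end-to-end chain for a BF16-based imatrix quantization.
# Tool locations, flags, and file names are assumptions about a typical
# llama.cpp setup, not taken from this repository's script.
import subprocess

MODEL_DIR = "Meta-Llama-3-8B-Instruct"  # downloaded HF model folder (placeholder)
BF16_GGUF = "model-BF16.gguf"           # the intermediate file, i.e. the "additional conversion step"
IMATRIX = "imatrix.dat"
CALIB_TXT = "calibration-data.txt"      # text used to compute the importance matrix (placeholder)

def run(cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1) HF model -> BF16 GGUF (the extra step the previous script did not perform).
run(["python", "llama.cpp/convert-hf-to-gguf.py", MODEL_DIR,
     "--outtype", "bf16", "--outfile", BF16_GGUF])

# 2) Importance matrix computed from the BF16 GGUF (resource intensive, as the note says).
run(["llama.cpp/imatrix", "-m", BF16_GGUF, "-f", CALIB_TXT, "-o", IMATRIX])

# 3) Quantize from the BF16 GGUF with the imatrix, e.g. to Q4_K_M.
run(["llama.cpp/quantize", "--imatrix", IMATRIX, BF16_GGUF, "model-Q4_K_M.gguf", "Q4_K_M"])
```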