This repository contains the unquantized [Hermes+LIMARP merge](https://huggingface.co/Oniichat/hermes-limarp-13b-merged) in ggml format.
You can quantize the f16 ggml to the quantization of your choice by following the steps below:
1. Download and extract the [llama.cpp binaries](https://github.com/ggerganov/llama.cpp/releases/download/master-41c6741/llama-master-41c6741-bin-win-avx2-x64.zip) ([or compile it yourself if you're on Linux](https://github.com/ggerganov/llama.cpp#build))
2. Move the `quantize` executable into the same folder as the downloaded f16 ggml model.
3. Open a command prompt in that folder and enter the following command, adjusting the filenames and quantization type as you see fit.
```bash
quantize.exe hermes-limarp-13b.ggmlv3.f16.bin hermes-limarp-13b.ggmlv3.q4_0.bin q4_0
```
4. Press Enter to run the command; the quantized model will be written to the same folder.
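If you want several quantizations, the steps above can be batched in a loop. The sketch below only *prints* the commands for a few example ggmlv3 types (q4_0, q5_1, q8_0 are illustrative; running `quantize` with no arguments lists the types your build actually supports). Replace `echo` with the real binary (`quantize.exe` on Windows, `./quantize` on Linux) to execute them:

```bash
# Hedged sketch: emit one quantize command per target type.
# "echo" is a stand-in; swap it for the actual quantize binary to run for real.
for q in q4_0 q5_1 q8_0; do
  echo "quantize.exe hermes-limarp-13b.ggmlv3.f16.bin hermes-limarp-13b.ggmlv3.${q}.bin ${q}"
done
```

Each iteration names the output file after its quantization type, so nothing is overwritten between runs.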