nxnhjrjtbjfzhrovwl committed on
Commit c294a1d · 1 Parent(s): dec8172

Create README.md

Files changed (1)
  1. README.md +12 -0
README.md ADDED
@@ -0,0 +1,12 @@
+ This repository contains the unquantized [Hermes+LIMARP merge](https://huggingface.co/Oniichat/hermes-limarp-13b-merged) in ggml format.
+
+ You can quantize the f16 ggml model to the quantization type of your choice by following the steps below:
+
+ 1. Download and extract the [llama.cpp binaries](https://github.com/ggerganov/llama.cpp/releases/download/master-41c6741/llama-master-41c6741-bin-win-avx2-x64.zip) (or [compile them yourself if you're on Linux](https://github.com/ggerganov/llama.cpp#build)).
+ 2. Move the "quantize" executable to the same folder where you downloaded the f16 ggml model.
+ 3. Open a command prompt window in that same folder and type the following command, adjusting the output filename and quantization type (`q4_0` here) as needed:
+ ```bash
+ quantize.exe hermes-limarp-13b.ggmlv3.f16.bin hermes-limarp-13b.ggmlv3.q4_0.bin q4_0
+ ```
+ 4. Press Enter to run the command; the quantized model will be generated in the same folder.
+
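For Linux users who compiled llama.cpp themselves, the steps above could be sketched roughly as follows. This is an illustration only: it assumes the compiled binary sits at `./quantize` next to the model, and the helper names `quant_name` and `quantize_each` and the `DRY_RUN` and `QUANTIZE` variables are hypothetical, not part of llama.cpp. The script only prints the command by default, so nothing is run until you set `DRY_RUN=0`.

```bash
# Hypothetical helper: swap ".f16." for the target type to get the output name.
quant_name() {
  printf '%s\n' "$1" | sed "s/\.f16\./.$2./"
}

# Print (or run) one quantize command per requested type.
# QUANTIZE is an assumed path to the compiled binary; override it as needed.
quantize_each() {
  src="$1"; shift
  for qtype in "$@"; do
    out="$(quant_name "$src" "$qtype")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the command that would be executed.
      echo "${QUANTIZE:-./quantize} $src $out $qtype"
    else
      "${QUANTIZE:-./quantize}" "$src" "$out" "$qtype"
    fi
  done
}

# Dry run by default: prints the q4_0 command from step 3 in its Linux form.
quantize_each hermes-limarp-13b.ggmlv3.f16.bin q4_0
```

To actually quantize, run the last line with `DRY_RUN=0` set. Any additional type names passed to `quantize_each` must be ones your llama.cpp build supports.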