pere commited on
Commit
a08c945
·
verified ·
1 Parent(s): 3745c8d

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +31 -31
README.md CHANGED
@@ -1,50 +1,50 @@
1
  ---
2
- base_model: north/nb-llama-3.1-8B-Instruct
3
  tags:
4
  - llama-cpp
5
- - gguf-my-repo
 
6
  ---
7
 
8
- # pere/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF
9
- This model was converted to GGUF format from [`north/nb-llama-3.1-8B-Instruct`](https://huggingface.co/north/nb-llama-3.1-8B-Instruct) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
10
- Refer to the [original model card](https://huggingface.co/north/nb-llama-3.1-8B-Instruct) for more details on the model.
11
 
12
- ## Use with llama.cpp
13
- Install llama.cpp through brew (works on Mac and Linux)
 
 
 
 
 
 
 
 
14
 
15
  ```bash
16
  brew install llama.cpp
17
-
18
  ```
19
- Invoke the llama.cpp server or the CLI.
20
 
21
- ### CLI:
 
 
 
 
 
22
  ```bash
23
- llama-cli --hf-repo pere/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF --hf-file nb-llama-3.1-8b-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
24
  ```
25
 
26
- ### Server:
27
  ```bash
28
- llama-server --hf-repo pere/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF --hf-file nb-llama-3.1-8b-instruct-q4_k_m.gguf -c 2048
29
  ```
30
 
31
- Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
32
-
33
- Step 1: Clone llama.cpp from GitHub.
34
- ```
35
- git clone https://github.com/ggerganov/llama.cpp
36
- ```
37
 
38
- Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
39
- ```
40
- cd llama.cpp && LLAMA_CURL=1 make
41
- ```
42
 
43
- Step 3: Run inference through the main binary.
44
- ```
45
- ./llama-cli --hf-repo pere/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF --hf-file nb-llama-3.1-8b-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
46
- ```
47
- or
48
- ```
49
- ./llama-server --hf-repo pere/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF --hf-file nb-llama-3.1-8b-instruct-q4_k_m.gguf -c 2048
50
- ```
 
1
  ---
2
+ base_model: "NB-Llama-3.1-8B-Instruct"
3
  tags:
4
  - llama-cpp
5
+ - gguf
6
+ - quantization
7
  ---
8
 
9
+ # NB-Llama-3.1-8B-Instruct-Q4_K_M-GGUF
10
+ This model is a **quantized** version of the original [NB-Llama-3.1-8B-Instruct](https://huggingface.co/north/nb-llama-3.1-8B-Instruct), converted into the **GGUF format** using [llama.cpp](https://github.com/ggerganov/llama.cpp). Quantization significantly reduces the model's memory footprint, enabling efficient inference on a wide range of hardware, including personal devices, without compromising too much quality. These quantized models are mainly provided so that people can test out the models with moderate hardware. If you want to benchmark the models or further finetune the models, we strongly recommend the non-quantized versions.
 
11
 
12
+ ## What is `llama.cpp`?
13
+ [`llama.cpp`](https://github.com/ggerganov/llama.cpp) is a versatile tool for running large language models optimized for efficiency. It supports multiple quantization formats (e.g., GGML and GGUF) and provides inference capabilities on diverse hardware, including CPUs, GPUs, and mobile devices. The GGUF format is the latest evolution, designed to enhance compatibility and performance.
14
+
15
+ ## Benefits of This Model
16
+ - **High Performance**: Achieves similar quality to the original model while using significantly less memory.
17
+ - **Hardware Compatibility**: Optimized for running on a variety of hardware, including low-resource systems.
18
+ - **Ease of Use**: Seamlessly integrates with `llama.cpp` for fast and efficient inference.
19
+
20
+ ## Installation
21
+ Install `llama.cpp` using Homebrew (works on Mac and Linux):
22
 
23
  ```bash
24
  brew install llama.cpp
 
25
  ```
 
26
 
27
+ ## Usage Instructions
28
+
29
+ ### Using with `llama.cpp`
30
+ To use this quantized model with `llama.cpp`, follow the steps below:
31
+
32
+ #### CLI:
33
  ```bash
34
+ llama-cli --hf-repo north/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF --hf-file nb-llama-3.1-8b-instruct-q4_k_m.gguf -p "Your prompt here"
35
  ```
36
 
37
+ #### Server:
38
  ```bash
39
+ llama-server --hf-repo north/nb-llama-3.1-8B-Instruct-Q4_K_M-GGUF --hf-file nb-llama-3.1-8b-instruct-q4_k_m.gguf -c 2048
40
  ```
41
 
42
+ For more information, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).
 
 
 
 
 
43
 
44
+ ## Additional Resources
45
+ - [Original Model Card](https://huggingface.co/north/nb-llama-3.1-8B-Instruct)
46
+ - [llama.cpp Repository](https://github.com/ggerganov/llama.cpp)
47
+ - [GGUF Format Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/llama)
48
 
49
+ ### Citing & Authors
50
+ The model was trained and documentation written by Per Egil Kummervold.