---
language:
- en
license: cc-by-nc-4.0
model_name: Octopus-V4-GGUF
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Nexa AI, Inc.
tags:
- function calling
- on-device language model
- gguf
- llama cpp
---
# Octopus V4-GGUF: Graph of language models
<p align="center">
- <a href="https://huggingface.co/NexaAIDev/Octopus-v4" target="_blank">Original Model</a>
- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Website</a>
- <a href="https://github.com/NexaAI/octopus-v4" target="_blank">Octopus-v4 Github</a>
- <a href="https://arxiv.org/abs/2404.19296" target="_blank">ArXiv</a>
- <a href="https://huggingface.co/spaces/NexaAIDev/domain_llm_leaderboard" target="_blank">Domain LLM Leaderboard</a>
</p>
<p align="center" width="100%">
<a><img src="octopus-v4-logo.png" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>
**Acknowledgement**:
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. See [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for the original Hugging Face model.
## Get Started
To run the models, download them to your local machine using either `git clone` or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download):
```
git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
```
## Run with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Recommended)
1. **Clone and compile:**
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make
```
2. **Execute the Model:**
Run the following command in the terminal (newer llama.cpp builds name this binary `llama-cli` instead of `main`):
```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
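The prompt in the command above follows a fixed template: a system message, the user query, and an open assistant turn, separated by `<|end|>`. As a minimal sketch, the template can be wrapped in a shell function so that only the query changes between runs. The function name `build_prompt` is hypothetical and not part of llama.cpp or this repository.

```bash
# Hypothetical helper (an assumption, not part of llama.cpp): wrap a user query
# in the router prompt template shown in the command above.
build_prompt() {
  system_msg="You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function."
  # printf '%s' prints the assembled string verbatim, even if the query contains '%'.
  printf '%s' "<|system|>${system_msg}<|end|><|user|>$1<|end|><|assistant|>"
}

# Example usage (requires the compiled binary and a downloaded model):
# ./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 \
#   -p "$(build_prompt "Tell me the result of derivative of x^3 when x is 2?")"
```

Keeping the template in one place avoids copy errors in the special tokens when trying different queries.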
## Run with [Ollama](https://github.com/ollama/ollama)
Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:
1. Install Ollama on your local machine. You can also follow the import guide in the [Ollama GitHub repository](https://github.com/ollama/ollama/blob/main/docs/import.md).
```bash
git clone https://github.com/ollama/ollama.git ollama
```
2. Locate the local Ollama directory:
```bash
cd ollama
```
3. Create a `Modelfile` in your directory:
```bash
touch Modelfile
```
4. In the Modelfile, include a `FROM` statement with the path to your local model, and the default parameters:
```
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```
5. Use the following command to add the model to Ollama:
```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```
6. Verify that the model has been successfully imported:
```bash
ollama ls
```
7. Run the model:
```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
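Steps 3 to 5 above can also be scripted. The sketch below writes the same four `Modelfile` lines with a here-document; the function name `write_modelfile` and its optional second argument (the output path) are hypothetical conveniences, not part of Ollama.

```bash
# Hypothetical helper (an assumption, not part of Ollama): write a Modelfile
# with the default parameters from step 4 for a given GGUF path.
write_modelfile() {
  gguf_path="$1"
  out_file="${2:-Modelfile}"   # defaults to ./Modelfile
  cat > "$out_file" <<EOF
FROM $gguf_path
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
EOF
}

# Example usage (requires Ollama to be installed):
# write_modelfile ./path/to/octopus-v4-Q4_K_M.gguf
# ollama create octopus-v4-Q4_K_M -f Modelfile
```

Generating the file this way makes it easy to import several quantization levels in a loop, changing only the GGUF path and the model name.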
### Dataset and Benchmark
* Used questions from [MMLU](https://github.com/hendrycks/test) to evaluate performance.
* Evaluated with the Ollama [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) method.
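As a rough illustration of how a tokens-per-second figure can be derived: Ollama's generate API reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The sketch below divides one by the other. Whether the llm-benchmark script computes exactly this quantity is an assumption, and the function name `tokens_per_second` is hypothetical.

```bash
# Hypothetical helper: compute generation speed from Ollama's reported
# eval_count (tokens) and eval_duration (nanoseconds).
# Assumption: this mirrors how the benchmark derives tokens/second.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / (ns / 1e9) }'
}

# Example usage: 256 tokens generated in 4 seconds.
# tokens_per_second 256 4000000000
```

For example, 256 tokens generated in 4,000,000,000 ns (4 seconds) corresponds to 64 tokens per second.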
## Quantized GGUF Models
| Name                   | Quant method | Bits | Size    | Response (tokens/sec)  | Use Cases                                 |
| ---------------------- | ------------ | ---- | ------- | ---------------------- | ----------------------------------------- |
| Octopus-v4.gguf | | | 7.64 GB | 27.64 | extremely large |
| Octopus-v4-Q2_K.gguf | Q2_K | 2 | 1.42 GB | 54.20 | extremely not recommended, high loss |
| Octopus-v4-Q3_K.gguf | Q3_K | 3 | 1.96 GB | 51.22 | not recommended |
| Octopus-v4-Q3_K_S.gguf | Q3_K_S | 3 | 1.68 GB | 51.78 | not very recommended |
| Octopus-v4-Q3_K_M.gguf | Q3_K_M | 3 | 1.96 GB | 50.86 | not very recommended |
| Octopus-v4-Q3_K_L.gguf | Q3_K_L | 3 | 2.09 GB | 50.05 | not very recommended |
| Octopus-v4-Q4_0.gguf | Q4_0 | 4 | 2.18 GB | 65.76 | good quality, recommended |
| Octopus-v4-Q4_1.gguf | Q4_1 | 4 | 2.41 GB | 69.01 | slow, good quality, recommended |
| Octopus-v4-Q4_K.gguf | Q4_K | 4 | 2.39 GB | 55.76 | slow, good quality, recommended |
| Octopus-v4-Q4_K_S.gguf | Q4_K_S | 4 | 2.19 GB | 53.98 | high quality, recommended |
| Octopus-v4-Q4_K_M.gguf | Q4_K_M | 4 | 2.39 GB | 58.39 | some functions loss, not very recommended |
| Octopus-v4-Q5_0.gguf | Q5_0 | 5 | 2.64 GB | 61.98 | slow, good quality |
| Octopus-v4-Q5_1.gguf | Q5_1 | 5 | 2.87 GB | 63.44 | slow, good quality |
| Octopus-v4-Q5_K.gguf | Q5_K | 5 | 2.82 GB | 58.28 | moderate speed, recommended |
| Octopus-v4-Q5_K_S.gguf | Q5_K_S | 5 | 2.64 GB | 59.95 | moderate speed, recommended |
| Octopus-v4-Q5_K_M.gguf | Q5_K_M | 5 | 2.82 GB | 53.31 | fast, good quality, recommended |
| Octopus-v4-Q6_K.gguf | Q6_K | 6 | 3.14 GB | 52.15 | large, not very recommended |
| Octopus-v4-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | 50.10 | very large, good quality |
| Octopus-v4-f16.gguf | f16 | 16 | 7.64 GB | 30.61 | extremely large |
_Quantized with llama.cpp_