|
--- |
|
library_name: transformers |
|
tags: |
|
- not-for-all-audiences |
|
- llama-cpp |
|
- gguf-my-repo |
|
license: llama3.1 |
|
base_model: Hastagaras/Llama-3.1-Jamet-8B-MK.I |
|
--- |
|
|
|
# Triangle104/Llama-3.1-Jamet-8B-MK.I-Q4_K_M-GGUF |
|
This model was converted to GGUF format from [`Hastagaras/Llama-3.1-Jamet-8B-MK.I`](https://huggingface.co/Hastagaras/Llama-3.1-Jamet-8B-MK.I) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space. |
|
Refer to the [original model card](https://huggingface.co/Hastagaras/Llama-3.1-Jamet-8B-MK.I) for more details on the model. |
|
|
|
--- |
|
Model details: |
|
- |
|
System: |
|
|
|
### Roleplay Instructions |
|
|
|
- Be {{char}}, naturally and consistently |
|
- React realistically to {{user}}, never control their actions |
|
- Stay in character at all times |
|
|
|
or something similar, just make sure to add: ### Roleplay Instructions |
|
|
|
this model is uncensored, maybe too much... in RP scenario (for me) |
|
|
|
dataset: |
|
|
|
C2logs that I cleaned a long time ago |
|
Freedom RP, but it seems it’s already removed from HF |
|
Stories from Reddit |
|
Gemma data from: argilla-warehouse/magpie-ultra-v1.0-gemma, just a small subset |
|
Reflection data, from here: PJMixers-Dev/Weyaxi_HelpSteer-filtered-Reflection-Gemini-1.5-Flash-ShareGPT. It’s generated by Gemini, and I was like, “Oh, I can make a Google-themed model with this and Gemma data.” |
|
Toxic data: NobodyExistsOnTheInternet/ToxicQAFinal to make it toxic |
|
And lastly, just my dump—RP, general, etc., with some of it also generated by Gemini. |
|
|
|
so yeah, most of the data is from Google, and only the RP data is from Claude. |
|
|
|
you can expect some differences in terms of style (a lot of markdown), but don’t expect this model to be as smart as the instruct |
|
|
|
Feedback is greatly appreciated for future improvements (hopefully) |
|
|
|
Technical Details: |
|
|
|
Base model |
|
v |
|
finetuned the lm_head, embed_tokens and first layer (0) |
|
v |
|
finetune it again, layer 1-2 |
|
v |
|
again, but this time using Lora, 64 rank |
|
v |
|
then merge the lora |
|
--- |
|
the abliterated instruct |
|
v |
|
same, finetuned the lm_head, embed_tokens and first layer (0) |
|
v |
|
still the same, finetune it again, layer 1-2 |
|
v |
|
finetune middle layers |
|
v |
|
merged the previous Lora with this finetuned abliterated model |
|
--- |
|
finnaly, merge the two model using ties |
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux) |
|
|
|
```bash |
|
brew install llama.cpp |
|
|
|
``` |
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash |
|
llama-cli --hf-repo Triangle104/Llama-3.1-Jamet-8B-MK.I-Q4_K_M-GGUF --hf-file llama-3.1-jamet-8b-mk.i-q4_k_m.gguf -p "The meaning to life and the universe is" |
|
``` |
|
|
|
### Server: |
|
```bash |
|
llama-server --hf-repo Triangle104/Llama-3.1-Jamet-8B-MK.I-Q4_K_M-GGUF --hf-file llama-3.1-jamet-8b-mk.i-q4_k_m.gguf -c 2048 |
|
``` |
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well. |
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
``` |
|
git clone https://github.com/ggerganov/llama.cpp |
|
``` |
|
|
|
Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux). |
|
``` |
|
cd llama.cpp && LLAMA_CURL=1 make |
|
``` |
|
|
|
Step 3: Run inference through the main binary. |
|
``` |
|
./llama-cli --hf-repo Triangle104/Llama-3.1-Jamet-8B-MK.I-Q4_K_M-GGUF --hf-file llama-3.1-jamet-8b-mk.i-q4_k_m.gguf -p "The meaning to life and the universe is" |
|
``` |
|
or |
|
``` |
|
./llama-server --hf-repo Triangle104/Llama-3.1-Jamet-8B-MK.I-Q4_K_M-GGUF --hf-file llama-3.1-jamet-8b-mk.i-q4_k_m.gguf -c 2048 |
|
``` |
|
|