---
base_model: tohur/natsumura-assistant-1.1-llama-3.1-8b
license: llama3.1
datasets:
- tohur/natsumura-identity
- cognitivecomputations/dolphin
- tohur/ultrachat_uncensored_sharegpt
- cognitivecomputations/dolphin-coder
- tohur/OpenHermes-2.5-Uncensored-ShareGPT
- tohur/Internal-Knowledge-Map-sharegpt
- m-a-p/Code-Feedback
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/open-instruct-uncensored
- microsoft/orca-math-word-problems-200k
---

# natsumura-assistant-1.1-llama-3.1-8b-GGUF

This is the main model in my Natsumura series of 8B models. It has been updated and further finetuned to provide a great experience. It is a general-purpose assistant model with up to 128k context.

- **Developed by:** Tohur
- **License:** llama3.1
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B-Instruct

This model is based on meta-llama/Meta-Llama-3.1-8B-Instruct and is governed by the [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).
Natsumura is uncensored, which makes it highly compliant with any request, even unethical ones.
You are responsible for any content you create using this model. Please use it responsibly.

## Usage

If you are unsure how to use GGUF files, refer to one of [TheBloke's READMEs](https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF) for more details, including how to concatenate multi-part files.

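
The concatenation step mentioned above amounts to appending the parts in order, assuming they are plain byte-wise splits of one file. A minimal Python sketch of that step follows; the function name `concatenate_parts` is my own, and shards written by llama.cpp's `gguf-split` tool are a different format that this sketch does not cover.

```python
import shutil
from pathlib import Path


def concatenate_parts(parts, output_path):
    """Join byte-wise split file parts, in the order given, into one file."""
    output_path = Path(output_path)
    with output_path.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                # Copy in chunks so a large model file is never held in memory at once.
                shutil.copyfileobj(src, out)
    return output_path
```

A call might look like `concatenate_parts(["model.gguf.part1of2", "model.gguf.part2of2"], "model.gguf")`, where the file names are placeholders rather than files known to exist in this repository.
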
## Provided Quants

(Sorted by quality, lowest first.)

| Quant | Notes |
|:-----|:-----|
| Q2_K | |
| Q3_K_S | |
| Q3_K_M | lower quality |
| Q3_K_L | |
| Q4_0 | |
| Q4_K_S | fast, recommended |
| Q4_K_M | fast, recommended |
| Q5_0 | |
| Q5_K_S | |
| Q5_K_M | |
| Q6_K | very good quality |
| Q8_0 | fast, best quality |
| f16 | 16 bpw, overkill |

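
To get a feel for what "16 bpw" means for file size, the sketch below converts bits per weight into an approximate size. It is a rough estimate only: the 8 billion parameter count is an assumption taken from the "8B" in the model name, the 8.5 bpw figure for Q8_0 is my own assumption based on its block layout (32 8-bit weights plus one 16-bit scale), and real files also carry metadata and may mix tensor types.

```python
def estimate_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes) at a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 8 billion parameters, per the "8B" in the model name.
N_PARAMS = 8.0e9

# 16 bpw comes from the table above; 8.5 bpw for Q8_0 is an assumption.
for name, bpw in [("f16", 16.0), ("Q8_0", 8.5)]:
    print(f"{name}: ~{estimate_size_gb(N_PARAMS, bpw):.1f} GB")
```

Other quants can be estimated the same way by passing their bits-per-weight figure, which this card does not list.
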
## Use in Ollama

```
ollama pull Tohur/natsumura-storytelling-rp-llama-3.1
```

## Datasets used:

- tohur/natsumura-identity
- cognitivecomputations/dolphin
- tohur/ultrachat_uncensored_sharegpt
- cognitivecomputations/dolphin-coder
- tohur/OpenHermes-2.5-Uncensored-ShareGPT
- tohur/Internal-Knowledge-Map-sharegpt
- m-a-p/Code-Feedback
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/open-instruct-uncensored
- microsoft/orca-math-word-problems-200k

The following parameters were used in [Llama Factory](https://github.com/hiyouga/LLaMA-Factory) during training:

- per_device_train_batch_size=2
- gradient_accumulation_steps=4
- lr_scheduler_type="cosine"
- logging_steps=10
- warmup_ratio=0.1
- save_steps=1000
- learning_rate=2e-5
- num_train_epochs=3.0
- max_samples=500
- max_grad_norm=1.0
- quantization_bit=4
- loraplus_lr_ratio=16.0
- fp16=True

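
Two of these settings interact in ways that are easy to misread, so here is a small sketch of the arithmetic. It shows the effective batch size per optimizer step and the shape of a linear-warmup, cosine-decay learning rate. The number of devices and the total step count are not stated in this card, so both are parameters here, and the schedule is the common warmup-plus-cosine form rather than a copy of the trainer's exact implementation.

```python
import math

# Values taken from the parameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 2e-5
WARMUP_RATIO = 0.1


def effective_batch_size(num_devices=1):
    """Samples contributing to one optimizer step (num_devices is an assumption)."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def lr_at_step(step, total_steps):
    """Linear warmup to the peak learning rate, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# With a single device, each optimizer step sees 2 * 4 = 8 samples.
print(effective_batch_size())
```

For a hypothetical run of 1000 steps, `lr_at_step(100, 1000)` is the peak rate of 2e-5, reached when warmup ends, and the rate then falls smoothly towards zero at the final step.
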
## Inference

I use the following settings for inference:

```
"temperature": 1.0,
"repetition_penalty": 1.05,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.05
```

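
As an illustration of how these settings might be passed to a runtime, the sketch below builds a request body for Ollama's chat endpoint without sending it. The helper names are my own, and the mapping of `repetition_penalty` to Ollama's `repeat_penalty` option is an assumption based on my understanding of Ollama's option names; check it against the Ollama version you run.

```python
import json

# Sampling settings from the block above.
SAMPLING = {
    "temperature": 1.0,
    "repetition_penalty": 1.05,
    "top_p": 0.95,
    "top_k": 40,
    "min_p": 0.05,
}


def to_ollama_options(settings):
    """Copy the settings, renaming repetition_penalty to repeat_penalty (assumed Ollama name)."""
    options = dict(settings)
    if "repetition_penalty" in options:
        options["repeat_penalty"] = options.pop("repetition_penalty")
    return options


def build_chat_request(model, user_message, system_prompt=None):
    """Build a JSON-serialisable body for a chat request; nothing is sent."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return {
        "model": model,
        "messages": messages,
        "options": to_ollama_options(SAMPLING),
        "stream": False,
    }


if __name__ == "__main__":
    body = build_chat_request("Tohur/natsumura-storytelling-rp-llama-3.1", "Hello!")
    print(json.dumps(body, indent=2))
```

The printed body could then be posted to a local Ollama server's `/api/chat` endpoint. The model tag in the example is the one from the pull command in this card.
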
## Prompt template: llama3

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>
```
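
If you need to build the prompt string by hand, for example for a raw completion endpoint, the template above can be filled in as sketched below. The function name is hypothetical. The string stops after the assistant header so that the model generates the `{output}` part itself. Chat-style APIs usually apply the template for you, in which case this step is unnecessary.

```python
def format_llama3_prompt(system_prompt, user_input):
    """Fill the llama3 template up to the point where the assistant reply begins."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


if __name__ == "__main__":
    print(format_llama3_prompt("You are a helpful assistant.", "What is GGUF?"))
```

Generation should be stopped when the model emits `<|eot_id|>`, which closes the assistant turn in the template.
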