---
base_model: tohur/natsumura-assistant-1.1-llama-3.1-8b
license: llama3.1
datasets:
- tohur/natsumura-identity
- cognitivecomputations/dolphin
- tohur/ultrachat_uncensored_sharegpt
- cognitivecomputations/dolphin-coder
- tohur/OpenHermes-2.5-Uncensored-ShareGPT
- tohur/Internal-Knowledge-Map-sharegpt
- m-a-p/Code-Feedback
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/open-instruct-uncensored
- microsoft/orca-math-word-problems-200k
---
# natsumura-assistant-1.1-llama-3.1-8b-GGUF

This is my Storytelling/RP model for my Natsumura series of 8b models. This model is finetuned on storytelling and roleplaying datasets, so it should be a great model for character chatbots in applications such as SillyTavern, Agnai, RisuAI, and more, as well as for fictional writing. It supports up to a 128k context.

- **Developed by:** Tohur
- **License:** llama3.1
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B-Instruct

This model is based on meta-llama/Meta-Llama-3.1-8B-Instruct and is governed by the [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

Natsumura is uncensored, which makes the model highly compliant: it will follow any request, even unethical ones. You are responsible for any content you create using this model. Please use it responsibly.
## Usage

If you are unsure how to use GGUF files, refer to one of [TheBloke's READMEs](https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF) for more details, including on how to concatenate multi-part files.
## Provided Quants

(sorted by quality)

| Quant | Notes |
|:-----|:-----|
| Q2_K | |
| Q3_K_S | |
| Q3_K_M | lower quality |
| Q3_K_L | |
| Q4_0 | |
| Q4_K_S | fast, recommended |
| Q4_K_M | fast, recommended |
| Q5_0 | |
| Q5_K_S | |
| Q5_K_M | |
| Q6_K | very good quality |
| Q8_0 | fast, best quality |
| f16 | 16 bpw, overkill |
## Use in ollama

```
ollama pull Tohur/natsumura-storytelling-rp-llama-3.1
```
## Datasets used

- tohur/natsumura-identity
- cognitivecomputations/dolphin
- tohur/ultrachat_uncensored_sharegpt
- cognitivecomputations/dolphin-coder
- tohur/OpenHermes-2.5-Uncensored-ShareGPT
- tohur/Internal-Knowledge-Map-sharegpt
- m-a-p/Code-Feedback
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/open-instruct-uncensored
- microsoft/orca-math-word-problems-200k
The following parameters were used in [Llama Factory](https://github.com/hiyouga/LLaMA-Factory) during training:

- per_device_train_batch_size=2
- gradient_accumulation_steps=4
- lr_scheduler_type="cosine"
- logging_steps=10
- warmup_ratio=0.1
- save_steps=1000
- learning_rate=2e-5
- num_train_epochs=3.0
- max_samples=500
- max_grad_norm=1.0
- quantization_bit=4
- loraplus_lr_ratio=16.0
- fp16=True
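For reference, parameters like these are usually supplied to LLaMA-Factory as a YAML config. The following is only a sketch assembled from the list above, not the exact file used for this run; the model path, stage, and finetuning type are assumptions:

```
### model (assumed; the GGUF's base model card names Meta-Llama-3.1-8B-Instruct)
model_name_or_path: meta-llama/Meta-Llama-3.1-8B-Instruct
quantization_bit: 4

### method (assumed SFT + LoRA, implied by loraplus_lr_ratio)
stage: sft
finetuning_type: lora
loraplus_lr_ratio: 16.0

### train (parameters listed above)
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
lr_scheduler_type: cosine
learning_rate: 2.0e-5
num_train_epochs: 3.0
warmup_ratio: 0.1
max_grad_norm: 1.0
max_samples: 500
logging_steps: 10
save_steps: 1000
fp16: true
```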
## Inference

I use the following settings for inference:

```
"temperature": 1.0,
"repetition_penalty": 1.05,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.05
```
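If you run the model through ollama, roughly the same sampling settings can be set in a Modelfile (a sketch; `min_p` requires a reasonably recent ollama version, and the `FROM` tag assumes the pull target shown earlier):

```
FROM Tohur/natsumura-storytelling-rp-llama-3.1
PARAMETER temperature 1.0
PARAMETER repeat_penalty 1.05
PARAMETER top_p 0.95
PARAMETER top_k 40
PARAMETER min_p 0.05
```

Build it with `ollama create <your-model-name> -f Modelfile` and run as usual.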
## Prompt template: llama3

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>
```
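When prompting manually, the template above can be built in code. A minimal sketch (the function and argument names are my own, not part of any library); it stops after the assistant header, since `{output}` is what the model generates:

```
def format_llama3_prompt(system_prompt: str, user_input: str) -> str:
    """Build a llama3-style prompt from the template above, left open
    at the assistant header so the model produces the reply."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt("You are a helpful assistant.", "Hello!")
```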