Commit 1266055 · zarakiquemparte committed "Create README.md" (parent: bed6b0f)

README.md ADDED
---
license: other
tags:
- llama-2
---
# Model Card: Pygmalion LRP Grad L2 7B

This model uses [Pygmalion 2 7B](https://huggingface.co/PygmalionAI/pygmalion-2-7b) as a base, merged with the LimaRP (52%) LoRA originally from [Suikamelon](https://huggingface.co/lemonilia) and customized with the Metharme prompt format.

The LoRA was merged with the base model using this [script](https://github.com/zarakiquemparte/zaraki-tools/blob/main/apply-lora-weight-ltl.py).

- Credits to [Suikamelon](https://huggingface.co/lemonilia) for the LimaRP dataset
- Credits to [Pygmalion AI](https://huggingface.co/PygmalionAI) for the base model

## LoRA merge weights

```
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
```
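
Each number is presumably a per-layer multiplier applied to the LoRA delta as it is merged into the corresponding transformer layer (Llama-2 7B has 32 layers): the first 15 layers keep the base weights untouched, the 16th gets half the delta, and the rest get it in full. The actual implementation is the `apply-lora-weight-ltl.py` script linked above; the following is only a minimal sketch of the idea, with hypothetical names:

```
import torch

# Per-layer merge weights from the list above (32 entries, one per layer).
LAYER_WEIGHTS = [0.0] * 15 + [0.5] + [1.0] * 16

def merge_lora_weight(base: torch.Tensor, lora_a: torch.Tensor,
                      lora_b: torch.Tensor, alpha: float, r: int,
                      layer_idx: int) -> torch.Tensor:
    """Apply the standard LoRA delta (B @ A * alpha / r) to one layer's
    weight matrix, attenuated by that layer's merge weight."""
    scale = (alpha / r) * LAYER_WEIGHTS[layer_idx]
    return base + (lora_b @ lora_a) * scale
```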

## Prompting

The model has been trained on prompts using three different roles, which are denoted by the following tokens: `<|system|>`, `<|user|>` and `<|model|>`.

The `<|system|>` prompt can be used to inject out-of-channel information behind the scenes, while the `<|user|>` prompt should be used to indicate user input.
The `<|model|>` token should then be used to indicate that the model should generate a response. These tokens can appear multiple times and be chained to form a conversation history.

### Prompting example

The system prompt has been designed to allow the model to "enter" various modes and dictate the reply length. Here's an example:

```
<|system|>Enter RP mode. Pretend to be {{char}} whose persona follows:
{{persona}}

You shall reply to the user while staying in character, and generate long responses.
```
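
As a concrete illustration, here is a minimal sketch of loading the model with `transformers` and generating from a Metharme-format prompt. The repo id is hypothetical and the persona text is made up; substitute your own values:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, used only for illustration.
MODEL_ID = "zarakiquemparte/pygmalion-lrp-grad-l2-7b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Metharme-format prompt: a system turn with the persona, one user turn,
# and a trailing <|model|> token so the model writes the next reply.
persona = "Aria is a cheerful tavern keeper who loves telling long stories."
prompt = (
    "<|system|>Enter RP mode. Pretend to be Aria whose persona follows:\n"
    f"{persona}\n\n"
    "You shall reply to the user while staying in character, and generate long responses.\n"
    "<|user|>Hello there!\n"
    "<|model|>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```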

## Bias, Risks, and Limitations

The intended use case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope.

As such, it was **not** fine-tuned to be safe and harmless: the base model _and_ this fine-tune have been trained on data known to contain profanity and texts that are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading.
## Training Details

This model uses LimaRP by [Suikamelon](https://huggingface.co/lemonilia) converted to the Metharme prompt format.
This model is merged and can be reproduced using the tools mentioned above. Please refer to all provided links for extra model-specific details.

## Training Hyperparameters

```
load_in_8bit: true
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.01
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.000065
bf16: true
tf32: true
```
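
For reference, these keys resemble an axolotl-style LoRA training config, and the LoRA-specific values map directly onto a PEFT `LoraConfig`. A minimal sketch of that mapping (the `target_modules` list is an assumption, since the original config does not show it):

```
from peft import LoraConfig

# Hypothetical mapping of the hyperparameters above; target_modules is
# an assumption and may differ from the actual training run.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.01,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```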

## Environmental Impact

Fine-tuning the LimaRP LoRA on a single NVIDIA L40 takes about 1 hour and 45 minutes.