Commit 1266055 · zarakiquemparte committed "Create README.md" (parent: bed6b0f)

README.md ADDED
---
license: other
tags:
- llama-2
---
# Model Card: Pygmalion LRP Grad L2 7B

This model uses [Pygmalion 2 7B](https://huggingface.co/PygmalionAI/pygmalion-2-7b) as a base, merged with the LimaRP (52%) LoRA originally from [Suikamelon](https://huggingface.co/lemonilia) and customized with the Metharme prompt format.

The LoRA was merged with the base model using this [script](https://github.com/zarakiquemparte/zaraki-tools/blob/main/apply-lora-weight-ltl.py).

- Credits to [Suikamelon](https://huggingface.co/lemonilia) for the LimaRP dataset
- Credits to [Pygmalion AI](https://huggingface.co/PygmalionAI) for the base model

## LoRA merge weights

```
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
```
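
Each number is presumably a per-layer multiplier applied to the LoRA delta as it is merged into the corresponding transformer layer (Llama-2 7B has 32 layers): the first 15 layers keep the base weights untouched, the 16th gets half the delta, and the rest get it in full. The actual implementation is the `apply-lora-weight-ltl.py` script linked above; the following is only a minimal sketch of the idea, with hypothetical names:

```
import torch

# Per-layer merge weights from the list above (32 entries, one per layer).
LAYER_WEIGHTS = [0.0] * 15 + [0.5] + [1.0] * 16

def merge_lora_weight(base: torch.Tensor, lora_a: torch.Tensor,
                      lora_b: torch.Tensor, alpha: float, r: int,
                      layer_idx: int) -> torch.Tensor:
    """Apply the standard LoRA delta (B @ A * alpha / r) to one layer's
    weight matrix, attenuated by that layer's merge weight."""
    scale = (alpha / r) * LAYER_WEIGHTS[layer_idx]
    return base + (lora_b @ lora_a) * scale
```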

## Prompting

The model has been trained on prompts using three different roles, which are denoted by the following tokens: `<|system|>`, `<|user|>` and `<|model|>`.

The `<|system|>` prompt can be used to inject out-of-channel information behind the scenes, while the `<|user|>` prompt should be used to indicate user input.
The `<|model|>` token should then be used to indicate that the model should generate a response. These tokens can appear multiple times and be chained to form a conversation history.

### Prompting example

The system prompt has been designed to allow the model to "enter" various modes and dictate the reply length. Here's an example:

```
<|system|>Enter RP mode. Pretend to be {{char}} whose persona follows:
{{persona}}

You shall reply to the user while staying in character, and generate long responses.
```
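
As a concrete illustration, here is a minimal sketch of loading the model with `transformers` and generating from a Metharme-format prompt. The repo id is hypothetical and the persona text is made up; substitute your own values:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, used only for illustration.
MODEL_ID = "zarakiquemparte/pygmalion-lrp-grad-l2-7b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Metharme-format prompt: a system turn with the persona, one user turn,
# and a trailing <|model|> token so the model writes the next reply.
persona = "Aria is a cheerful tavern keeper who loves telling long stories."
prompt = (
    "<|system|>Enter RP mode. Pretend to be Aria whose persona follows:\n"
    f"{persona}\n\n"
    "You shall reply to the user while staying in character, and generate long responses.\n"
    "<|user|>Hello there!\n"
    "<|model|>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```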

## Bias, Risks, and Limitations

The intended use case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope.

As such, it was **not** fine-tuned to be safe and harmless: the base model _and_ this fine-tune have been trained on data known to contain profanity and texts that are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading.
## Training Details

This model uses LimaRP by [Suikamelon](https://huggingface.co/lemonilia) converted to the Metharme prompt format.
This model is merged and can be reproduced using the tools mentioned above. Please refer to all provided links for extra model-specific details.

## Training Hyperparameters

```
load_in_8bit: true
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.01
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.000065
bf16: true
tf32: true
```
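
For reference, these keys resemble an axolotl-style LoRA training config, and the LoRA-specific values map directly onto a PEFT `LoraConfig`. A minimal sketch of that mapping (the `target_modules` list is an assumption, since the original config does not show it):

```
from peft import LoraConfig

# Hypothetical mapping of the hyperparameters above; target_modules is
# an assumption and may differ from the actual training run.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.01,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```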

## Environmental Impact

Fine-tuning the LimaRP LoRA on a single NVIDIA L40 takes about 1 hour and 45 minutes.