Updated Readme.txt
README.md CHANGED
@@ -52,20 +52,21 @@ By design, this model has a strong vorny bias. It's not intended for use by anyo

The model was fine-tuned using a [rank-stabilized](https://arxiv.org/abs/2312.03732) [QLoRA adapter](https://arxiv.org/abs/2305.14314). Training was performed using the [Unsloth AI](https://github.com/unslothai/unsloth) library on `Ubuntu 22.04.4 LTS` with `CUDA 12.1` and `PyTorch 2.3.0`.

The total training time on an NVIDIA GeForce RTX 4060 Ti is about 26 hours.
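
As a rough illustration of this setup, loading the base model in 4-bit through Unsloth typically looks like the sketch below; the model path and sequence length are placeholders rather than values taken from this card.

```python
from unsloth import FastLanguageModel

# Placeholders: the actual base checkpoint and context length are not shown
# in this excerpt of the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/base-model",
    max_seq_length=4096,
    load_in_4bit=True,   # QLoRA: the base weights are loaded 4-bit quantized
)
```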

After training, the adapter weights were merged into the dequantized model as described in [ChrisHayduk's GitHub gist](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930).
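
The linked gist dequantizes the 4-bit base weights and folds the adapter deltas into them layer by layer. As a simplified sketch of the same idea (not the gist's exact procedure), a merge against a half-precision copy of the base model with `peft` looks roughly like this; all paths are placeholders.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholders: base model, adapter, and output paths are not given in this excerpt.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "path/to/qlora-adapter").merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```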

The quantized version of the model was prepared using [llama.cpp](https://github.com/ggerganov/llama.cpp).

### QLoRA adapter configuration

- Rank: 64
- Alpha: 16
- Dropout rate: 0.1
- Target weights: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`
- `use_rslora=True`

Targeting all of the projection weights with the QLoRA adapter resulted in the smallest loss compared to other combinations, even against adapters with a larger rank. A sketch of this configuration is shown below.
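
A minimal sketch of how these settings map onto Unsloth's `get_peft_model`, assuming `model` is the 4-bit base model loaded earlier; the gradient-checkpointing and seed arguments are placeholders not stated in the card.

```python
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,                    # 4-bit base model from FastLanguageModel.from_pretrained
    r=64,                     # rank
    lora_alpha=16,            # alpha
    lora_dropout=0.1,         # dropout rate
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,          # rank-stabilized LoRA scaling
    use_gradient_checkpointing="unsloth",  # placeholder, not stated in the card
    random_state=0,                        # placeholder
)
```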
### Domain adaptation
@@ -91,6 +92,7 @@ The raw-text stories in the dataset were edited as follows:

- Batch size: 1
- Gradient accumulation steps: 1

The training takes ~24 hours on an NVIDIA GeForce RTX 4060 Ti.
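
For context, a minimal sketch of how these two values might be passed to a TRL `SFTTrainer`; the learning rate, epoch count, dataset, and output directory are placeholders, since only the batch size and gradient accumulation steps are shown in this excerpt.

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = Dataset.from_dict({"text": ["example training document"]})  # stand-in dataset

args = TrainingArguments(
    per_device_train_batch_size=1,   # Batch size: 1
    gradient_accumulation_steps=1,   # Gradient accumulation steps: 1
    learning_rate=2e-4,              # placeholder
    num_train_epochs=1,              # placeholder
    output_dir="outputs",            # placeholder
)

trainer = SFTTrainer(
    model=model,                     # QLoRA-wrapped model from the earlier sketches
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=args,
)
trainer.train()
```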
#### Plots