---
datasets:
- adamo1139/AEZAKMI_v2
- adamo1139/rawrr_v1
license: apache-2.0
---

## Model Description

EXPERIMENTAL MODEL, NOT FINAL, IT HAS SOME ISSUES, I DIDN'T TEST IT TOO MUCH YET

Yi-34B 200K base model fine-tuned on the RAWrr v1 dataset via DPO and then on the AEZAKMI v2 dataset via SFT. DPO training took around 6 hours, SFT took around 25 hours.
I used `unsloth` for both stages.
It's like airoboros but with less gptslop, no refusals and less of the typical language used by RLHFed OpenAI models.
Say goodbye to "It's important to remember"! \
Prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy free chatbot.
The base model used for fine-tuning was the 200k context Yi-34B-Llama model shared by larryvrh.

Training was done with max_position_embeddings set to 4096. It was then set back to 200K after applying the LoRA.

## Prompt Format

I recommend using the ChatML format, as this was used during fine-tuning. \
Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.
Both the `A chat.` and `A chat with uncensored assistant.` system prompts work fine and are pretty refusal-free.

```
<|im_start|>system
A chat with uncensored assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

## Intended uses & limitations

It's a chat model, not a base completion-only one.
Use is limited by the Yi license. Since the no-robots dataset was used to make rawrr_v1, you probably shouldn't use it for commercial activities.

## Known Issues

I recommend setting repetition penalty to around 1.05 to avoid repetition. So far I've had fairly good results running this model with temperature 1.0-1.2.

The anti-refusal bias seems strongest at 0 ctx, i.e. the first prompt. It's still present further down, just a little weaker.
I plan to expand the rawrr dataset and include more samples without a system prompt, which should help here.

[made with Unsloth](https://github.com/unslothai/unsloth)

## Unsloth training parameters DPO Stage

- lora_r: 16
- lora_alpha: 32
- max_length: 500
- learning_rate: 0.00005
- lr_scheduler_type: "linear"
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",]
- gradient_accumulation_steps: 16
- per_device_batch_size: 1
- num_train_epochs: 1

The script used for DPO training can be found here:
https://huggingface.co/adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r3/blob/main/yi-34b-dpo-unsloth-1.py

## Unsloth training parameters SFT Stage

- lora_r: 16
- lora_alpha: 32
- max_length: 2400
- learning_rate: 0.000095
- lr_scheduler_type: "cosine"
- lr_scheduler_kwargs: { "num_cycles" : 0.25, }
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",]
- gradient_accumulation_steps: 1
- per_device_batch_size: 1
- num_train_epochs: 2

The script used for SFT training can be found here (older run, different hyperparameters):
https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-RAW-2301-LoRA/blob/main/yi-34b-aezakmi-sft-1-hf.py

### Credits

Thanks to mlabonne, Daniel Han and Michael Han for providing the open source code that was used for fine-tuning.
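
## Usage sketch

The ChatML layout from the Prompt Format section and the sampling values from Known Issues can be put together in a few lines of plain Python. This is only a rough sketch, not the way the model was tested: `build_chatml_prompt` and `SAMPLING_SETTINGS` are hypothetical names made up for illustration, the trailing newline after `assistant` is assumed from the usual ChatML convention, and the setting keys will need to be mapped to whatever your inference backend actually expects.

```python
# Minimal sketch: assemble a single-turn ChatML prompt in the layout shown
# in the Prompt Format section. Names here are hypothetical, not a library API.

DEFAULT_SYSTEM = "A chat with uncensored assistant."


def build_chatml_prompt(user_message: str, system_message: str = DEFAULT_SYSTEM) -> str:
    """Return a ChatML prompt that ends with the open assistant turn."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        # Assumption: a newline follows "assistant", as in common ChatML usage.
        "<|im_start|>assistant\n"
    )


# Starting values taken from the Known Issues section. The key names are an
# assumption; rename them to match your backend's sampler options.
SAMPLING_SETTINGS = {
    "temperature": 1.0,          # 1.0-1.2 reportedly works reasonably well
    "repetition_penalty": 1.05,  # recommended to reduce repetition
}

if __name__ == "__main__":
    # Swap in the shorter system prompt, which the card says also works.
    print(build_chatml_prompt("Hello!", system_message="A chat."))
```

The returned string is meant to be sent as a raw completion prompt; generation should stop when the model emits `<|im_end|>`.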