license: apache-2.0

MiniSymposium Demo Release

MiniSymposium is an experimental QLora model that I built on top of Mistral 7B, with the following goals:

  1. Demonstrate the untapped potential of training on a small, focused dataset of handwritten examples instead of a large amount of synthetic GPT output.
  2. Create a dataset that teaches the model to explore different possible answers from multiple perspectives before reaching a conclusion.
  3. Develop a model that performs well across various prompt formats, rather than overfitting to one specific format.

The current trend in QLora/Lora-based finetuning (and in finetuning local LLMs in general) is to use large synthetic datasets. These are usually generated by GPT models, and models are trained on them with relatively high learning rates.

However, I believe there is a lot of potential in small, handwritten datasets, even for general-purpose instruction following, as long as you train for many epochs at a learning rate low enough to avoid overfitting.

I hypothesize that this approach helps the model learn the deeper pattern of instruction following, instead of fitting to shallow data biases (like "As an AI made by OpenAI" and other GPT-isms) that have little to do with following instructions.

My initial configuration for this QLora model used a constant learning rate of 1e-6 (0.000001), which led to overfitting after approximately 100 epochs. The model started reproducing the original dataset almost verbatim and generalized poorly across different prompt formats, producing obvious hallucinations and, for some reason, Chinese-language outputs.

However, lowering the learning rate to a tenth of that (1e-7, i.e. 0.0000001) improved the model significantly. I trained for about 10 hours on my RTX 3060, reaching 600 epochs. I think it's still a little undertrained, but I encourage people to try out the demo model.
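As a quick check on the numbers above, here is the arithmetic in Python. It only restates figures from this card (1e-6, one tenth of it, roughly 10 hours, 600 epochs); the per-epoch time is a rough average derived from an approximate total, not a measured value.

```python
# Figures below are taken from this model card; the per-epoch time is only an
# average derived from the approximate total training time.

INITIAL_LR = 1e-6            # first run: overfit after ~100 epochs
FINAL_LR = INITIAL_LR / 10   # one tenth of the initial rate, i.e. about 1e-7

TRAIN_HOURS = 10             # approximate wall-clock time on an RTX 3060
EPOCHS = 600

minutes_per_epoch = TRAIN_HOURS * 60 / EPOCHS

print(f"final learning rate: {FINAL_LR:.0e}")
print(f"average time per epoch: {minutes_per_epoch:.1f} min")
```

With such a small dataset, each epoch averages out to roughly a minute, which is what makes hundreds of epochs feasible on a single consumer GPU.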

Inference Examples

[Images: inference examples]

Dataset Examples

The dataset is about 200 lines of data (13 prompts in total) and looks like this. It uses a 'special' format that sometimes arrives at the same answer from multiple perspectives before reaching a conclusion.

[Images: dataset examples]

Prompt Format

The prompt format is reminiscent of Alpaca's, except that it is designed so that new roles can be created on the fly as needed.

It's meant to be highly adaptable to different prompting formats, so I encourage you to get creative. But as a baseline, you can try:

### New Query
*request goes here*

### Complete Response

But a query for programming, for example, might fare better with:

### New Query
*programming request*

### Software Engineer

You could even try something like ### Stack Overflow Solution instead. The end goal is a model that adapts to unique instruction formats and follows those instructions more coherently overall.
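A minimal sketch of how these prompts could be assembled programmatically. `build_prompt` is a hypothetical helper, not part of any released code, and the exact blank lines between sections are my assumption; only the `### New Query` / `### <role>` layout comes from the examples above.

```python
def build_prompt(query: str, role: str = "Complete Response") -> str:
    """Format a query in the header layout shown above (hypothetical helper).

    `role` is the header the model should answer under, e.g. "Complete Response"
    (the baseline), "Software Engineer", or "Stack Overflow Solution".
    """
    # The prompt ends with the role header so the model continues from there.
    return f"### New Query\n{query}\n\n### {role}\n"


print(build_prompt("Write a function that reverses a string.", role="Software Engineer"))
```

Because the role is just a string, swapping in a new persona for a different kind of request is a one-argument change.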

LORA Hyperparameter Setup (for Axolotl)

The MiniSymposium_axolotl_demo.yml config is included in this repo exactly as I used it. Below, I explain the reasoning behind each choice for interpretability:

The constant learning rate (1e-7) helped avoid overfitting, as mentioned earlier in this model card: [Image: config excerpt]

This blog post reports from its testing that lower Lora ranks (with alpha set to 2x the rank) seemed harder to overfit, at the cost of learning less nuance from the data. Since the purpose of this model is to be adaptable and general-purpose, and we aren't trying to add domain-specific knowledge (which would cause a higher degree of catastrophic forgetting), I opted for: [Image: config excerpt]
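To make the rank/alpha rule of thumb concrete: in standard LoRA (as implemented, for example, in Hugging Face PEFT), the low-rank update is scaled by alpha / rank, so setting alpha to 2x the rank keeps that multiplier fixed at 2 while the rank alone controls the adapter's size. The sketch below assumes that standard scaling (not rsLoRA); the helper names and the ranks it loops over are hypothetical, and the rank actually used is in the included config. The parameter count uses a 4096 x 4096 projection, the size of Mistral 7B's attention query projection, purely as an illustration.

```python
def lora_alpha_for_rank(rank: int) -> int:
    """Alpha set to twice the rank, following the rule of thumb above."""
    if rank <= 0:
        raise ValueError("LoRA rank must be positive")
    return 2 * rank


def lora_scaling(rank: int, alpha: int) -> float:
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter adds to one d_out x d_in weight.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical ranks for illustration only.
for r in (4, 8, 16, 64):
    alpha = lora_alpha_for_rank(r)
    print(
        f"r={r:>2}  alpha={alpha:>3}  scaling={lora_scaling(r, alpha)}  "
        f"params per 4096x4096 matrix={lora_params(4096, 4096, r):,}"
    )
```

Because the multiplier stays at 2, lowering the rank shrinks the number of trainable parameters without changing how strongly the adapter's output is scaled.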

Sample packing was turned off to prevent errors while training in Axolotl (dataset too smol): [Image: config excerpt]

A batch size of 1 was used because I am GPU-poor and only have 12GB of VRAM (RTX 3060): [Image: config excerpt]
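With a batch size of 1 and sample packing off, each forward/backward pass sees a single training sequence, so the total number of steps is easy to estimate. The prompt count and batch size come from this card; treating each prompt as one training example, no gradient accumulation, and a single GPU are my assumptions.

```python
import math

NUM_EXAMPLES = 13       # "13 prompts total"; assumes one training example per prompt
MICRO_BATCH_SIZE = 1    # batch size used for this model (12GB of VRAM)
GRAD_ACCUM_STEPS = 1    # assumption: gradient accumulation isn't mentioned in this card
EPOCHS = 600

# Examples consumed per optimizer step (single GPU assumed).
effective_batch = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"steps per epoch: {steps_per_epoch}, total optimizer steps: {total_steps}")
```

If gradient accumulation were enabled, the step count would drop proportionally while each step would average over more examples.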

None of the dataset was held out for evaluation loss, which was turned off entirely; I assume the dataset is too small for eval loss to be a meaningful metric. Instead, I manually tested two merges from different points in training. [Image: config excerpt]
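Pulling the choices above together, here is a sketch that renders them as flat YAML-style lines. The key names (`adapter`, `lora_r`, `lora_alpha`, `learning_rate`, `lr_scheduler`, `num_epochs`, `micro_batch_size`, `sample_packing`, `val_set_size`) follow Axolotl's config conventions as I understand them, and `render_axolotl_overrides` is a hypothetical helper. Treat the output as an illustration and defer to the included MiniSymposium_axolotl_demo.yml for the real values; the LoRA rank is a required argument because the text of this card does not state it.

```python
def render_axolotl_overrides(lora_r: int) -> str:
    """Render the hyperparameter choices discussed above as flat YAML lines.

    Hypothetical helper: key names are assumed to match Axolotl's config, and
    everything else (base model, dataset paths, target modules) is omitted.
    """
    settings = {
        "adapter": "qlora",
        "lora_r": lora_r,              # rank: see the shipped config for the real value
        "lora_alpha": 2 * lora_r,      # alpha = 2x rank
        "learning_rate": "0.0000001",  # 1e-7, written as a plain decimal for YAML
        "lr_scheduler": "constant",    # assumed scheduler name for a constant LR
        "num_epochs": 600,
        "micro_batch_size": 1,
        "sample_packing": False,
        "val_set_size": 0,             # no eval split, so no eval loss
    }
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = str(value).lower()  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


print(render_axolotl_overrides(lora_r=8))  # hypothetical rank for illustration
```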

Notice for SillyTavern users

With the typical Alpaca format (### Instruction and ### Response), the model will interpret what you say in a character chat literally, as an instruction. I recommend using the character names instead, e.g. {{user}} in place of Instruction and {{char}} in place of Response. This seemed to improve results, and the model also follows the character card well.
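The header swap amounts to a simple string substitution, sketched below. `to_character_headers` is a hypothetical helper for illustration; in SillyTavern itself you would make this change in the instruct template settings rather than in code, and {{user}} / {{char}} are SillyTavern's macros for the user's and character's names.

```python
def to_character_headers(template: str) -> str:
    """Swap Alpaca's generic headers for SillyTavern character-name macros.

    Hypothetical helper: SillyTavern fills in {{user}} and {{char}} with the
    user's and character's names when it builds the prompt.
    """
    return (
        template.replace("### Instruction", "### {{user}}")
        .replace("### Response", "### {{char}}")
    )


alpaca = "### Instruction:\nHello there!\n\n### Response:\n"
print(to_character_headers(alpaca))
```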