license: apache-2.0
datasets:
- trollek/SimpleInstructionJudge-v01
language:
- en
base_model: h2oai/h2o-danube3-4b-base
LittleInstructionJudge-4B-v0.1
Update: The instruct_reward values are all out of whack due to a misunderstanding on my part, caused by laziness. The other values are fine, though not as useful as they could have been if I had actually just read more. Any model with the right prompt is better. Even CleverQwen2-1.5B. The next version will be better.
A BAdam fine-tune of danube3-4b-base that does one thing, and one thing only: being a lightweight LLM-as-a-Judge for instruction prompts.
The purpose of training this model is to have a small language model that can filter out the worst offenders when creating datasets with the Magpie method in hardware-constrained environments.
Important note: For reasons I don't know, I have issues running models like danube3 in LM Studio. Ollama runs them fine though. LMS reports my VRAM as expected, mostly free since it can't load the model, but unexpectedly only about 90 kB unused RAM, even though it knows damn well that there are over 20 gigs worth of memory real estate available.
Prompt template
Judge the instruction below using the following json format:
{
"intent": <the intent of the users instruction>,
"knowledge": <the knowledge required to respond to the instruction>,
"task_category": <the primary category that the instruction can be put in>,
"other_task_category": [<a list of other task categories that the instruction belongs to>],
"difficulty": <a rating of easy, medium or hard>,
"quality_explanation": <an explanation of the quality of the users instruction>,
"instruct_reward": <an integer between -10 and 10 reflecting the quality of the instruction>
}
This is the instruction I need you to judge:
{{instruction}}
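A minimal sketch of how the template above could be filled in and how a reply could be parsed, using only the Python standard library. The helper names `build_prompt` and `parse_judgement` are made up for illustration, and the exact whitespace of the template is an assumption, since only the wording is given here. The code does not call the model; sending the prompt to it (for example through Ollama) is left to whatever client you use.

```python
import json
from typing import Optional

# Template text copied from this card; line breaks are an assumption.
PROMPT_TEMPLATE = """Judge the instruction below using the following json format:
{
"intent": <the intent of the users instruction>,
"knowledge": <the knowledge required to respond to the instruction>,
"task_category": <the primary category that the instruction can be put in>,
"other_task_category": [<a list of other task categories that the instruction belongs to>],
"difficulty": <a rating of easy, medium or hard>,
"quality_explanation": <an explanation of the quality of the users instruction>,
"instruct_reward": <an integer between -10 and 10 reflecting the quality of the instruction>
}
This is the instruction I need you to judge:
{{instruction}}"""


def build_prompt(instruction: str) -> str:
    """Hypothetical helper: substitute the instruction into the template."""
    return PROMPT_TEMPLATE.replace("{{instruction}}", instruction)


def parse_judgement(raw: str) -> Optional[dict]:
    """Hypothetical helper: pull the outermost JSON object out of a reply.

    Returns None when no valid JSON object is found, so callers can
    skip or retry instead of crashing on a malformed reply.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


if __name__ == "__main__":
    print(build_prompt("Write a haiku about rain."))
    # Hand-written stand-in for a reply, NOT real model output.
    fake_reply = '{"intent": "get a poem", "difficulty": "easy", "instruct_reward": 3}'
    print(parse_judgement(fake_reply))
```

Given the update at the top of this card, it is probably safer to filter on fields such as difficulty or task_category than on instruct_reward for this version.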
Quants
LLaMA-Factory training config
### model
model_name_or_path: danube3/chatml-base
### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_start_block: 6
badam_verbose: 1
seed: 8
### dataset
dataset: balanced_instruction_judge
template: chatml
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 12
### output
output_dir: danube3/trained/LittleInstructionJudge-4B-v0.1
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0000015
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2
### eval
val_size: 0.02
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.4062 | 0.0441 | 1000 | 0.3899 |
0.3346 | 0.0882 | 2000 | 0.3520 |
0.3192 | 0.1323 | 3000 | 0.3342 |
0.3007 | 0.1763 | 4000 | 0.3239 |
0.2792 | 0.2204 | 5000 | 0.3165 |
0.2957 | 0.2645 | 6000 | 0.3111 |
0.3254 | 0.3086 | 7000 | 0.3064 |
0.3058 | 0.3527 | 8000 | 0.3033 |
0.298 | 0.3968 | 9000 | 0.3011 |
0.3157 | 0.4409 | 10000 | 0.2995 |
0.3314 | 0.4849 | 11000 | 0.2979 |
0.301 | 0.5290 | 12000 | 0.2965 |
0.2927 | 0.5731 | 13000 | 0.2957 |
0.3199 | 0.6172 | 14000 | 0.2950 |
0.2924 | 0.6613 | 15000 | 0.2948 |
0.2784 | 0.7054 | 16000 | 0.2945 |
0.3069 | 0.7495 | 17000 | 0.2943 |
0.2813 | 0.7935 | 18000 | 0.2943 |
0.2934 | 0.8376 | 19000 | 0.2942 |
0.2762 | 0.8817 | 20000 | 0.2942 |
0.2792 | 0.9258 | 21000 | 0.2942 |
0.3057 | 0.9699 | 22000 | 0.2942 |
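The table and the config together allow a rough estimate of the dataset size. The sketch below is back-of-the-envelope arithmetic only, and it assumes a single training device, which this card does not state. Under that assumption, one epoch works out to roughly 22,700 optimizer steps of 4 examples each, which is on the order of 90,000 training examples.

```python
# Values copied from the config and the last row of the results table.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
val_size = 0.02
last_step, last_epoch = 22000, 0.9699

# Assumption: one training device (the device count is not given in this card).
num_devices = 1

# The epoch column is the fraction of one pass completed at a given step.
steps_per_epoch = last_step / last_epoch

# Examples consumed per optimizer step.
examples_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Size of the training split, and of the full set before the 2% validation split.
train_examples = steps_per_epoch * examples_per_step
total_examples = train_examples / (1 - val_size)

print(f"optimizer steps per epoch: ~{steps_per_epoch:.0f}")
print(f"training examples:         ~{train_examples:.0f}")
print(f"examples incl. validation: ~{total_examples:.0f}")
```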