trollek/LittleInstructionJudge-4B-v0.1

trollek committed on Aug 7, 2024

Commit 2aa5b45 · verified · 1 Parent(s): 2b7e690

Update README.md

Files changed (1)

README.md +6 -0

@@ -8,6 +8,8 @@ base_model: h2oai/h2o-danube3-4b-base
 ---
 # LittleInstructionJudge-4B-v0.1
+**Update:** The instruct_reward is all out of whack due to a misunderstanding on my part caused by laziness. The other values are fine, though not as useful as they would have been if I had actually just read more. Any model with the right prompt is better. Even [CleverQwen2-1.5B](https://huggingface.co/trollek/CleverQwen2-1.5B). The next version will be better.
 A BAdam fine-tuned danube3-4b-base to do one thing, and one thing only: Being a lightweight LLM-as-a-Judge for instruction prompts.
 The purpose of training this model is to have a small language model that can filter away the worst offenders when creating datasets using the Magpie method in hardware constrained environments.
@@ -33,6 +35,10 @@ This is the instruction I need you to judge:
 {{instruction}}
 ```
+### Quants
+* [mradermacher/LittleInstructionJudge-4B-v0.1-GGUF](https://huggingface.co/mradermacher/LittleInstructionJudge-4B-v0.1-GGUF)
 ### LLama-Factory training config
 ```yaml