Update README.md
README.md
CHANGED
@@ -8,6 +8,8 @@ base_model: h2oai/h2o-danube3-4b-base
---
# LittleInstructionJudge-4B-v0.1

A BAdam fine-tuned danube3-4b-base to do one thing, and one thing only: Being a lightweight LLM-as-a-Judge for instruction prompts.

The purpose of training this model is to have a small language model that can filter away the worst offenders when creating datasets using the Magpie method in hardware constrained environments.
@@ -33,6 +35,10 @@ This is the instruction I need you to judge:
{{instruction}}
```

### LLama-Factory training config

```yaml
---
# LittleInstructionJudge-4B-v0.1

+ **Update:** The instruct_reward is all out of whack due to a misunderstanding on my part caused by laziness. The other values are fine, though not as useful as they would be if I had actually just read more. Any model with the right prompt is better. Even [CleverQwen2-1.5B](https://huggingface.co/trollek/CleverQwen2-1.5B). The next version will be better.
+
A BAdam fine-tuned danube3-4b-base to do one thing, and one thing only: Being a lightweight LLM-as-a-Judge for instruction prompts.

The purpose of training this model is to have a small language model that can filter away the worst offenders when creating datasets using the Magpie method in hardware constrained environments.
{{instruction}}
```

+ ### Quants
+
+ * [mradermacher/LittleInstructionJudge-4B-v0.1-GGUF](https://huggingface.co/mradermacher/LittleInstructionJudge-4B-v0.1-GGUF)
+
### LLama-Factory training config

```yaml