cstr committed · verified
Commit 13cb50f · 1 Parent(s): 3b7821b

Create README.md

Files changed (1)
  1. README.md +42 -0
README.md ADDED
@@ -0,0 +1,42 @@
---
language:
- en
- de
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- orpo
base_model: cstr/phi-3-orpo-v8_16
---

# Model details

These are q4 GGUF quantizations of a quick experiment on a llamafied phi-3: only 1000 ORPO steps on an AzureML-translated German Orca binarized dataset (johannhartmann/mistralorpo), using the original phi-3 prompt template. The immediate result is not really good, but also not bad enough to discourage further experiments.
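
For local use, here is a minimal inference sketch with llama-cpp-python; the GGUF file name is a placeholder for whichever q4 file this repository ships, and the prompt string assumes the standard phi-3 chat format with `<|user|>`, `<|assistant|>` and `<|end|>` markers.

```python
# Minimal inference sketch using llama-cpp-python (pip install llama-cpp-python).
# Assumptions: the GGUF file name below is a placeholder, and the model expects
# the standard phi-3 chat template with <|user|> / <|assistant|> / <|end|> tokens.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-orpo-q4.gguf",  # hypothetical name; use the actual q4 file from this repo
    n_ctx=4096,
)

# German example prompt in the phi-3 chat format.
prompt = (
    "<|user|>\n"
    "Erkläre kurz, was ORPO-Training ist.<|end|>\n"
    "<|assistant|>\n"
)

out = llm(prompt, max_tokens=256, stop=["<|end|>"])
print(out["choices"][0]["text"])
```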

# Benchmark results

This was an experiment on a German dataset snippet, which, as expected, worsened results on English benchmarks:

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |64.40|
|AI2 Reasoning Challenge (25-Shot)|60.41|
|HellaSwag (10-Shot)              |78.37|
|MMLU (5-Shot)                    |65.26|
|TruthfulQA (0-shot)              |49.76|
|Winogrande (5-shot)              |70.24|
|GSM8k (5-shot)                   |62.32|

On the German EQ-Bench (v2_de), the model scores 51.82 (an insignificant gain over 51.41 for the original llamafied model, but significantly better than the intermediate cstr/phi-3-orpo-v8_16, which achieved 46.38 after the initial 150 test steps), though still with only 164/171 answers parsed correctly.

Note: Parsing correctness can be improved, among other things, by only a few SFT steps, as shown with cas/phi3-mini-4k-llamafied-sft-v3 (170/171 parsed correctly, but then with a v2_de score of only 39.46; that run was also an experiment in changing the prompt template).
All of this was done quickly with bnb and q4 quantizations only, which might, in theory, significantly affect small dense models like this one in particular (see the sketch below).
But it served its purpose for both proof-of-concept experiments. Further improvements would probably be possible, but would take some time and compute.
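
For context on the quantized-only setup, below is a minimal sketch of loading the base model in 4-bit with bitsandbytes via Transformers; the exact settings used in the experiment are not documented here, so the quantization parameters are illustrative assumptions.

```python
# Illustrative sketch only: load the base model in 4-bit via bitsandbytes.
# The actual experiment's settings are not documented; these values are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed; nf4 is the common default
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "cstr/phi-3-orpo-v8_16",              # the base_model named in this card
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("cstr/phi-3-orpo-v8_16")
```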

# Training setup

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
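
For reference, here is a minimal sketch of what such an ORPO run with Unsloth and TRL's `ORPOTrainer` could look like; the hyperparameters, the starting checkpoint, and the dataset column handling are assumptions built around the description above, not the exact training script.

```python
# Hedged sketch of an ORPO setup with Unsloth + TRL; hyperparameters and
# dataset handling are assumptions, not the exact script used for this model.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer

# Starting checkpoint: the card lists cstr/phi-3-orpo-v8_16 as base_model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="cstr/phi-3-orpo-v8_16",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# German Orca-style ORPO dataset mentioned above; split name and the
# prompt/chosen/rejected column layout expected by ORPOTrainer are assumed.
dataset = load_dataset("johannhartmann/mistralorpo", split="train")

args = ORPOConfig(
    output_dir="phi3-orpo-de",
    max_steps=1000,                  # the card mentions only 1000 ORPO steps
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    beta=0.1,                        # ORPO beta; assumed, not documented
    max_length=4096,
    max_prompt_length=2048,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```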