---
language:
- en
- de
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- orpo
base_model: cstr/phi-3-orpo-v8_16
---

# Model details

These are q4 GGUF quants of a quick experiment on a llamafied phi-3, trained for only 1000 ORPO steps on an AzureML-translated, binarized German Orca dataset (johannhartmann/mistralorpo), using the original phi-3 prompt template. The immediate result is not really good, but also not bad enough to discourage further experiments.
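Since the original phi-3 prompt template was kept, prompts should be assembled in that format. A minimal sketch, assuming the commonly documented phi-3 instruct tokens (`<|system|>`, `<|user|>`, `<|end|>`, `<|assistant|>`); for production use, prefer the tokenizer's built-in chat template:

```python
from typing import Optional


def build_phi3_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Assemble a single-turn prompt in the phi-3 instruct format.

    The special tokens used here follow the phi-3 template as commonly
    documented; this is a sketch, not the canonical tokenizer chat template.
    """
    parts = []
    if system_message is not None:
        parts.append(f"<|system|>\n{system_message}<|end|>\n")
    # The trailing <|assistant|> tag cues the model to generate the reply.
    parts.append(f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n")
    return "".join(parts)


prompt = build_phi3_prompt("Wie heisst die Hauptstadt von Deutschland?")
print(prompt)
```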

# Benchmark results

This was an experiment on a German dataset snippet which, as expected, worsened results on English benchmarks:

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |64.40|
|AI2 Reasoning Challenge (25-Shot)|60.41|
|HellaSwag (10-Shot)              |78.37|
|MMLU (5-Shot)                    |65.26|
|TruthfulQA (0-shot)              |49.76|
|Winogrande (5-shot)              |70.24|
|GSM8k (5-shot)                   |62.32|

On the German EQ-Bench (v2_de) the model scores 51.82: insignificantly above the 51.41 of the original llamafied model, but significantly better than the intermediate cstr/phi-3-orpo-v8_16 checkpoint, which achieved 46.38 after the initial 150 test steps. However, still only 164/171 answers were parsed correctly.

Note: parsing correctness can be improved, among other things, with only a few SFT steps, as shown with cas/phi3-mini-4k-llamafied-sft-v3 (170/171 parsed correctly, but with a v2_de score of only 39.46; that run was also an experiment in changing the prompt template).
All of this was done quickly with bnb and q4 quants only, which might, in theory, significantly affect especially such small dense models.
But it at least served its purpose for both proof-of-concept experiments. Further improving the results would probably be possible, but would take some time and compute.

# Training setup

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
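For reference, the ORPO objective optimized during those steps augments the standard SFT loss with an odds-ratio preference term. A sketch following the ORPO formulation (notation ours; $\lambda$ is the weighting hyperparameter, $y_w$ and $y_l$ the chosen and rejected completions):

$$\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\left[\,\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\,\right]$$

$$\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right), \qquad \mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$$

Unlike DPO, this requires no separate reference model, which is part of why a short 1000-step run like this one is cheap to try.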