Update README.md
README.md CHANGED
@@ -12,18 +12,9 @@ pipeline_tag: text-generation
 
 # lgaalves/gpt2_camel_physics-platypus
 
-**lgaalves/gpt2_camel_physics-
+**lgaalves/gpt2_camel_physics-platypus** is an instruction fine-tuned model based on the GPT-2 transformer architecture.
 
 
-### Benchmark Metrics
-
-| Metric                | lgaalves/gpt2_camel_physics-platypus | gpt2 (base) |
-|-----------------------|-------|-------|
-| Avg.                  | - | 29.9 |
-| ARC (25-shot)         | - | 21.84 |
-| HellaSwag (10-shot)   | - | 31.6 |
-| MMLU (5-shot)         | - | 25.86 |
-| TruthfulQA (0-shot)   | - | 40.67 |
 
 We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
 
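For reference, reproducing one of the benchmark rows above might look like the following. This is a minimal sketch assuming the pre-0.4 Evaluation Harness API (the `hf-causal` backend the HuggingFace leaderboard used at the time), not a snippet taken from the card itself:

```python
# Sketch: reproduce the ARC (25-shot) row with the EleutherAI
# Language Model Evaluation Harness (pre-0.4 API, "hf-causal" backend).
# Install the harness version matching the HuggingFace leaderboard first.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=lgaalves/gpt2_camel_physics-platypus",
    tasks=["arc_challenge"],  # harness task name for ARC
    num_fewshot=25,           # ARC is scored 25-shot on the leaderboard
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # acc / acc_norm
```

Under that same harness version, the other rows should correspond to the tasks `hellaswag` (10-shot), `hendrycksTest-*` (5-shot, MMLU), and `truthfulqa_mc` (0-shot).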
@@ -68,6 +59,8 @@ the GPT4 generated dataset [lgaalves/camel-physics](https://huggingface.co/datasets/lgaalves/camel-physics)
 # Intended uses, limitations & biases
 
 You can use the raw model for text generation or fine-tune it to a downstream task. The model was not extensively tested and may produce false information. It contains a lot of unfiltered content from the internet, which is far from neutral.
+
+
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lgaalves__gpt2_camel_physics-platypus)
 
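The intended-use paragraph in the hunk above mentions raw text generation; a minimal sketch with the standard `transformers` pipeline (the prompt is an illustrative physics question, not one from the card):

```python
# Generic text-generation usage for a GPT-2-style checkpoint.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="lgaalves/gpt2_camel_physics-platypus",
)
prompt = "State Newton's second law of motion."
output = generator(prompt, max_new_tokens=50, do_sample=True)
print(output[0]["generated_text"])
```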