Add custom README
README.md CHANGED
````diff
@@ -7,29 +7,34 @@ tags:
 - trl
 - sft
 license: apache-2.0
 ---
 
-#
 
-This model
 
-## Quick start
 
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
 
 ### Framework versions
 
@@ -41,7 +46,18 @@ This model was trained with SFT.
 
 ## Citations
 
 Cite TRL as:
````
|
````diff
 - trl
 - sft
 license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
+- allenai/c4
 ---
````
````diff
 
+# notHumpback-M1
 
+This model follows the Humpback architecture, proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+by Li et al.
 
+It represents the resulting model after the first iteration of self-curation, which is trained on a small amount of gold data
+and a set of generated data curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).
 
+This model can be used for instruction-following.
+It may also be used to score, once again, the instruction-response pairs
+generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) for a second iteration of self-curation.
 
+Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+The model then iteratively curates the created dataset, scoring the pairs by quality, and is then fine-tuned on the resulting subset
+of all pairs with the highest possible score (self-curation).
 
+Varying from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
 
+The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
+dataset and the synthetic dataset mentioned above. The latter was created by applying self-augmentation and self-curation
+to 502k entries from the English subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.
````
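The self-augmentation and self-curation steps described in the new card can be sketched as below. This is a minimal illustration only: `generate_instruction` stands in for the "backward model" and `score_pair` for the model-based quality rating, neither is a real API of this repository, and the toy scoring rule replaces the 1-5 prompt-based rating used in the paper.

```python
# Hypothetical sketch of the Humpback-style data pipeline described above.
# generate_instruction() and score_pair() are placeholders, not real APIs.

def generate_instruction(response: str) -> str:
    """Self-augmentation: backtranslate a web text into an instruction."""
    return f"Write a passage about: {response.split('.')[0]}"

def score_pair(instruction: str, response: str) -> int:
    """Self-curation: rate pair quality on a 1-5 scale (toy placeholder rule)."""
    return 5 if len(response.split()) >= 5 else 2

def build_training_subset(web_texts, threshold=5):
    """Keep only the highest-scoring generated pairs for fine-tuning."""
    pairs = [(generate_instruction(t), t) for t in web_texts]   # self-augmentation
    return [p for p in pairs if score_pair(*p) >= threshold]    # self-curation

texts = [
    "The tides are driven by the gravitational pull of the moon.",
    "Yes.",
]
curated = build_training_subset(texts)
```

In the actual procedure, the curated subset is then mixed with the sampled gold data before the next round of fine-tuning.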
````diff
 
 ### Framework versions
 
 ## Citations
 
+Original paper:
 
+```bibtex
+@misc{li2023selfalignment,
+  title={Self-Alignment with Instruction Backtranslation},
+  author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+  year={2023},
+  eprint={2308.06259},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
 
 Cite TRL as:
````