Alepach committed
Commit 7bc8118 · verified · 1 Parent(s): 2c8a276

Add custom README

Files changed (1)
  1. README.md +31 -15
README.md CHANGED
@@ -7,29 +7,34 @@ tags:
  - trl
  - sft
  license: apache-2.0
  ---

- # Model Card for notHumpback-M1

- This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
- It has been trained using [TRL](https://github.com/huggingface/trl).
-
- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- This model was trained with SFT.

  ### Framework versions

@@ -41,7 +46,18 @@ This model was trained with SFT.

  ## Citations

  Cite TRL as:

  - trl
  - sft
  license: apache-2.0
+ datasets:
+ - OpenAssistant/oasst1
+ - allenai/c4
  ---

+ # notHumpback-M1

+ This model follows the Humpback architecture, proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+ by Li et al.

+ It represents the resulting model after the first iteration of self-curation, which is trained on a small amount of gold data
+ and a set of generated data curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).

+ This model can be used for instruction-following.
+ It may also be used to score, once more, the instruction-response pairs
+ generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) for a second iteration of self-curation.
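
For plain instruction-following, the quick-start snippet from the earlier revision of this card still applies; a minimal sketch (the prompt is only an example, and `device="cuda"` assumes a GPU is available):

```python
from transformers import pipeline

# Load the fine-tuned model as a chat-style text-generation pipeline.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")

# Pass the question as a user message and print only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```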

+ Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+ creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+ The model then iteratively curates the generated dataset, scoring the pairs by quality, and is fine-tuned on the resulting subset
+ of pairs with the highest score (self-curation).
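
The curation step itself is not shipped with this card; purely to illustrate the idea, here is a hedged sketch of prompt-based quality scoring. The rating prompt, the 1-to-5 scale parsing, and the keep-threshold are assumptions made for this sketch, not the exact procedure from the paper or from this training run.

```python
import re
from transformers import pipeline

# Hypothetical sketch: score candidate (instruction, response) pairs with the
# current model and keep only the top-rated ones, in the spirit of self-curation.
scorer = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")

def score_pair(instruction, response):
    # Assumed rating prompt; the actual prompt used by Humpback differs.
    prompt = (
        "Rate the quality of the following answer on a scale from 1 to 5.\n"
        f"Question: {instruction}\nAnswer: {response}\nScore:"
    )
    text = scorer(prompt, max_new_tokens=4, return_full_text=False)[0]["generated_text"]
    match = re.search(r"[1-5]", text)
    return int(match.group()) if match else 0

def curate(pairs, threshold=5):
    # Keep only pairs rated at the (assumed) highest score.
    return [p for p in pairs if score_pair(*p) >= threshold]
```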

+ Differing from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
+ dataset and the synthetic dataset mentioned above. The latter was created by applying self-augmentation and self-curation
+ to 502k entries from the English subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.
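
The actual training script is not part of this card; the following is a minimal TRL sketch under assumptions: the dataset below covers only the oasst1 half of the mixture (the curated synthetic half is not published as a separate dataset), and hyperparameters are left at defaults rather than the values actually used.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder data: oasst1 messages; the real run also mixes in the
# self-curated pairs derived from c4, which are not reproduced here.
train_dataset = load_dataset("OpenAssistant/oasst1", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-3B",              # base model named in this card
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="notHumpback-M1"),  # hyperparameters assumed/default
)
trainer.train()
```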

  ### Framework versions

  ## Citations

+ Original paper:

+ ```bibtex
+ @misc{li2023selfalignment,
+   title={Self-Alignment with Instruction Backtranslation},
+   author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+   year={2023},
+   eprint={2308.06259},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```

  Cite TRL as: