SteamSHP-XL is a preference model trained to predict human preferences, given some context and two possible responses.
It can be used for NLG evaluation or to train a smaller reward model for RLHF.

It is a FLAN-T5-xl model (3B parameters) finetuned on:
1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
2. The helpfulness data in [Anthropic's HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset.
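
Both training sets are hosted on the Hugging Face Hub. The snippet below is a minimal, illustrative sketch (not taken from the original training code) of loading them with the `datasets` library; note that only the helpfulness portion of HH-RLHF was used for finetuning.

```python
# Illustrative only: load the two finetuning datasets from the Hugging Face Hub.
from datasets import load_dataset

shp = load_dataset("stanfordnlp/SHP")    # collective preferences from 18 subreddits
hh = load_dataset("Anthropic/hh-rlhf")   # only the helpfulness data was used for SteamSHP

print(shp)  # inspect the available splits and fields
print(hh)
```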

There is a smaller variant called [SteamSHP-Large](https://huggingface.co/kawine/SteamSHP-flan-t5-large) that was made by finetuning FLAN-T5-large (780M parameters).
Despite being 1/4 of the size, it is on average only 0.75 points less accurate on the SHP + Anthropic test data (across all domains).

## Usage

Here's how to use the model:

```python
>> from transformers import T5ForConditionalGeneration, T5Tokenizer

>> device = 'cuda' # if you have a GPU

>> tokenizer = T5Tokenizer.from_pretrained('stanfordnlp/SteamSHP-flan-t5-xl')
>> model = T5ForConditionalGeneration.from_pretrained('stanfordnlp/SteamSHP-flan-t5-xl').to(device)
```
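The usage example above is truncated in this excerpt. As a hedged sketch of what inference might look like, the code below continues from the `tokenizer`, `model`, and `device` objects created above and assumes the model takes a single text prompt containing the post and the two candidate responses, then emits the letter of the preferred one ('A' or 'B'). The prompt template and the example strings are assumptions for illustration; check the model card's own example for the exact format.

```python
# Hedged sketch: the prompt template below is an assumption, not necessarily
# the exact format SteamSHP-XL was trained on.
post = "How do I keep guacamole from turning brown overnight?"
response_a = "Press plastic wrap directly onto the surface so no air touches it."
response_b = "Just stir it in the morning, it'll be fine."

prompt = (
    f"POST: {post}\n\n"
    f"RESPONSE A: {response_a}\n\n"
    f"RESPONSE B: {response_b}\n\n"
    "Which response is better? RESPONSE"
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_new_tokens=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected to print 'A' or 'B'
```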

Biases in the datasets used to train SteamSHP-XL may be propagated downstream to the model predictions.
Although SHP filtered out posts with NSFW (over 18) content and chose subreddits that were well-moderated and had policies against harassment and bigotry, some of the data may contain discriminatory or harmful language.
Reddit users on the subreddits covered by SHP are also not representative of the broader population. They are disproportionately from developed, Western, and English-speaking countries.

It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the collective preference of Reddit users (in SHP's case) and individual preferences (in HH-RLHF's case).
[Past work](https://www.anthropic.com/model-written-evals.pdf) by Anthropic has found that models optimized for human preference can be obsequious, at the expense of the truth.