kawine committed
Commit 865c6c9 · 1 Parent(s): d60ccc4

Update README.md

Files changed (1):
  1. README.md +5 -4
README.md CHANGED
@@ -25,10 +25,11 @@ SteamSHP-XL is a preference model trained to predict human preferences, given so
 It can be used for NLG evaluation or to train a smaller reward model for RLHF.
 
 It is a FLAN-T5-xl model (3B parameters) finetuned on:
-1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains aggregate human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
+1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
 2. The helpfulness data in [Anthropic's HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset.
 
-There is a smaller variant called [SteamSHP-Large](https://huggingface.co/kawine/SteamSHP-flan-t5-large) that was made by finetuning FLAN-T5-large (780M parameters), which is 0.75 percentage points less accurate on the test data.
+There is a smaller variant called [SteamSHP-Large](https://huggingface.co/kawine/SteamSHP-flan-t5-large) that was made by finetuning FLAN-T5-large (780M parameters).
+Despite being 1/4 of the size, it is on average only 0.75 points less accurate on the SHP + Anthropic test data (across all domains).
 
 
 ## Usage
@@ -52,7 +53,7 @@ Here's how to use the model:
 ```python
 
 >> from transformers import T5ForConditionalGeneration, T5Tokenizer
->> device = 'cuda'
+>> device = 'cuda' # if you have a GPU
 
 >> tokenizer = T5Tokenizer.from_pretrained('stanfordnlp/SteamSHP-flan-t5-xl')
 >> model = T5ForConditionalGeneration.from_pretrained('stanfordnlp/SteamSHP-flan-t5-xl').to(device)
@@ -112,7 +113,7 @@ Biases in the datasets used to train SteamSHP-XL may be propagated downstream to
 Although SHP filtered out posts with NSFW (over 18) content, chose subreddits that were well-moderated and had policies against harassment and bigotry, some of the data may contain discriminatory or harmful language.
 Reddit users on the subreddits covered by SHP are also not representative of the broader population. They are disproportionately from developed, Western, and English-speaking countries.
 
-It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the aggregate preference of Reddit users (in SHP's case) and individuals' preferences (in HH-RLHF's case).
+It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the collective preference of Reddit users (in SHP's case) and individual preferences (in HH-RLHF's case).
 [Past work](https://www.anthropic.com/model-written-evals.pdf) by Anthropic has found that models optimized for human preference can be obsequious, at the expense of the truth.
 
 
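
The Usage hunk above only covers the lines this commit touched (the imports, `device`, and model loading). For readers landing on this diff, below is a minimal, self-contained sketch of what a full query to SteamSHP-XL might look like. The prompt template and the example post/responses are illustrative assumptions and are not part of this commit -- see the full README for the canonical input format.

```python
# Minimal sketch (not part of this commit): ask SteamSHP-XL which of two responses is preferred.
# The "POST / RESPONSE A / RESPONSE B / Which response is better?" template below is an
# assumption for illustration; the post and responses are made up.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = T5Tokenizer.from_pretrained('stanfordnlp/SteamSHP-flan-t5-xl')
model = T5ForConditionalGeneration.from_pretrained('stanfordnlp/SteamSHP-flan-t5-xl').to(device)

prompt = (
    "POST: How do I keep cut avocado from browning?\n\n"
    "RESPONSE A: Brush the cut surface with lemon or lime juice and wrap it tightly.\n\n"
    "RESPONSE B: Just eat the whole thing right away.\n\n"
    "Which response is better? RESPONSE"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(input_ids, max_new_tokens=1)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # expected output: 'A' or 'B'
```

Swapping the checkpoint for `kawine/SteamSHP-flan-t5-large` should give the smaller variant mentioned in the first hunk, at a modest cost in accuracy.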