Update README.md
README.md CHANGED
@@ -21,8 +21,8 @@ tags:

<!-- Provide a quick summary of what the model is/does. -->

-SteamSHP-XL is a preference model trained to predict
-It can be used for NLG evaluation or to train a smaller reward model for RLHF.
+SteamSHP-XL is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
+It can be used for NLG evaluation, question-answering evaluation, or to train a smaller reward model for RLHF.

It is a FLAN-T5-xl model (3B parameters) finetuned on:
1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
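
As a quick illustration of the pairwise format described in the summary above, here is a minimal sketch of querying a FLAN-T5-style preference model with the Hugging Face `transformers` library. The checkpoint name (`stanfordnlp/SteamSHP-flan-t5-xl`) and the exact prompt template are assumptions for illustration; the model card's usage section is the authoritative reference.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Assumed Hub ID; use whatever checkpoint name the model card actually lists.
MODEL_ID = "stanfordnlp/SteamSHP-flan-t5-xl"

tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# Assumed prompt layout: the context (e.g., a Reddit post) followed by two candidate
# responses, ending with a question asking which response is better.
prompt = (
    "POST: How do I keep a cast iron pan from rusting?\n\n"
    "RESPONSE A: Dry it right after washing and rub in a thin layer of oil.\n\n"
    "RESPONSE B: Leave it soaking in the sink overnight.\n\n"
    "Which response is better? RESPONSE"
)

inputs = tokenizer(prompt, return_tensors="pt")

# The model is expected to generate a single letter, "A" or "B", naming the preferred response.
outputs = model.generate(**inputs, max_new_tokens=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If a scalar preference score is needed, for example when using the model as a reward signal for RLHF, one could compare the probabilities the model assigns to the "A" and "B" tokens at the first decoding step instead of taking only the generated letter.
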
@@ -109,12 +109,15 @@ SteamSHP-XL gets an average 72.8% accuracy across all domains:

### Biases and Limitations

+SteamSHP is trained to predict which of two responses humans will find *more helpful*, not which response is *less harmful*.
+It should not be used to detect toxicity, make ethical judgments, or for similar purposes.
+
+Biases and misinformation in the datasets used to train SteamSHP may also be propagated downstream to the model predictions.
Although SHP filtered out posts with NSFW (over 18) content and chose subreddits that were well-moderated and had policies against harassment and bigotry, some of the data may contain discriminatory or harmful language.
+The responses that humans collectively found more helpful are also not guaranteed to be more factual.

+The people whose preferences are captured in SHP and HH-RLHF are not representative of the broader population.
+Although specific demographic information is not available, the Reddit users whose preferences are captured in SHP are, overall, disproportionately male and from developed, Western, and English-speaking countries (Pew Research).


## Contact