Update README.md

README.md CHANGED

@@ -22,7 +22,7 @@ tags:

<!-- Provide a quick summary of what the model is/does. -->

SteamSHP-XL is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
It can be used for NLG evaluation or as a reward model for RLHF.

It is a FLAN-T5-xl model (3B parameters) finetuned on:
1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).

@@ -34,6 +34,8 @@ Despite being 1/4 of the size, it is on average only 0.75 points less accurate o

## Usage

### Normal Usage

The input text should be of the format:

@@ -68,6 +70,40 @@ Here's how to use the model:

If the input exceeds the 512 token limit, you can use [pySBD](https://github.com/nipunsadvilkar/pySBD) to break the input up into sentences and only include what fits into 512 tokens.
When trying to cram an example into 512 tokens, we recommend truncating the context as much as possible and leaving the responses as untouched as possible.
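
As a rough sketch of that truncation strategy (not part of the model card: the helper name is made up, the prompt template simply mirrors the POST/RESPONSE format used in this README, and `tokenizer` is assumed to be the FLAN-T5 tokenizer loaded in the usage example), you could segment the context with pySBD and drop trailing context sentences until the full prompt fits:

```python
# Sketch only -- not from the model card. Assumes `tokenizer` was loaded
# as in the usage example elsewhere in this README.
import pysbd

def truncate_context(context, response_a, response_b, tokenizer, max_tokens=512):
    """Drop trailing context sentences until the whole prompt fits in `max_tokens`."""
    seg = pysbd.Segmenter(language="en", clean=False)
    sentences = seg.segment(context)
    template = ("POST: {post} \n\n RESPONSE A: {a}\n\n RESPONSE B: {b}\n\n "
                "Which response is better? RESPONSE")
    while sentences:
        prompt = template.format(post=" ".join(sentences), a=response_a, b=response_b)
        if len(tokenizer(prompt).input_ids) <= max_tokens:
            return prompt
        sentences = sentences[:-1]  # truncate the context, leave the responses untouched
    # Fall back to an empty context so the responses themselves stay intact.
    return template.format(post="", a=response_a, b=response_b)
```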

### Reward Model Usage

If you want to use SteamSHP-XL as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input:

```
POST: { the context, such as the 'history' column in SHP }

RESPONSE A: { continuation }

RESPONSE B: .

Which response is better? RESPONSE
```

Then calculate the probability assigned to the label A.
This probability (or the logit, depending on what you want) is the score for the response:

```python
>> input_text = "POST: Instacart gave me 50 pounds of limes instead of 5 pounds... what the hell do I do with 50 pounds of limes? I've already donated a bunch and gave a bunch away. I'm planning on making a bunch of lime-themed cocktails, but... jeez. Ceviche? \n\n RESPONSE A: Lime juice, and zest, then freeze in small quantities.\n\n RESPONSE B: .\n\n Which response is better? RESPONSE"
>> x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
>> outputs = model.generate(x, return_dict_in_generate=True, output_scores=True, max_new_tokens=1)
>> torch.exp(outputs.scores[0][:, 71]) / torch.exp(outputs.scores[0][:,:]).sum(axis=1).item() # index 71 corresponds to the token for 'A'
0.819
```

The probability will almost always be high (in the range of 0.8 to 1.0), since RESPONSE B is just a null input.
Therefore you may want to normalize the probability.
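
The card does not prescribe a particular normalization; purely as an illustrative assumption, one simple option when ranking several candidate responses to the same context is to min-max rescale their raw scores:

```python
# Illustrative assumption only: rescale raw SteamSHP scores for candidate
# responses to the same context so that they span [0, 1].
def minmax_normalize(scores):
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

# e.g. minmax_normalize([0.83, 0.97, 0.91]) -> [0.0, 1.0, ~0.57]
```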

You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
For example, if one response has probability 0.95 and the other has 0.80, the former will be preferred.
Inferring the preference label in this way only leads to a 0.5-point drop in accuracy on the SHP + HH-RLHF test data on average across all domains, meaning that there's only a very small penalty for using SteamSHP as a reward model instead of as a preference model.
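
To make that comparison concrete, here is a small sketch (not from the model card: `score_response` and `infer_preference` are made-up helper names, and `model`, `tokenizer`, and `device` are assumed to be set up exactly as in the snippet above) that scores each response independently against a null RESPONSE B and picks the higher-scoring one:

```python
# Sketch only -- assumes `model`, `tokenizer`, and `device` from the snippet above.
import torch

A_TOKEN_ID = 71  # token id for 'A', as used in the snippet above

def score_response(context, response):
    """Score one response by pairing it with a null RESPONSE B."""
    input_text = (f"POST: {context} \n\n RESPONSE A: {response}\n\n "
                  "RESPONSE B: .\n\n Which response is better? RESPONSE")
    x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
    outputs = model.generate(x, return_dict_in_generate=True, output_scores=True, max_new_tokens=1)
    # softmax over the first generated token's logits -- equivalent to the exp/sum above
    probs = torch.softmax(outputs.scores[0], dim=-1)
    return probs[0, A_TOKEN_ID].item()

def infer_preference(context, response_1, response_2):
    """Return 'A' if response_1 is preferred, otherwise 'B'."""
    return 'A' if score_response(context, response_1) >= score_response(context, response_2) else 'B'
```

The `torch.softmax` call is just a more compact way of writing the explicit `exp`/`sum` normalization in the snippet above; the resulting scores, and therefore the inferred label, are the same.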

## Training and Evaluation

@@ -105,6 +141,8 @@ SteamSHP-XL gets an average 72.8% accuracy across all domains:

| anthropic (helpfulness) | 0.7310 |
| ALL (unweighted) | 0.7278 |

As mentioned previously, if you use SteamSHP as a reward model and try to infer the preference label based on the probability assigned to each response independently, that could also work!
But doing so will lead to a 0.5-point drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.


## Biases and Limitations