kawine committed on
Commit ee33554 · 1 Parent(s): a3ed2b5

Update README.md

Files changed (1)
  1. README.md +39 -1
README.md CHANGED
@@ -22,7 +22,7 @@ tags:
 <!-- Provide a quick summary of what the model is/does. -->
 
 SteamSHP-XL is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
- It can be used for NLG evaluation, question-answering evaluation, or to train a smaller reward model for RLHF.
+ It can be used for NLG evaluation or as a reward model for RLHF.
 
 It is a FLAN-T5-xl model (3B parameters) finetuned on:
 1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
@@ -34,6 +34,8 @@ Despite being 1/4 of the size, it is on average only 0.75 points less accurate o
 
 ## Usage
 
+ ### Normal Usage
+
 The input text should be of the format:
 
 ```
@@ -68,6 +70,40 @@ Here's how to use the model:
 If the input exceeds the 512 token limit, you can use [pySBD](https://github.com/nipunsadvilkar/pySBD) to break the input up into sentences and only include what fits into 512 tokens.
 When trying to cram an example into 512 tokens, we recommend truncating the context as much as possible and leaving the responses as untouched as possible.
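For illustration, here is a minimal sketch of that truncation strategy, assuming the SteamSHP input template shown in the Reward Model Usage hunk below and a Hugging Face `tokenizer` like the one loaded in the usage example; the `truncate_context` helper and its argument names are hypothetical, not part of the model card:

```python
import pysbd


def truncate_context(context, response_a, response_b, tokenizer, max_tokens=512):
    """Keep as many leading context sentences as fit in the token budget,
    leaving the two responses untouched (per the recommendation above)."""
    seg = pysbd.Segmenter(language="en", clean=False)
    sentences = seg.segment(context)  # split the context into sentences

    template = ("POST: {post}\n\n RESPONSE A: {a}\n\n RESPONSE B: {b}\n\n "
                "Which response is better? RESPONSE")
    kept = []
    for sentence in sentences:
        candidate = template.format(post=" ".join(kept + [sentence]),
                                    a=response_a, b=response_b)
        # stop adding context sentences once the full input would exceed the limit
        if len(tokenizer(candidate).input_ids) > max_tokens:
            break
        kept.append(sentence)

    return template.format(post=" ".join(kept), a=response_a, b=response_b)
```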
 
+ ### Reward Model Usage
+
+ If you want to use SteamSHP-XL as a reward model -- to get a score for a single response -- you need to structure the input so that RESPONSE A is the response you want to score and RESPONSE B is just an empty input:
+
+ ```
+ POST: { the context, such as the 'history' column in SHP }
+
+ RESPONSE A: { continuation }
+
+ RESPONSE B: .
+
+ Which response is better? RESPONSE
+ ```
+
+ Then calculate the probability assigned to the label A.
+ This probability (or the logit, depending on what you want) is the score for the response:
+
+ ```python
+ >> input_text = "POST: Instacart gave me 50 pounds of limes instead of 5 pounds... what the hell do I do with 50 pounds of limes? I've already donated a bunch and gave a bunch away. I'm planning on making a bunch of lime-themed cocktails, but... jeez. Ceviche? \n\n RESPONSE A: Lime juice, and zest, then freeze in small quantities.\n\n RESPONSE B: .\n\n Which response is better? RESPONSE"
+ >> x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
+ >> outputs = model.generate(x, return_dict_in_generate=True, output_scores=True, max_new_tokens=1)
+ >> (torch.exp(outputs.scores[0][:, 71]) / torch.exp(outputs.scores[0][:, :]).sum(axis=1)).item()  # index 71 corresponds to the token for 'A'
+ 0.819
+ ```
+
+ The probability will almost always be high (in the range of 0.8 to 1.0), since RESPONSE B is just a null input.
+ Therefore you may want to normalize the probability.
+
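The model card leaves the normalization method open; as one hypothetical option (an assumption, not something prescribed above), you could linearly rescale the raw probability using the 0.8 to 1.0 range quoted in the previous sentence:

```python
def normalize_score(p, lower=0.8, upper=1.0):
    """Rescale a raw single-response score into [0, 1].

    The 0.8-1.0 bounds are an assumption based on the range quoted above;
    tune them on your own data.
    """
    return max(0.0, min(1.0, (p - lower) / (upper - lower)))


print(normalize_score(0.819))  # the lime example above -> roughly 0.095
```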
+ You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
+ For example, if one response has a probability of 0.95 and the other has 0.80, the former will be preferred.
+ Inferring the preference label in this way only leads to a 0.5-point drop in accuracy on the SHP + HH-RLHF test data (on average across all domains), meaning that there is only a very small penalty for using SteamSHP as a reward model instead of as a preference model.
+
+
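To make that comparison concrete, here is a minimal sketch that scores each response independently (with a null RESPONSE B, exactly as in the snippet above) and picks the higher-scoring one; `score_response` and the placeholder inputs are hypothetical, while `tokenizer`, `model`, and `device` are assumed to be the objects set up earlier in the Usage section:

```python
import torch


def score_response(post, response, tokenizer, model, device):
    # Single-response scoring as in the snippet above: RESPONSE B is a null input
    # and the score is the probability assigned to the token 'A' (index 71).
    input_text = (f"POST: {post}\n\n RESPONSE A: {response}\n\n "
                  f"RESPONSE B: .\n\n Which response is better? RESPONSE")
    x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
    outputs = model.generate(x, return_dict_in_generate=True,
                             output_scores=True, max_new_tokens=1)
    return torch.softmax(outputs.scores[0][0], dim=-1)[71].item()


# Placeholder inputs -- replace with your own context and candidate responses.
post = "{ the context, such as the 'history' column in SHP }"
response_a = "{ first candidate response }"
response_b = "{ second candidate response }"

score_a = score_response(post, response_a, tokenizer, model, device)
score_b = score_response(post, response_b, tokenizer, model, device)
preferred = 'A' if score_a > score_b else 'B'  # e.g., 0.95 vs 0.80 -> 'A'
```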
 
 ## Training and Evaluation
 
@@ -105,6 +141,8 @@ SteamSHP-XL gets an average 72.8% accuracy across all domains:
 | anthropic (helpfulness) | 0.7310 |
 | ALL (unweighted) | 0.7278 |
 
+ As mentioned previously, if you use SteamSHP as a reward model and infer the preference label from the probability assigned to each response independently, that also works.
+ But doing so leads to a 0.5-point drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.
 
 
 ## Biases and Limitations