Update README.md
README.md CHANGED
@@ -101,7 +101,7 @@ Therefore you may want to normalize the probability.

You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
For example, if one response has probability 0.95 and the other has 0.80, the former will be preferred.
- Inferring the preference label in this way only leads to a 0.
+ Inferring the preference label in this way only leads to a 0.006 drop in accuracy on the SHP + HH-RLHF test data on average across all domains, meaning that there's only a very small penalty for using SteamSHP-XL as a reward model instead of as a preference model.

@@ -142,7 +142,7 @@ SteamSHP-XL gets an average 72.8% accuracy across all domains:

| ALL (unweighted) | 0.7278 |

As mentioned previously, if you use SteamSHP as a reward model and try to infer the preference label based on the probability assigned to each response independently, that could also work!
- But doing so will lead to a 0.
+ But doing so will lead to a 0.006 drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.

## Biases and Limitations
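For reference, the comparison described in the changed lines (scoring each response independently and preferring the one with the higher probability) can be sketched as follows. This is a minimal sketch, not the README's official snippet: it assumes the `stanfordnlp/SteamSHP-flan-t5-xl` checkpoint on the Hugging Face Hub and a POST / RESPONSE A / RESPONSE B prompt template with RESPONSE B left blank for single-response scoring; the `score()` helper and the example context and responses are hypothetical.

```python
# Hedged sketch: score two responses independently with SteamSHP-XL and
# compare the scores to infer a preference label.
# Assumptions (not from this commit): the checkpoint name and the
# POST / RESPONSE A / RESPONSE B template, with RESPONSE B left blank
# when scoring a single response as a reward model.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_NAME = "stanfordnlp/SteamSHP-flan-t5-xl"  # assumed checkpoint name
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
model.eval()


def score(context: str, response: str) -> float:
    """Probability assigned to a single response (reward-model use):
    the response fills the RESPONSE A slot, RESPONSE B is left blank,
    and the score is the probability of generating the token 'A'."""
    prompt = (
        f"POST: {context}\n\n"
        f"RESPONSE A: {response}\n\n"
        f"RESPONSE B: \n\n"
        f"Which response is better? RESPONSE"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=1,
            return_dict_in_generate=True,
            output_scores=True,
        )
    # Distribution over the first generated token; take P('A').
    probs = torch.softmax(out.scores[0][0], dim=-1)
    a_id = tokenizer("A", add_special_tokens=False).input_ids[0]
    return probs[a_id].item()


# Hypothetical example: score each response independently, prefer the higher one.
context = "I forgot to thank a colleague who helped me. Should I still say something?"
p1 = score(context, "Yes, a short, sincere thank-you is appreciated even if it's late.")
p2 = score(context, "No, it's too late now; just let it go.")
print(f"P(response 1) = {p1:.3f}, P(response 2) = {p2:.3f}")
print("Preferred:", "response 1" if p1 > p2 else "response 2")
```

Each response is scored in isolation here, which is exactly the reward-model usage the changed lines describe; per the new text, this costs only about 0.006 accuracy on average compared with giving SteamSHP-XL both responses at once as a preference model.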