Update README.md
README.md
CHANGED
@@ -39,11 +39,11 @@ Despite being 1/4 of the size, it is on average only 0.75 points less accurate o
 The input text should be of the format:
 
 ```
-POST: { the context, such as the 'history' column in SHP }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
 
-RESPONSE A: { first possible continuation }
+RESPONSE A: { first possible continuation (not containing any newlines \n) }
 
-RESPONSE B: { second possible continuation }
+RESPONSE B: { second possible continuation (not containing any newlines \n) }
 
 Which response is better? RESPONSE
 ```
@@ -75,9 +75,9 @@ When trying to cram an example into 512 tokens, we recommend truncating the cont
 If you want to use SteamSHP-XL as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input:
 
 ```
-POST: { the context, such as the 'history' column in SHP }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
 
-RESPONSE A: { continuation }
+RESPONSE A: { continuation (not containing any newlines \n) }
 
 RESPONSE B: .
 
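
The change above only adds a constraint: each field must not contain newlines. A minimal Python sketch of building both input formats under that constraint follows. The helper names (`one_line`, `build_preference_input`, `build_reward_input`) are hypothetical and not part of the README. Collapsing newlines into single spaces is an assumption, since the README says only that fields must not contain them. The second hunk is cut off after `RESPONSE B: .`, so ending the reward-model input with the same question line as the comparison format is also an assumption.

```python
def one_line(text: str) -> str:
    """Collapse every whitespace run (including newlines) into one space.

    Assumption: the README only forbids newlines inside a field; replacing
    them with spaces is one possible way to satisfy that.
    """
    return " ".join(text.split())


def build_preference_input(post: str, response_a: str, response_b: str) -> str:
    """Build the pairwise-comparison input shown in the first hunk."""
    return (
        f"POST: {one_line(post)}\n\n"
        f"RESPONSE A: {one_line(response_a)}\n\n"
        f"RESPONSE B: {one_line(response_b)}\n\n"
        "Which response is better? RESPONSE"
    )


def build_reward_input(post: str, response: str) -> str:
    """Build the single-response (reward model) input from the second hunk.

    RESPONSE B is the placeholder "." as in the README. Assumption: the
    input ends with the same question line as the comparison format.
    """
    return build_preference_input(post, response, ".")


if __name__ == "__main__":
    # Multi-line fields are flattened before being placed in the template.
    print(build_preference_input(
        "How do I boil an egg?\nI have never cooked before.",
        "Put it in boiling water\nfor about ten minutes.",
        "Ask someone else.",
    ))
```

The blank lines between fields are the only newlines left in the result, so the model can tell the fields apart by those separators.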