Update README.md
README.md CHANGED
@@ -28,15 +28,6 @@ Inspired by [DeBERTa Reward Model Series](https://huggingface.co/OpenAssistant/r
 - Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)
 - Space Demo: [https://huggingface.co/spaces/llm-blender/LLM-Blender](https://huggingface.co/spaces/llm-blender/LLM-Blender)
 
-
-## Statistics
-
-### Context length
-| PairRanker type | Source max length | Candidate max length | Total max length |
-|:-----------------:|:-----------------:|----------------------|------------------|
-| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
-| [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
-
 ## Usage Example
 
 ### Installation
@@ -141,6 +132,18 @@ With a `blender.compare()` function, you can easily apply PairRM to popular RLH
 
 Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LLM-Blender#rank-and-fusion)
 
+
+
+
+## Statistics
+
+### Context length
+| PairRanker type | Source max length | Candidate max length | Total max length |
+|:-----------------:|:-----------------:|----------------------|------------------|
+| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
+| [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
+
+
 ### Performance
 PairRM has been trained on various high-quality, large-scale datasets with human preference annotations and exhibits strong correlation with human preferences
 with an extremely small model size (0.4B), approaching the performance of GPT-4.