liang.zhao
committed on
Commit ea950d1
Parent(s): a9cbfba
update model and config
README.md CHANGED
@@ -34,10 +34,11 @@ We evaluate our models on [RewardBench](https://huggingface.co/spaces/allenai/re
 
 As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench for generative models across all sizes, while Skywork-Critic-Llama3.1-8B tops the list for generative models under 10B parameters. (Note: An asterisk (*) indicates an open-source model.)
 
+
 | Model | Chat | Chat Hard | Safety | Reasoning | Overall Score |
 | ------------------------------- | :---: | :-------: | :----: | :-------: | :---: |
 | **Skywork-Critic-Llama3.1-70B** * | **96.9** | **88.4** | **93.2** | **95.4** | **93.4** |
-| Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 |
+| Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
 | Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
 | **Skywork-Critic-Llama3.1-8B** * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
 | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
@@ -51,10 +52,11 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
 | NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |
 
 
+
 # Demo Code
 Below is an example of obtaining the critic of two conversations.
 
-```
+```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
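The README's demo block is cut off by the hunk boundary after its imports, but the idea it describes, "obtaining the critic of two conversations", is a pairwise-judgment prompt fed to a generative model. A minimal sketch of assembling such a prompt is below; the template text and the `build_critic_prompt` helper are illustrative assumptions, not the repository's actual template.

```python
# Hypothetical sketch: format two candidate responses to the same
# instruction into one pairwise-judgment prompt, the kind of input a
# generative critic model (e.g. Skywork-Critic-Llama3.1-8B) consumes.
# The template wording below is an assumption, not the official one.

PROMPT_TEMPLATE = """Please act as an impartial judge and compare the two responses below.

[Instruction]
{instruction}

[Response A]
{response_a}

[Response B]
{response_b}

Output "[[A]]" if Response A is better, or "[[B]]" if Response B is better."""


def build_critic_prompt(instruction: str, response_a: str, response_b: str) -> str:
    """Return the filled-in pairwise comparison prompt."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        response_a=response_a,
        response_b=response_b,
    )


if __name__ == "__main__":
    prompt = build_critic_prompt(
        "Name the capital of France.",
        "The capital of France is Paris.",
        "France's capital is Lyon.",
    )
    print(prompt)
```

In the README's truncated demo, a prompt like this would then go through the tokenizer and `model.generate` from the `transformers` objects imported in the diff above.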