Commit ea950d1 by liang.zhao · 1 parent: a9cbfba

update model and config

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -34,10 +34,11 @@ We evaluate our models on [RewardBench](https://huggingface.co/spaces/allenai/re
 
  As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench for generative models across all sizes, while Skywork-Critic-Llama3.1-8B tops the list for generative models under 10B parameters. (Note: An asterisk (*) indicates an open-source model.)
 
+
  | Model | Chat | Chat Hard | Safety | Reasoning | Overall Score |
  | ------------------------------- | :---: | :-------: | :----: | :-------: | :---: |
  | **Skywork-Critic-Llama3.1-70B** * | **96.9** | **88.4** | **93.2** | **95.4** | **93.4** |
- | Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 86.2 | 95.1 | 92.7 |
+ | Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
  | Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
  | **Skywork-Critic-Llama3.1-8B** * | **93.6** | **81.4** | **91.1** | **89.8** | **89.0** |
  | Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
@@ -51,10 +52,11 @@ As of September 2024, Skywork-Critic-Llama3.1-70B **ranks first** on RewardBench
  | NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |
 
 
+
  # Demo Code
  Below is an example of obtaining the critic of two conversations.
 
- ```
+ ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
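Apart from two added blank lines and tagging the demo fence as `python`, the only change in this commit is to the Salesforce/SFR-LLaMa-3.1-70B-Judge-r row: Safety goes from 86.2 to 91.6 and Reasoning from 95.1 to 97.6, while the Overall Score stays at 92.7. The sketch below is one way to sanity-check such a table edit, under an assumption the README does not state: that Overall Score is the unweighted mean of the four category scores. Under that assumption the old row averages 90.75 and the new row 92.725, so only the corrected row agrees with the printed 92.7. The "(before)"/"(after)" labels, the `overall_mismatch` helper, and the 0.1 tolerance are illustrative choices, not part of the README.

```python
# Consistency check for the RewardBench rows touched by this README diff.
# ASSUMPTION (not stated in the README): "Overall Score" is the unweighted
# mean of Chat, Chat Hard, Safety and Reasoning.

ROWS = {
    # model: (chat, chat_hard, safety, reasoning, overall) as printed in the diff
    "Skywork-Critic-Llama3.1-70B": (96.9, 88.4, 93.2, 95.4, 93.4),
    "Salesforce/SFR-LLaMa-3.1-70B-Judge-r (before)": (96.9, 84.8, 86.2, 95.1, 92.7),
    "Salesforce/SFR-LLaMa-3.1-70B-Judge-r (after)": (96.9, 84.8, 91.6, 97.6, 92.7),
    "Salesforce/SFR-nemo-12B-Judge-r": (97.2, 82.2, 86.5, 95.1, 90.3),
    "Skywork-Critic-Llama3.1-8B": (93.6, 81.4, 91.1, 89.8, 89.0),
    "Salesforce/SFR-LLaMa-3.1-8B-Judge-r": (95.5, 77.7, 86.2, 95.1, 88.7),
    "NCSOFT/Llama-3-OffsetBias-8B": (92.5, 80.3, 86.8, 76.4, 84.0),
}

# Every printed value is rounded to one decimal, so the recomputed mean can
# drift from the printed overall by up to ~0.05 (inputs) + ~0.05 (output).
TOLERANCE = 0.1


def overall_mismatch(scores, tol=TOLERANCE):
    """Return (is_mismatch, recomputed_mean) for one table row."""
    *categories, overall = scores
    mean = sum(categories) / len(categories)
    return abs(mean - overall) > tol, mean


if __name__ == "__main__":
    for model, scores in ROWS.items():
        bad, mean = overall_mismatch(scores)
        status = "MISMATCH" if bad else "ok"
        print(f"{model:50s} mean={mean:7.3f} printed={scores[-1]:5.1f} {status}")
```

With this tolerance every other row in the excerpt is also consistent with the mean-of-four assumption, which is weak supporting evidence for that assumption rather than confirmation of how RewardBench computes its overall score.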