XuyaoWang committed
Commit 0aca220 · verified · 1 parent: 1fa8a51

Update README.md

Files changed (1): README.md (+126 −2)

README.md (updated contents below):
# AnyRewardModel

<span style="color: red;">The All-Modality Generation benchmark evaluates a model's ability to follow instructions, automatically select appropriate modalities, and create synergistic outputs across different modalities (text, visual, audio) while avoiding redundancy.</span>

[🏠 Homepage](https://github.com/PKU-Alignment/align-anything) | [👍 Our Official Code Repo](https://github.com/PKU-Alignment/align-anything)

[🤗 All-Modality Understanding Benchmark](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-AMU)

[🤗 All-Modality Generation Benchmark (Instruction Following Part)](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-InstructionFollowing)

[🤗 All-Modality Generation Benchmark (Modality Selection and Synergy Part)](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-Selection_Synergy)

[🤗 All-Modality Generation Reward Model](https://huggingface.co/PKU-Alignment/AnyRewardModel)

## Data Example

<div align="center">
  <img src="example-amg.png" width="100%"/>
</div>

## Usage
```python
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("PKU-Alignment/AnyRewardModel", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("PKU-Alignment/AnyRewardModel", trust_remote_code=True)
```
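
The scoring snippets below reuse the `model` and `processor` loaded here and run the forward pass on CPU tensors as written. If you want to score on a GPU, one possible pattern is sketched below; it assumes the reward model behaves like a standard PyTorch module (`.to()` / `.eval()`), and the `to_device` helper is hypothetical, not part of the released code:

```python
import torch

# Assumption: the reward model is a regular nn.Module, so .to() and .eval() apply.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

def to_device(batch: dict) -> dict:
    # Move only tensor entries; metadata such as the "modality" list stays as-is.
    return {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}
```

With this helper, a call such as `model(**to_device(process_ia(...)))` would replace the plain `model(**process_ia(...))` calls shown below.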

For Image-Audio Modality Synergy scoring:
```python
import math

user_prompt: str = 'USER: {input}'
assistant_prompt: str = '\nASSISTANT:\n{modality}{text_response}'

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def process_ia(prompt, image_path, audio_path):
    # Encode each modality with the multimodal processor.
    image_pixel_values = processor(data_paths=image_path, modality="image").pixel_values
    audio_pixel_values = processor(data_paths=audio_path, modality="audio").pixel_values

    # Build the chat-style prompt with placeholders for the image and audio responses.
    text_input = processor(
        text=user_prompt.format(input=prompt)
        + assistant_prompt.format(modality="<image><audio>", text_response=""),
        modality="text"
    )
    return {
        "input_ids": text_input.input_ids,
        "attention_mask": text_input.attention_mask,
        "pixel_values_1": image_pixel_values.unsqueeze(0),
        "pixel_values_2": audio_pixel_values.unsqueeze(0),
        "modality": [["image", "audio"]]
    }


score = sigmoid(model(**process_ia(prompt, image_path, audio_path)).end_scores.squeeze(dim=-1).item())
```
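
As a concrete usage example, with hypothetical placeholder inputs standing in for your own prompt and media files:

```python
# Hypothetical inputs for illustration only; substitute your own prompt and files.
prompt = "Describe a rainy street at night and provide matching ambient sound."
image_path = "./examples/rainy_street.png"
audio_path = "./examples/rain_ambience.wav"

inputs = process_ia(prompt, image_path, audio_path)
score = sigmoid(model(**inputs).end_scores.squeeze(dim=-1).item())
print(f"Image-Audio synergy score: {score:.4f}")  # sigmoid maps the raw reward into (0, 1)
```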

For Text-Image Modality Synergy scoring:
```python
import math

user_prompt: str = 'USER: {input}'
assistant_prompt: str = '\nASSISTANT:\n{modality}{text_response}'

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def process_ti(prompt, response, image_path):
    # Encode the image, then build the prompt with the image placeholder
    # followed by the text response being scored.
    image_pixel_values = processor(data_paths=image_path, modality="image").pixel_values
    text_input = processor(
        text=user_prompt.format(input=prompt)
        + assistant_prompt.format(modality="<image>", text_response=response),
        modality="text"
    )
    return {
        "input_ids": text_input.input_ids,
        "attention_mask": text_input.attention_mask,
        "pixel_values_1": image_pixel_values.unsqueeze(0),
        "modality": [["image", "text"]]
    }

score = sigmoid(model(**process_ti(prompt, response, image_path)).end_scores.squeeze(dim=-1).item())
```

For Text-Audio Modality Synergy scoring:
```python
import math

user_prompt: str = 'USER: {input}'
assistant_prompt: str = '\nASSISTANT:\n{modality}{text_response}'

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def process_ta(prompt, response, audio_path):
    # Encode the audio, then build the prompt with the audio placeholder
    # followed by the text response being scored.
    audio_pixel_values = processor(data_paths=audio_path, modality="audio").pixel_values
    text_input = processor(
        text=user_prompt.format(input=prompt)
        + assistant_prompt.format(modality="<audio>", text_response=response),
        modality="text"
    )
    return {
        "input_ids": text_input.input_ids,
        "attention_mask": text_input.attention_mask,
        "pixel_values_1": audio_pixel_values.unsqueeze(0),
        "modality": [["audio", "text"]]
    }

score = sigmoid(model(**process_ta(prompt, response, audio_path)).end_scores.squeeze(dim=-1).item())
```

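Since the three recipes above differ only in how the inputs are packed, a single dispatcher can be convenient. The sketch below is not part of the released code; it assumes the `process_ia`, `process_ti`, `process_ta`, and `sigmoid` definitions from the snippets above:

```python
# Illustrative wrapper around the three scoring recipes above.
def synergy_score(pair: str, prompt: str, **kwargs) -> float:
    if pair == "image-audio":
        batch = process_ia(prompt, kwargs["image_path"], kwargs["audio_path"])
    elif pair == "text-image":
        batch = process_ti(prompt, kwargs["response"], kwargs["image_path"])
    elif pair == "text-audio":
        batch = process_ta(prompt, kwargs["response"], kwargs["audio_path"])
    else:
        raise ValueError(f"Unsupported modality pair: {pair}")
    # Every case ends the same way: sigmoid over the model's end-of-sequence score.
    return sigmoid(model(**batch).end_scores.squeeze(dim=-1).item())


# Example: synergy_score("text-image", prompt, response=response, image_path=image_path)
```
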
## Note:
1. Before using AnyRewardModel, install the dependencies listed in [requirements.txt](https://huggingface.co/PKU-Alignment/AnyRewardModel/blob/main/requirements.txt), e.g. with `pip install -r requirements.txt`:
```txt
ftfy
timm
regex
einops
fvcore
decord
torchaudio
torchvision
pytorchvideo
```

2. If you encounter the following error:
```
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'
```
Please refer to the guide in this [blog post](https://blog.csdn.net/lanxing147/article/details/136625264) for detailed resolution steps; one possible workaround is also sketched below.
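
Separately from the linked guide, a workaround that is sometimes suggested for this error is to alias the removed module to `torchvision.transforms.functional` (which absorbed most of its functions) before importing the package that still references the old path. This is an assumption about your environment, not an official fix, and may not cover every case:

```python
# Hedged workaround sketch: newer torchvision releases removed functional_tensor, and
# most of its functions now live in torchvision.transforms.functional. Registering an
# alias before the offending import *may* satisfy packages that still use the old path.
import sys
import torchvision.transforms.functional as F

sys.modules["torchvision.transforms.functional_tensor"] = F
```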

**Note:** The current code is a sample script for the All-Modality Generation subtask of Eval Anything. In the future, we will integrate the Eval Anything evaluation into the framework to make it more convenient for the community to use.

## Citation
Please cite our work if you use our benchmark or model in your paper.
```bibtex
@inproceedings{ji2024align,
  title={Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback},
  author={Jiaming Ji and Jiayi Zhou and Hantao Lou and Boyuan Chen and Donghai Hong and Xuyao Wang and Wenqi Chen and Kaile Wang and Rui Pan and Jiahao Li and Mohan Wang and Josef Dai and Tianyi Qiu and Hua Xu and Dong Li and Weipeng Chen and Jun Song and Bo Zheng and Yaodong Yang},
  year={2024},
  url={https://arxiv.org/abs/2412.15838}
}
```