Yiqiao Jin committed
Commit bdafe83 · Parent(s): 6c58fd4

Initial Commit


Delete Redundant Files
Scripts for Downloading and Processing Submissions
Improve Data Loading

Add initial app

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitignore +185 -0
  2. LICENSE +203 -0
  3. README.md +89 -12
  4. agentreview/__init__.py +8 -0
  5. agentreview/agent.py +253 -0
  6. agentreview/arena.py +201 -0
  7. agentreview/backends/__init__.py +30 -0
  8. agentreview/backends/anthropic.py +119 -0
  9. agentreview/backends/bard.py +90 -0
  10. agentreview/backends/base.py +66 -0
  11. agentreview/backends/cohere.py +126 -0
  12. agentreview/backends/dummy.py +14 -0
  13. agentreview/backends/hf_transformers.py +127 -0
  14. agentreview/backends/human.py +23 -0
  15. agentreview/backends/langchain.py +169 -0
  16. agentreview/backends/openai.py +180 -0
  17. agentreview/config.py +143 -0
  18. agentreview/database.py +136 -0
  19. agentreview/dataset/__init__.py +0 -0
  20. agentreview/dataset/download_openreview_paper.py +136 -0
  21. agentreview/dataset/process_submissions.py +113 -0
  22. agentreview/environments/__init__.py +25 -0
  23. agentreview/environments/base.py +188 -0
  24. agentreview/environments/conversation.py +198 -0
  25. agentreview/environments/paper_decision.py +161 -0
  26. agentreview/environments/paper_review.py +217 -0
  27. agentreview/experiment_config.py +244 -0
  28. agentreview/message.py +150 -0
  29. agentreview/paper_processor.py +163 -0
  30. agentreview/paper_review_arena.py +185 -0
  31. agentreview/paper_review_message.py +104 -0
  32. agentreview/paper_review_player.py +120 -0
  33. agentreview/paper_review_settings.py +114 -0
  34. agentreview/role_descriptions.py +515 -0
  35. agentreview/ui/__init__.py +0 -0
  36. agentreview/ui/cli.py +269 -0
  37. agentreview/utils.py +116 -0
  38. arguments.py +156 -0
  39. const.py +100 -0
  40. docs/devdoc/design.md +39 -0
  41. docs/devdoc/mainloop.md +62 -0
  42. docs/devdoc/moderated.md +16 -0
  43. docs/tutorials/create_your_environment.md +90 -0
  44. notebooks/barplot_similarity_between_review_metareview.ipynb +0 -0
  45. notebooks/demo.ipynb +0 -0
  46. notebooks/histplots.ipynb +0 -0
  47. notebooks/lineplots.ipynb +0 -0
  48. requirements.txt +19 -0
  49. review_content_analysis/analysis.py +477 -0
  50. review_content_analysis/classification_prompt.txt +58 -0
.gitignore ADDED
@@ -0,0 +1,185 @@
key.py

*.pdf
*.json
*.png
*.jpg
*.jpeg
*.gif
data/
unused_data/
demo/
Summary/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

outputs

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

.DS_Store
hf-spaces/
etc/
.conda
*.xlsx
*.csv
*.zip
LICENSE ADDED
@@ -0,0 +1,203 @@
Copyright 2023 ChatArena. All rights reserved.

                              Apache License
                        Version 2.0, January 2004
                     http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

   "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

   "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

   "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

   "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

   "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

   "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

   "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

   "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

   "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

   "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

   (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

   (b) You must cause any modified files to carry prominent notices stating that You changed the files; and

   (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

   (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

   You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

   To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
README.md CHANGED
@@ -1,14 +1,91 @@
  ---
- title: AgentReview
- emoji: 👁
- colorFrom: indigo
- colorTo: pink
- sdk: gradio
- sdk_version: 5.4.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: EMNLP 2024
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # AgentReview
+
+ Official implementation for the 🔗[EMNLP 2024](https://2024.emnlp.org/) (main) paper: [AgentReview: Exploring Peer Review Dynamics with LLM Agents](https://arxiv.org/abs/2406.12708)
+
+ * 🌐 Website: [https://agentreview.github.io/](https://agentreview.github.io/)
+ * 📄 Paper: [https://arxiv.org/abs/2406.12708](https://arxiv.org/abs/2406.12708)
+ * **🚀 Note: This repository is under active development. Please stay tuned!**
+
+ ```bibtex
+ @inproceedings{jin2024agentreview,
+   title={AgentReview: Exploring Peer Review Dynamics with LLM Agents},
+   author={Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong},
+   booktitle={EMNLP},
+   year={2024}
+ }
+ ```
+
+ <img src="static/img/Overview.png">
+
  ---
 
+ ### Introduction
+
+ AgentReview is a simulation framework for systematic analysis of the peer review process based on large language models (LLMs). The framework aims to understand decision-making patterns, reviewer behavior, and the dynamics of paper acceptance and rejection.
+
+ ### Academic Abstract
+
+ Peer review is fundamental to the integrity and advancement of scientific publication. Traditional peer review analyses often rely on exploration and statistics of existing peer review data, which do not adequately address the multivariate nature of the process or account for latent variables, and are further constrained by privacy concerns due to the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM) based peer review simulation framework, which effectively disentangles the impacts of multiple latent factors and addresses the privacy issue. Our study reveals significant insights, including a notable 37.1% variation in paper decisions due to reviewers' biases, supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study can offer valuable insights to improve the design of peer review mechanisms.
+
+ ![Review Stage Design](static/img/ReviewPipeline.png)
+
+ ### Installation
+
+ 1. Download and unzip the [data](https://www.dropbox.com/scl/fi/mydblhx8yxk8kbz8b7zmr/AgentReview_data.zip?rlkey=se16p9gonclw5t8t3vn9p0o6n&st=6988u8lx&dl=0) under `data/`, which contains the PDF versions of the papers as well as the forum discussions for ICLR 2020-2023.
+ 2. **Install required packages**:
+    ```
+    cd AgentReview
+    pip install -r requirements.txt
+    ```
+ 3. **Run the project**:
+
+    **Note: all project files should be run from the `AgentReview` directory.**
+
+ ## Data
+
+ #### Project Structure
+
+ - `app.py`: The main application file for running the framework.
+ - `analysis/`: Python scripts for various statistical analyses of review data.
+ - `chatarena/`: Core module for simulating different review environments and integrating LLM backends.
+ - `dataset/`: Scripts for handling dataset operations, such as downloading and processing submissions.
+ - `demo/`: Demonstration scripts showcasing the functionality of different components.
+ - `docs/`: Documentation files and markdown guides for using and extending the framework.
+ - `examples/`: Configuration files and examples demonstrating the capabilities and setup of simulations.
+ - `experiments/`: Experimental scripts for testing new ideas or improvements on the framework.
+ - `visual/`: Visualization scripts for generating plots and charts from the simulation data.
+
+ #### Usage
+
+ **[UNDER CONSTRUCTION]**
+
+ ### Stage Design
+
+ Our simulation adopts a structured, five-phase pipeline:
+
+ * **Phase I. Reviewer Assessment.** Each manuscript is evaluated independently by three reviewers.
+ * **Phase II. Author-Reviewer Discussion.** Authors submit rebuttals to address reviewers' concerns.
+ * **Phase III. Reviewer-AC Discussion.** The area chair (AC) facilitates discussions among reviewers, prompting updates to their initial assessments.
+ * **Phase IV. Meta-Review Compilation.** The AC synthesizes the discussions into a meta-review.
+ * **Phase V. Paper Decision.** The AC makes the final decision on whether to accept or reject the paper, based on all gathered inputs.
+
+ ## Note
+
+ - We use a fixed acceptance rate of 32%, corresponding to the actual acceptance rate of ICLR 2020-2023. See [Conference Acceptance Rates](https://github.com/lixin4ever/Conference-Acceptance-Rate) for more information.
+ - The API may sometimes apply strict content filtering to requests. You may need to adjust the content-filtering settings to get the desired results.
+
+ ## License
+
+ This project is licensed under the Apache-2.0 License.
+
+ ## Acknowledgements
+
+ The implementation is partially based on the [chatarena](https://github.com/Farama-Foundation/chatarena) framework.
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
import os

ROOT_DIR = (
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)) + os.path.sep
)
EXAMPLES_DIR = os.path.join(ROOT_DIR, "examples")

__version__ = "0.1.16"
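The `ROOT_DIR` expression above resolves the package's parent directory and appends a trailing separator so other paths can be concatenated onto it directly. A small self-contained illustration of the same expression (`root_dir_of` is a hypothetical helper, not part of the package):

```python
import os

def root_dir_of(package_init_file: str) -> str:
    # Stand-in for the __file__ of agentreview/__init__.py: go up one
    # directory from the package and keep a trailing separator.
    return (
        os.path.abspath(os.path.join(os.path.dirname(package_init_file), os.pardir))
        + os.path.sep
    )

root = root_dir_of("/repo/agentreview/__init__.py")
print(root)  # on POSIX this prints "/repo/"
```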
agentreview/agent.py ADDED
@@ -0,0 +1,253 @@
1
+ import logging
2
+ import re
3
+ import uuid
4
+ from abc import abstractmethod
5
+ from argparse import Namespace
6
+ from typing import List, Union
7
+
8
+ from tenacity import RetryError
9
+
10
+ from .backends import IntelligenceBackend, load_backend
11
+ from .config import AgentConfig, BackendConfig, Configurable
12
+ from .message import SYSTEM_NAME, Message
13
+
14
+ # A special signal sent by the player to indicate that it is not possible to continue the conversation, and it requests to end the conversation.
15
+ # It contains a random UUID string to avoid being exploited by any of the players.
16
+ SIGNAL_END_OF_CONVERSATION = f"<<<<<<END_OF_CONVERSATION>>>>>>{uuid.uuid4()}"
17
+
18
+
19
+ class Agent(Configurable):
20
+ """An abstract base class for all the agents in the chatArena environment."""
21
+
22
+ @abstractmethod
23
+ def __init__(
24
+ self, name: str, role_desc: str, global_prompt: str = None, *args, **kwargs
25
+ ):
26
+ """
27
+ Initialize the agent.
28
+
29
+ Parameters:
30
+ name (str): The name of the agent.
31
+ role_desc (str): Description of the agent's role.
32
+ global_prompt (str): A universal prompt that applies to all agents. Defaults to None.
33
+ """
34
+ super().__init__(
35
+ name=name, role_desc=role_desc, global_prompt=global_prompt, **kwargs
36
+ )
37
+ self.name = name
38
+ self.role_desc = role_desc
39
+ self.global_prompt = global_prompt
40
+
41
+
42
+ class Player(Agent):
43
+ """
44
+ The Player class represents a player in the chatArena environment.
45
+
46
+ A player can observe the environment
47
+ and perform an action (generate a response) based on the observation.
48
+ """
49
+
50
+ def __init__(
51
+ self,
52
+ name: str,
53
+ role_desc: str,
54
+ backend: Union[BackendConfig, IntelligenceBackend],
55
+ global_prompt: str = None,
56
+ args: Namespace = None,
57
+ **kwargs,
58
+ ):
59
+ """
60
+ Initialize the player with a name, role description, backend, and a global prompt.
61
+
62
+ Parameters:
63
+ name (str): The name of the player.
64
+ role_desc (str): Description of the player's role.
65
+ backend (Union[BackendConfig, IntelligenceBackend]): The backend that will be used for decision making. It can be either a LLM backend or a Human backend.
66
+ global_prompt (str): A universal prompt that applies to all players. Defaults to None.
67
+ """
68
+
69
+ self.data_dir = kwargs.pop("data_dir", None)
70
+ self.args = args
71
+
72
+ if isinstance(backend, BackendConfig):
73
+ backend_config = backend
74
+ backend = load_backend(backend_config)
75
+ elif isinstance(backend, IntelligenceBackend):
76
+ backend_config = backend.to_config()
77
+ else:
78
+ raise ValueError(
79
+ f"backend must be a BackendConfig or an IntelligenceBackend, but got {type(backend)}"
80
+ )
81
+
82
+ assert (
83
+ name != SYSTEM_NAME
84
+ ), f"Player name cannot be {SYSTEM_NAME}, which is reserved for the system."
85
+
86
+ # Register the fields in the _config
87
+ super().__init__(
88
+ name=name,
89
+ role_desc=role_desc,
90
+ backend=backend_config,
91
+ global_prompt=global_prompt,
92
+ **kwargs,
93
+ )
94
+
95
+ self.backend = backend
96
+
97
+ def to_config(self) -> AgentConfig:
98
+ return AgentConfig(
99
+ name=self.name,
100
+ role_desc=self.role_desc,
101
+ backend=self.backend.to_config(),
102
+ global_prompt=self.global_prompt,
103
+ )
104
+
105
+ def act(self, observation: List[Message]) -> str:
106
+ """
107
+ Take an action based on the observation (Generate a response), which can later be parsed to actual actions that affect the game dynamics.
108
+
109
+ Parameters:
110
+ observation (List[Message]): The messages that the player has observed from the environment.
111
+
112
+ Returns:
113
+ str: The action (response) of the player.
114
+ """
115
+ try:
116
+ response = self.backend.query(
117
+ agent_name=self.name,
118
+ role_desc=self.role_desc,
119
+ history_messages=observation,
120
+ global_prompt=self.global_prompt,
121
+ request_msg=None,
122
+ )
123
+ except RetryError as e:
124
+ err_msg = f"Agent {self.name} failed to generate a response. Error: {e.last_attempt.exception()}. Sending signal to end the conversation."
125
+ logging.warning(err_msg)
126
+ response = SIGNAL_END_OF_CONVERSATION + err_msg
127
+
128
+ return response
129
+
130
+ def __call__(self, observation: List[Message]) -> str:
131
+ return self.act(observation)
132
+
133
+ async def async_act(self, observation: List[Message]) -> str:
134
+ """
135
+ Async version of act().
136
+
137
+ This is used when you want to generate a response asynchronously.
138
+
139
+ Parameters:
140
+            observation (List[Message]): The messages that the player has observed from the environment.
+
+        Returns:
+            str: The action (response) of the player.
+        """
+        try:
+            response = self.backend.async_query(
+                agent_name=self.name,
+                role_desc=self.role_desc,
+                history_messages=observation,
+                global_prompt=self.global_prompt,
+                request_msg=None,
+            )
+        except RetryError as e:
+            err_msg = f"Agent {self.name} failed to generate a response. Error: {e.last_attempt.exception()}. Sending signal to end the conversation."
+            logging.warning(err_msg)
+            response = SIGNAL_END_OF_CONVERSATION + err_msg
+
+        return response
+
+    def reset(self):
+        """
+        Reset the player's backend in case it is not stateless.
+
+        This is usually called at the end of each episode.
+        """
+        self.backend.reset()
+
+
+ class Moderator(Player):
+     """
+     The Moderator class represents a special type of player that moderates the conversation.
+
+     It is usually used as a component of the environment when the transition dynamics are conditioned on natural language that is not easy to parse programmatically.
+     """
+
+     def __init__(
+         self,
+         role_desc: str,
+         backend: Union[BackendConfig, IntelligenceBackend],
+         terminal_condition: str,
+         global_prompt: str = None,
+         **kwargs,
+     ):
+         """
+         Initialize the moderator with a role description, backend, terminal condition, and a global prompt.
+
+         Parameters:
+             role_desc (str): Description of the moderator's role.
+             backend (Union[BackendConfig, IntelligenceBackend]): The backend used for decision making.
+             terminal_condition (str): The condition that signifies the end of the conversation.
+             global_prompt (str): A universal prompt that applies to the moderator. Defaults to None.
+         """
+         name = "Moderator"
+         super().__init__(
+             name=name,
+             role_desc=role_desc,
+             backend=backend,
+             global_prompt=global_prompt,
+             **kwargs,
+         )
+
+         self.terminal_condition = terminal_condition
+
+     def to_config(self) -> AgentConfig:
+         return AgentConfig(
+             name=self.name,
+             role_desc=self.role_desc,
+             backend=self.backend.to_config(),
+             terminal_condition=self.terminal_condition,
+             global_prompt=self.global_prompt,
+         )
+
+     def is_terminal(self, history: List[Message], *args, **kwargs) -> bool:
+         """
+         Check whether an episode is terminated based on the terminal condition.
+
+         Parameters:
+             history (List[Message]): The conversation history.
+
+         Returns:
+             bool: True if the conversation is over, otherwise False.
+         """
+         # If the last message is the signal, then the conversation is over
+         if history[-1].content == SIGNAL_END_OF_CONVERSATION:
+             return True
+
+         try:
+             request_msg = Message(
+                 agent_name=self.name, content=self.terminal_condition, turn=-1
+             )
+             response = self.backend.query(
+                 agent_name=self.name,
+                 role_desc=self.role_desc,
+                 history_messages=history,
+                 global_prompt=self.global_prompt,
+                 request_msg=request_msg,
+                 *args,
+                 **kwargs,
+             )
+         except RetryError as e:
+             logging.warning(
+                 f"Agent {self.name} failed to generate a response. "
+                 f"Error: {e.last_attempt.exception()}."
+             )
+             return True
+
+         if re.match(
+             r"yes|y|yea|yeah|yep|yup|sure|ok|okay|alright", response, re.IGNORECASE
+         ):
+             # print(f"Decision: {response}. Conversation is ended by moderator.")
+             return True
+         else:
+             return False
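The termination test above reduces to a start-anchored affirmative-word match. A minimal standalone sketch of that check (the pattern is copied from `is_terminal`; the helper name `ends_conversation` is illustrative, not part of the codebase):

```python
import re

# Same alternation used by Moderator.is_terminal. re.match only anchors at the
# start of the string, so any response *beginning* with an affirmative word
# (e.g. "Okay, we are done") counts as a yes.
AFFIRMATIVE = re.compile(r"yes|y|yea|yeah|yep|yup|sure|ok|okay|alright", re.IGNORECASE)


def ends_conversation(response: str) -> bool:
    return AFFIRMATIVE.match(response) is not None
```

Note that because the alternation includes the single letter `y`, any response starting with "y" is treated as affirmative; that looseness is inherent to the original pattern.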
agentreview/arena.py ADDED
@@ -0,0 +1,201 @@
+ import csv
+ import json
+ import logging
+ import uuid
+ from typing import Dict, List, Union
+
+ from .agent import Player
+ from .backends import Human
+ from .config import ArenaConfig
+ from .environments import Environment, TimeStep, load_environment
+
+
+ class TooManyInvalidActions(Exception):
+     pass
+
+
+ class Arena:
+     """Utility class that manages the game environment and players."""
+
+     def __init__(
+         self, players: List[Player], environment: Environment, args=None, global_prompt: str = None
+     ):
+         # Create a container for the players and environment and reset the game
+         self.players = players
+         self.environment = environment
+         self.global_prompt = global_prompt
+
+         self.current_timestep = environment.reset()
+         self.uuid = uuid.uuid4()  # Generate a unique id for the game
+         self.invalid_actions_retry = 5
+         self.args = args
+
+     @property
+     def num_players(self):
+         return self.environment.num_players
+
+     @property
+     def name_to_player(self) -> Dict[str, Player]:
+         return {player.name: player for player in self.players}
+
+     def reset(self) -> TimeStep:
+         # Reset the environment
+         self.current_timestep = self.environment.reset()
+         # Reset the players
+         for player in self.players:
+             player.reset()
+         # Reset the uuid
+         self.uuid = uuid.uuid4()
+         return self.current_timestep
+
+     def step(self) -> TimeStep:
+         """Take a step in the game: one player takes an action and the environment updates."""
+         player_name = self.environment.get_next_player()
+         player = self.name_to_player[player_name]  # get the player object
+         observation = self.environment.get_observation(
+             player_name
+         )  # get the observation for the player
+
+         timestep = None
+         for i in range(
+             self.invalid_actions_retry
+         ):  # try to take an action a few times
+             action = player(observation)  # take an action
+             if self.environment.check_action(action, player_name):  # action is valid
+                 timestep = self.environment.step(
+                     player_name, action
+                 )  # update the environment
+                 break
+             else:  # action is invalid
+                 logging.warning(f"{player_name} made an invalid action {action}")
+                 continue
+
+         if (
+             timestep is None
+         ):  # if the player made invalid actions too many times, terminate the game
+             warning_msg = f"{player_name} has made invalid actions for {self.invalid_actions_retry} times. Terminating the game."
+             logging.warning(warning_msg)
+             raise TooManyInvalidActions(warning_msg)
+
+         return timestep
+
+     def next_is_human(self):
+         """Check if the next player is human."""
+         player_name = self.environment.get_next_player()
+         player = self.name_to_player[player_name]
+         return isinstance(player.backend, Human)
+
+     def run(self, num_steps: int = 1):
+         """Run the game for num_steps steps."""
+         for i in range(num_steps):
+             timestep = self.step()
+             if timestep.terminal:
+                 break
+
+     @classmethod
+     def from_config(cls, config: Union[str, ArenaConfig]):
+         """Create an arena from a config."""
+         # If config is a path, load the config
+         if isinstance(config, str):
+             config = ArenaConfig.load(config)
+
+         global_prompt = config.get("global_prompt", None)
+
+         # Create the players
+         players = []
+         for player_config in config.players:
+             # Add the global prompt to the player config
+             if global_prompt is not None:
+                 player_config["global_prompt"] = global_prompt
+
+             player = Player.from_config(player_config)
+             players.append(player)
+
+         # Check that the player names are unique
+         player_names = [player.name for player in players]
+         assert len(player_names) == len(
+             set(player_names)
+         ), "Player names must be unique"
+
+         # Create the environment
+         config.environment[
+             "player_names"
+         ] = player_names  # add the player names to the environment config
+         env = load_environment(config.environment)
+
+         return cls(players, env, global_prompt=global_prompt)
+
+     def to_config(self) -> ArenaConfig:
+         """Convert the arena to a config."""
+         return ArenaConfig(
+             players=[player.to_config() for player in self.players],
+             environment=self.environment.to_config(),
+             global_prompt=self.global_prompt,
+         )
+
+     def launch_cli(self, max_steps: int = None, interactive: bool = True):
+         """Launch the command line interface."""
+         from agentreview.ui.cli import ArenaCLI
+
+         cli = ArenaCLI(self)
+         cli.launch(max_steps=max_steps, interactive=interactive)
+
+     def save_config(self, path: str):
+         """Save the config to a file."""
+         config = self.to_config()
+         config.save(path)
+
+     def save_history(self, path: str):
+         """
+         Save the history of the game to a file.
+
+         Supports csv and json formats.
+         """
+         messages = self.environment.get_observation()
+         message_rows = []
+
+         if path.endswith(".csv"):
+             header = [
+                 "agent_name",
+                 "content",
+                 "turn",
+                 "timestamp",
+                 "visible_to",
+                 "msg_type",
+             ]
+             for message in messages:
+                 message_row = [
+                     message.agent_name,
+                     message.content,
+                     message.turn,
+                     str(message.timestamp),
+                     message.visible_to,
+                     message.msg_type,
+                 ]
+                 message_rows.append(message_row)
+
+             with open(path, "w") as f:
+                 writer = csv.writer(f)
+                 writer.writerow(header)
+                 writer.writerows(message_rows)
+         elif path.endswith(".json"):
+             for message in messages:
+                 message_row = {
+                     "agent_name": message.agent_name,
+                     "content": message.content,
+                     "turn": message.turn,
+                     "timestamp": str(message.timestamp),
+                     "visible_to": message.visible_to,
+                     "msg_type": message.msg_type,
+                 }
+                 message_rows.append(message_row)
+
+             with open(path, "w") as f:
+                 json.dump(message_rows, f, indent=2)
+         else:
+             raise ValueError("Invalid file format")
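`Arena.step` retries an invalid action up to `invalid_actions_retry` times before raising `TooManyInvalidActions`. A minimal sketch of that retry loop, independent of the environment (the helper names `step_with_retries`, `take_action`, `is_valid`, and `apply` are illustrative, not part of the codebase):

```python
class TooManyInvalidActions(Exception):
    pass


def step_with_retries(take_action, is_valid, apply, retries=5):
    # Mirrors Arena.step's control flow: ask the player for an action,
    # validate it against the environment, and only apply a valid one;
    # give up after `retries` consecutive invalid actions.
    for _ in range(retries):
        action = take_action()
        if is_valid(action):
            return apply(action)
    raise TooManyInvalidActions(f"invalid action repeated {retries} times")


# Usage: a player that produces two invalid actions before a valid one.
attempts = iter(["bad", "bad", "good"])
result = step_with_retries(
    lambda: next(attempts), lambda a: a == "good", lambda a: a.upper()
)
```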
agentreview/backends/__init__.py ADDED
@@ -0,0 +1,30 @@
+ from ..config import BackendConfig
+ from .anthropic import Claude
+ from .base import IntelligenceBackend
+ from .cohere import CohereAIChat
+ from .hf_transformers import TransformersConversational
+ from .human import Human
+ from .openai import OpenAIChat
+ from .dummy import Dummy
+
+ ALL_BACKENDS = [
+     Human,
+     OpenAIChat,
+     CohereAIChat,
+     TransformersConversational,
+     Claude,
+     Dummy,
+ ]
+
+ BACKEND_REGISTRY = {backend.type_name: backend for backend in ALL_BACKENDS}
+
+
+ # Load a backend from a config dictionary
+ def load_backend(config: BackendConfig):
+     try:
+         backend_cls = BACKEND_REGISTRY[config.backend_type]
+     except KeyError:
+         raise ValueError(f"Unknown backend type: {config.backend_type}")
+
+     backend = backend_cls.from_config(config)
+     return backend
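The module above keys each backend class by its `type_name` and resolves configs through that registry. The pattern can be sketched standalone (the `DummyBackend` class here is a stand-in, not the package's `Dummy`):

```python
class DummyBackend:
    # Stand-in for a backend class; real backends also define `stateful`
    # and a `from_config` constructor.
    type_name = "dummy"


ALL_BACKENDS = [DummyBackend]

# Map each backend's type_name to its class, exactly as in BACKEND_REGISTRY.
BACKEND_REGISTRY = {b.type_name: b for b in ALL_BACKENDS}


def load_backend(backend_type: str):
    # Unknown type names fail fast with a descriptive error.
    try:
        return BACKEND_REGISTRY[backend_type]()
    except KeyError:
        raise ValueError(f"Unknown backend type: {backend_type}")
```

Registering a new backend is then just appending its class to `ALL_BACKENDS`; no dispatch code changes.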
agentreview/backends/anthropic.py ADDED
@@ -0,0 +1,119 @@
+ import os
+ import re
+ from typing import List
+
+ from tenacity import retry, stop_after_attempt, wait_random_exponential
+
+ from ..message import SYSTEM_NAME as SYSTEM
+ from ..message import Message
+ from .base import IntelligenceBackend
+
+ try:
+     import anthropic
+ except ImportError:
+     is_anthropic_available = False
+     # logging.warning("anthropic package is not installed")
+ else:
+     anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY")
+     if anthropic_api_key is None:
+         # logging.warning("Anthropic API key is not set. Please set the environment variable ANTHROPIC_API_KEY")
+         is_anthropic_available = False
+     else:
+         is_anthropic_available = True
+
+ DEFAULT_MAX_TOKENS = 256
+ DEFAULT_MODEL = "claude-v1"
+
+
+ class Claude(IntelligenceBackend):
+     """Interface to Claude, offered by Anthropic."""
+
+     stateful = False
+     type_name = "claude"
+
+     def __init__(
+         self, max_tokens: int = DEFAULT_MAX_TOKENS, model: str = DEFAULT_MODEL, **kwargs
+     ):
+         assert (
+             is_anthropic_available
+         ), "anthropic package is not installed or the API key is not set"
+         super().__init__(max_tokens=max_tokens, model=model, **kwargs)
+
+         self.max_tokens = max_tokens
+         self.model = model
+
+         self.client = anthropic.Client(os.environ["ANTHROPIC_API_KEY"])
+
+     @retry(stop=stop_after_attempt(6), wait=wait_random_exponential(min=1, max=60))
+     def _get_response(self, prompt: str):
+         response = self.client.completion(
+             prompt=prompt,
+             stop_sequences=[anthropic.HUMAN_PROMPT],
+             model=self.model,
+             max_tokens_to_sample=self.max_tokens,
+         )
+
+         response = response["completion"].strip()
+         return response
+
+     def query(
+         self,
+         agent_name: str,
+         role_desc: str,
+         history_messages: List[Message],
+         global_prompt: str = None,
+         request_msg: Message = None,
+         *args,
+         **kwargs,
+     ) -> str:
+         """
+         Format the input and call the Claude API.
+
+         args:
+             agent_name: the name of the agent
+             role_desc: the description of the role of the agent
+             global_prompt: the global prompt describing the environment
+             history_messages: the history of the conversation, or the observation for the agent
+             request_msg: the request from the system to guide the agent's next response
+         """
+         all_messages = (
+             [(SYSTEM, global_prompt), (SYSTEM, role_desc)]
+             if global_prompt
+             else [(SYSTEM, role_desc)]
+         )
+
+         for message in history_messages:
+             all_messages.append((message.agent_name, message.content))
+         if request_msg:
+             all_messages.append((SYSTEM, request_msg.content))
+
+         prompt = ""
+         prev_is_human = False  # Whether the previous message is from the human (in the Anthropic API, the human is the user)
+         for i, message in enumerate(all_messages):
+             if i == 0:
+                 assert (
+                     message[0] == SYSTEM
+                 )  # The first message should be from the system
+
+             if message[0] == agent_name:
+                 if prev_is_human:
+                     prompt = f"{prompt}{anthropic.AI_PROMPT} {message[1]}"
+                 else:
+                     prompt = f"{prompt}\n\n{message[1]}"
+                 prev_is_human = False
+             else:
+                 if prev_is_human:
+                     prompt = f"{prompt}\n\n[{message[0]}]: {message[1]}"
+                 else:
+                     prompt = f"{prompt}{anthropic.HUMAN_PROMPT}\n[{message[0]}]: {message[1]}"
+                 prev_is_human = True
+         assert prev_is_human  # The last message should be from the human
+         # Add the AI prompt for Claude to generate the response
+         prompt = f"{prompt}{anthropic.AI_PROMPT}"
+
+         response = self._get_response(prompt, *args, **kwargs)
+
+         # Remove the agent name if the response starts with it
+         response = re.sub(rf"^\s*\[{agent_name}]:?", "", response).strip()
+
+         return response
agentreview/backends/bard.py ADDED
@@ -0,0 +1,90 @@
+ import os
+ import re
+ from typing import List
+
+ from tenacity import retry, stop_after_attempt, wait_random_exponential
+
+ from ..message import SYSTEM_NAME as SYSTEM
+ from ..message import Message
+ from .base import IntelligenceBackend
+
+ try:
+     import bardapi
+ except ImportError:
+     is_bard_available = False
+     # logging.warning("bard package is not installed")
+ else:
+     bard_api_key = os.environ.get("_BARD_API_KEY")
+     if bard_api_key is None:
+         # logging.warning(
+         #     "Bard API key is not set. Please set the environment variable _BARD_API_KEY")
+         is_bard_available = False
+     else:
+         is_bard_available = True
+
+ DEFAULT_MAX_TOKENS = 4096
+
+
+ class Bard(IntelligenceBackend):
+     """Interface to Bard, offered by Google."""
+
+     stateful = False
+     type_name = "bard"
+
+     def __init__(self, max_tokens: int = DEFAULT_MAX_TOKENS, **kwargs):
+         assert (
+             is_bard_available
+         ), "bard package is not installed or the API key is not set"
+         super().__init__(max_tokens=max_tokens, **kwargs)
+
+         self.max_tokens = max_tokens
+
+         self.client = bardapi.core.Bard()
+
+     @retry(stop=stop_after_attempt(6), wait=wait_random_exponential(min=1, max=60))
+     def _get_response(self, prompt: str):
+         response = self.client.get_answer(
+             input_text=prompt,
+         )
+
+         response = response["content"].strip()
+         return response
+
+     def query(
+         self,
+         agent_name: str,
+         role_desc: str,
+         history_messages: List[Message],
+         global_prompt: str = None,
+         request_msg: Message = None,
+         *args,
+         **kwargs,
+     ) -> str:
+         """
+         Format the input and call the Bard API.
+
+         args:
+             agent_name: the name of the agent
+             role_desc: the description of the role of the agent
+             global_prompt: the global prompt describing the environment
+             history_messages: the history of the conversation, or the observation for the agent
+             request_msg: the request from the system to guide the agent's next response
+         """
+         all_messages = (
+             [(SYSTEM, global_prompt), (SYSTEM, role_desc)]
+             if global_prompt
+             else [(SYSTEM, role_desc)]
+         )
+
+         for message in history_messages:
+             all_messages.append((message.agent_name, message.content))
+         if request_msg:
+             all_messages.append((SYSTEM, request_msg.content))
+
+         # The current Bard API doesn't support a system role, so just dump the raw messages as the prompt
+         response = self._get_response(str(all_messages), *args, **kwargs)
+
+         # Remove the agent name if the response starts with it
+         response = re.sub(rf"^\s*\[{agent_name}]:?", "", response).strip()
+
+         return response
agentreview/backends/base.py ADDED
@@ -0,0 +1,66 @@
+ from abc import abstractmethod
+ from typing import List
+
+ from ..config import BackendConfig, Configurable
+ from ..message import Message
+
+
+ class IntelligenceBackend(Configurable):
+     """An abstraction of the intelligence source of the agents."""
+
+     stateful = None
+     type_name = None
+
+     @abstractmethod
+     def __init__(self, **kwargs):
+         super().__init__(**kwargs)  # registers the arguments with Configurable
+
+     def __init_subclass__(cls, **kwargs):
+         # Check if the subclass has the required attributes
+         for required in (
+             "stateful",
+             "type_name",
+         ):
+             if getattr(cls, required) is None:
+                 raise TypeError(
+                     f"Can't instantiate abstract class {cls.__name__} without {required} attribute defined"
+                 )
+         return super().__init_subclass__(**kwargs)
+
+     def to_config(self) -> BackendConfig:
+         self._config_dict["backend_type"] = self.type_name
+         return BackendConfig(**self._config_dict)
+
+     @abstractmethod
+     def query(
+         self,
+         agent_name: str,
+         role_desc: str,
+         history_messages: List[Message],
+         global_prompt: str = None,
+         request_msg: Message = None,
+         *args,
+         **kwargs,
+     ) -> str:
+         raise NotImplementedError
+
+     @abstractmethod
+     async def async_query(
+         self,
+         agent_name: str,
+         role_desc: str,
+         history_messages: List[Message],
+         global_prompt: str = None,
+         request_msg: Message = None,
+         *args,
+         **kwargs,
+     ) -> str:
+         """Async querying."""
+         raise NotImplementedError
+
+     # Reset the state of the backend
+     def reset(self):
+         if self.stateful:
+             raise NotImplementedError
+         else:
+             pass
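`IntelligenceBackend` uses `__init_subclass__` to make `stateful` and `type_name` mandatory: a subclass that forgets either fails at class-definition time rather than at first use. A minimal sketch of that pattern under simplified assumptions (a plain base class, no `Configurable`; the names `Backend` and `Echo` are illustrative):

```python
class Backend:
    # Required class attributes; None means "not yet defined".
    stateful = None
    type_name = None

    def __init_subclass__(cls, **kwargs):
        # Runs when a subclass is *defined*, so a missing attribute is
        # reported immediately instead of surfacing later at runtime.
        for required in ("stateful", "type_name"):
            if getattr(cls, required) is None:
                raise TypeError(f"{cls.__name__} must define {required}")
        return super().__init_subclass__(**kwargs)


class Echo(Backend):
    # A conforming subclass: both required attributes are set.
    stateful = False
    type_name = "echo"
```

Defining a subclass without the attributes raises `TypeError` at the `class` statement itself, which is what makes the registry in `backends/__init__.py` safe to build from `type_name`.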
agentreview/backends/cohere.py ADDED
@@ -0,0 +1,126 @@
+ import os
+ from typing import List
+
+ from tenacity import retry, stop_after_attempt, wait_random_exponential
+
+ from ..message import Message
+ from .base import IntelligenceBackend
+
+ # Try to import the cohere package and check whether the API key is set
+ try:
+     import cohere
+ except ImportError:
+     is_cohere_available = False
+ else:
+     if os.environ.get("COHEREAI_API_KEY") is None:
+         is_cohere_available = False
+     else:
+         is_cohere_available = True
+
+ # Default config follows the [Cohere documentation](https://cohere-sdk.readthedocs.io/en/latest/cohere.html#cohere.client.Client.chat)
+ DEFAULT_TEMPERATURE = 0.8
+ DEFAULT_MAX_TOKENS = 200
+ DEFAULT_MODEL = "command-xlarge"
+
+
+ class CohereAIChat(IntelligenceBackend):
+     """Interface to the Cohere API."""
+
+     stateful = True
+     type_name = "cohere-chat"
+
+     def __init__(
+         self,
+         temperature: float = DEFAULT_TEMPERATURE,
+         max_tokens: int = DEFAULT_MAX_TOKENS,
+         model: str = DEFAULT_MODEL,
+         **kwargs,
+     ):
+         super().__init__(
+             temperature=temperature, max_tokens=max_tokens, model=model, **kwargs
+         )
+
+         self.temperature = temperature
+         self.max_tokens = max_tokens
+         self.model = model
+
+         assert (
+             is_cohere_available
+         ), "Cohere package is not installed or the API key is not set"
+         self.client = cohere.Client(os.environ.get("COHEREAI_API_KEY"))
+
+         # Stateful variables
+         self.session_id = None  # The session id for the last conversation
+         self.last_msg_hash = (
+             None  # The hash of the last message of the last conversation
+         )
+
+     def reset(self):
+         self.session_id = None
+         self.last_msg_hash = None
+
+     @retry(stop=stop_after_attempt(6), wait=wait_random_exponential(min=1, max=60))
+     def _get_response(self, new_message: str, persona_prompt: str):
+         response = self.client.chat(
+             new_message,
+             persona_prompt=persona_prompt,
+             temperature=self.temperature,
+             max_tokens=self.max_tokens,
+             session_id=self.session_id,
+         )
+
+         self.session_id = response.session_id  # Update the session id
+         return response.reply
+
+     def query(
+         self,
+         agent_name: str,
+         role_desc: str,
+         history_messages: List[Message],
+         global_prompt: str = None,
+         request_msg: Message = None,
+         *args,
+         **kwargs,
+     ) -> str:
+         """
+         Format the input and call the Cohere API.
+
+         args:
+             agent_name: the name of the agent
+             role_desc: the description of the role of the agent
+             global_prompt: the global prompt describing the environment
+             history_messages: the history of the conversation, or the observation for the agent
+             request_msg: the request for the Cohere API
+         """
+         # Find the index of the last message of the last conversation
+         new_message_start_idx = 0
+         if self.last_msg_hash is not None:
+             for i, message in enumerate(history_messages):
+                 if message.msg_hash == self.last_msg_hash:
+                     new_message_start_idx = i + 1
+                     break
+
+         new_messages = history_messages[new_message_start_idx:]
+         assert len(new_messages) > 0, "No new messages found (this should not happen)"
+
+         new_conversations = []
+         for message in new_messages:
+             if message.agent_name != agent_name:
+                 # Since there is more than one player, we need to distinguish between the players
+                 new_conversations.append(f"[{message.agent_name}]: {message.content}")
+
+         if request_msg:
+             new_conversations.append(
+                 f"[{request_msg.agent_name}]: {request_msg.content}"
+             )
+
+         # Concatenate all new messages into one message because the Cohere API only accepts one message
+         new_message = "\n".join(new_conversations)
+         persona_prompt = f"Environment:\n{global_prompt}\n\nYour role:\n{role_desc}"
+
+         response = self._get_response(new_message, persona_prompt)
+
+         # Only update the last message hash if the API call is successful
+         self.last_msg_hash = new_messages[-1].msg_hash
+
+         return response
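Because this backend is stateful, `query` only forwards messages that arrived since the last call, located by the stored `last_msg_hash`. The bookkeeping can be sketched on its own, with messages simplified to `(msg_hash, content)` tuples (the helper name `new_messages_since` is illustrative):

```python
def new_messages_since(history, last_msg_hash):
    # Mirrors CohereAIChat.query: scan for the message whose hash matched
    # the end of the previous call, and return everything after it. If the
    # hash is None (fresh session) or not found, the whole history is new.
    start = 0
    if last_msg_hash is not None:
        for i, (msg_hash, _content) in enumerate(history):
            if msg_hash == last_msg_hash:
                start = i + 1
                break
    return history[start:]
```

Only updating `last_msg_hash` after a successful API call (as the original does) means a failed request simply re-sends the same slice next time.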
agentreview/backends/dummy.py ADDED
@@ -0,0 +1,14 @@
+ from agentreview.config import Configurable
+
+
+ class Dummy(Configurable):
+     """A dummy backend that does not make any API calls. It is used for extracting paper contents in PaperExtractor
+     and for testing."""
+
+     stateful = False
+     type_name = "dummy"
+
+     def __init__(self, **kwargs):
+         super().__init__(**kwargs)
+
+     def reset(self):
+         pass
agentreview/backends/hf_transformers.py ADDED
@@ -0,0 +1,127 @@
+ import os
+ from contextlib import contextmanager, redirect_stderr, redirect_stdout
+ from typing import List
+
+ from tenacity import retry, stop_after_attempt, wait_random_exponential
+
+ from ..message import SYSTEM_NAME as SYSTEM
+ from ..message import Message
+ from .base import IntelligenceBackend
+
+
+ @contextmanager
+ def suppress_stdout_stderr():
+     """A context manager that redirects stdout and stderr to devnull."""
+     with open(os.devnull, "w") as fnull:
+         with redirect_stderr(fnull) as err, redirect_stdout(fnull) as out:
+             yield (err, out)
+
+
+ with suppress_stdout_stderr():
+     # Try to import the transformers package
+     try:
+         import transformers
+         from transformers import pipeline
+         from transformers.pipelines.conversational import (
+             Conversation,
+             ConversationalPipeline,
+         )
+     except ImportError:
+         is_transformers_available = False
+     else:
+         is_transformers_available = True
+
+
+ class TransformersConversational(IntelligenceBackend):
+     """Interface to the Transformers ConversationalPipeline."""
+
+     stateful = False
+     type_name = "transformers:conversational"
+
+     def __init__(self, model: str, device: int = -1, **kwargs):
+         super().__init__(model=model, device=device, **kwargs)
+         self.model = model
+         self.device = device
+
+         assert is_transformers_available, "Transformers package is not installed"
+         self.chatbot = pipeline(
+             task="conversational", model=self.model, device=self.device
+         )
+
+     @retry(stop=stop_after_attempt(6), wait=wait_random_exponential(min=1, max=60))
+     def _get_response(self, conversation):
+         conversation = self.chatbot(conversation)
+         response = conversation.generated_responses[-1]
+         return response
+
+     @staticmethod
+     def _msg_template(agent_name, content):
+         return f"[{agent_name}]: {content}"
+
+     def query(
+         self,
+         agent_name: str,
+         role_desc: str,
+         history_messages: List[Message],
+         global_prompt: str = None,
+         request_msg: Message = None,
+         *args,
+         **kwargs,
+     ) -> str:
+         user_inputs, generated_responses = [], []
+         all_messages = (
+             [(SYSTEM, global_prompt), (SYSTEM, role_desc)]
+             if global_prompt
+             else [(SYSTEM, role_desc)]
+         )
+
+         for msg in history_messages:
+             all_messages.append((msg.agent_name, msg.content))
+         if request_msg:
+             all_messages.append((SYSTEM, request_msg.content))
+
+         prev_is_user = False  # Whether the previous message is from the user
+         for i, message in enumerate(all_messages):
+             if i == 0:
+                 assert (
+                     message[0] == SYSTEM
+                 )  # The first message should be from the system
+
+             if message[0] != agent_name:
+                 if not prev_is_user:
+                     user_inputs.append(self._msg_template(message[0], message[1]))
+                 else:
+                     user_inputs[-1] += "\n" + self._msg_template(message[0], message[1])
+                 prev_is_user = True
+             else:
+                 if prev_is_user:
+                     generated_responses.append(message[1])
+                 else:
+                     generated_responses[-1] += "\n" + message[1]
+                 prev_is_user = False
+
+         assert len(user_inputs) == len(generated_responses) + 1
+         past_user_inputs = user_inputs[:-1]
+         new_user_input = user_inputs[-1]
+
+         # Recreate a conversation object from the history messages
+         conversation = Conversation(
+             text=new_user_input,
+             past_user_inputs=past_user_inputs,
+             generated_responses=generated_responses,
+         )
+
+         # Get the response
+         response = self._get_response(conversation)
+         return response
+
+
+ # conversation = Conversation("Going to the movies tonight - any suggestions?")
+ #
+ # # Steps usually performed by the model when generating a response:
+ # # 1. Mark the user input as processed (moved to the history)
+ # conversation.mark_processed()
+ # # 2. Append a model response
+ # conversation.append_response("The Big Lebowski.")
+ #
+ # conversation.add_user_input("Is it good?")
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from ..config import BackendConfig
2
+ from .base import IntelligenceBackend
3
+
4
+
5
+ # An Error class for the human backend
6
+ class HumanBackendError(Exception):
7
+ def __init__(self, agent_name: str):
8
+ self.agent_name = agent_name
9
+ super().__init__(f"Human backend requires a UI to get input from {agent_name}.")
10
+
11
+
12
+ class Human(IntelligenceBackend):
13
+ stateful = False
14
+ type_name = "human"
15
+
16
+ def __init__(self, **kwargs):
17
+ super().__init__(**kwargs)
18
+
19
+ def to_config(self) -> BackendConfig:
20
+ return BackendConfig(backend_type=self.type_name)
21
+
22
+ def query(self, agent_name: str, **kwargs) -> str:
23
+ raise HumanBackendError(agent_name)
agentreview/backends/langchain.py ADDED
@@ -0,0 +1,169 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import re
3
+ from typing import List
4
+
5
+ from tenacity import retry, stop_after_attempt, wait_random_exponential
6
+
7
+ from ..message import SYSTEM_NAME, Message
8
+ from .base import IntelligenceBackend
9
+
10
+ try:
11
+ from langchain.llms import OpenAI
12
+ except ImportError:
13
+ is_langchain_openai_available = False
14
+ # logging.warning("openai package is not installed")
15
+ else:
16
+ api_key = os.environ.get("OPENAI_API_KEY")
17
+ if api_key is None:
18
+ # logging.warning("OpenAI API key is not set. Please set the environment variable OPENAI_API_KEY")
19
+ is_langchain_openai_available = False
20
+ else:
21
+ is_langchain_openai_available = True
22
+
23
+ # Default config follows the OpenAI playground
24
+ DEFAULT_TEMPERATURE = 0.7
25
+ DEFAULT_MAX_TOKENS = 2048
26
+ DEFAULT_MODEL = "gpt-4"
27
+
28
+ END_OF_MESSAGE = "<EOS>" # End of message token specified by us not OpenAI
29
+ STOP = ("<|endoftext|>", END_OF_MESSAGE) # End of sentence token
30
+ BASE_PROMPT = f"The messages always end with the token {END_OF_MESSAGE}."
31
+
32
+
33
+ class LangChainOpenAIChat(IntelligenceBackend):
34
+ """Interface to the ChatGPT style model with system, user, assistant roles separation."""
35
+
36
+ stateful = False
37
+ type_name = "openai-chat"
38
+
39
+ def __init__(
40
+ self,
41
+ temperature: float = DEFAULT_TEMPERATURE,
42
+ max_tokens: int = DEFAULT_MAX_TOKENS,
43
+ model: str = DEFAULT_MODEL,
44
+ merge_other_agents_as_one_user: bool = True,
45
+ **kwargs,
46
+ ):
47
+ """
48
+ Instantiate the OpenAIChat backend.
49
+
50
+ args:
51
+ temperature: the temperature of the sampling
52
+ max_tokens: the maximum number of tokens to sample
53
+ model: the model to use
54
+ merge_other_agents_as_one_user: whether to merge messages from other agents as one user message
55
+ """
56
+ assert (
57
+ is_langchain_openai_available
58
+ ), "langchain package is not installed or the API key is not set"
59
+ super().__init__(
60
+ temperature=temperature,
61
+ max_tokens=max_tokens,
62
+ model=model,
63
+ merge_other_agents_as_one_user=merge_other_agents_as_one_user,
64
+ **kwargs,
65
+ )
66
+
67
+ self.temperature = temperature
68
+ self.max_tokens = max_tokens
69
+ self.model = model
70
+ self.merge_other_agent_as_user = merge_other_agents_as_one_user
71
+ self.llm = OpenAI(
72
+ model_name=model,
73
+ temperature=temperature,
74
+ max_tokens=max_tokens,
75
+ openai_api_key=api_key,
76
+ )
77
+
78
+ @retry(stop=stop_after_attempt(6), wait=wait_random_exponential(min=1, max=60))
79
+ def _get_response(self, messages):
80
+ response = self.llm(prompt=messages, stop=STOP)
81
+ return response
82
+
83
+ def query(
84
+ self,
85
+ agent_name: str,
86
+ role_desc: str,
87
+ history_messages: List[Message],
88
+ global_prompt: str = None,
89
+ request_msg: Message = None,
90
+ *args,
91
+ **kwargs,
92
+ ) -> str:
93
+ """
94
+ Format the input and call the ChatGPT/GPT-4 API.
95
+
96
+ args:
97
+ agent_name: the name of the agent
98
+ role_desc: the description of the role of the agent
99
+ env_desc: the description of the environment
100
+ history_messages: the history of the conversation, or the observation for the agent
101
+ request_msg: the request from the system to guide the agent's next response
102
+ """
103
+
104
+ # Merge the role description and the global prompt as the system prompt for the agent
105
+ if global_prompt: # Prepend the global prompt if it exists
106
+ system_prompt = f"{global_prompt.strip()}\n{BASE_PROMPT}\n\nYour name: {agent_name}\n\nYour role: {role_desc}"
107
+ else:
108
+ system_prompt = (
109
+ f"You are {agent_name}.\n\nYour role: {role_desc}\n\n{BASE_PROMPT}"
110
+ )
111
+
112
+ all_messages = [(SYSTEM_NAME, system_prompt)]
113
+ for msg in history_messages:
114
+ if msg.agent_name == SYSTEM_NAME:
115
+ all_messages.append((SYSTEM_NAME, msg.content))
116
+ else: # non-system messages are suffixed with the end of message token
117
+ all_messages.append((msg.agent_name, f"{msg.content}{END_OF_MESSAGE}"))
118
+
119
+ if request_msg:
120
+ all_messages.append((SYSTEM_NAME, request_msg.content))
121
+ else: # The default request message reminds the agent of its role and instructs it to speak
122
+ all_messages.append(
123
+ (SYSTEM_NAME, f"Now you speak, {agent_name}.{END_OF_MESSAGE}")
124
+ )
125
+
126
+ messages = []
127
+ for i, msg in enumerate(all_messages):
128
+ if i == 0:
129
+ assert (
130
+ msg[0] == SYSTEM_NAME
131
+ ) # The first message should be from the system
132
+ messages.append({"role": "system", "content": msg[1]})
133
+ else:
134
+ if msg[0] == agent_name:
135
+ messages.append({"role": "assistant", "content": msg[1]})
136
+ else:
137
+ if messages[-1]["role"] == "user": # last message is from user
138
+ if self.merge_other_agent_as_user:
139
+ messages[-1][
140
+ "content"
141
+ ] = f"{messages[-1]['content']}\n\n[{msg[0]}]: {msg[1]}"
142
+ else:
143
+ messages.append(
144
+ {"role": "user", "content": f"[{msg[0]}]: {msg[1]}"}
145
+ )
146
+ elif (
147
+ messages[-1]["role"] == "assistant"
148
+ ): # consecutive assistant messages
149
+ # Merge the assistant messages
150
+ messages[-1]["content"] = f"{messages[-1]['content']}\n{msg[1]}"
151
+ elif messages[-1]["role"] == "system":
152
+ messages.append(
153
+ {"role": "user", "content": f"[{msg[0]}]: {msg[1]}"}
154
+ )
155
+ else:
156
+ raise ValueError(f"Invalid role: {messages[-1]['role']}")
157
+
158
+ response = self._get_response(messages, *args, **kwargs)
159
+
160
+ # Remove the agent name if the response starts with it
161
+ response = re.sub(r"^\s*\[.*?\]:", "", response).strip()
162
+ response = re.sub(
163
+ rf"^\s*{re.escape(agent_name)}\s*:", "", response
164
+ ).strip()
165
+
166
+ # Remove the trailing end-of-message token
167
+ response = re.sub(rf"{END_OF_MESSAGE}$", "", response).strip()
168
+
169
+ return response
agentreview/backends/openai.py ADDED
@@ -0,0 +1,180 @@
1
+ import re
2
+ from typing import List
3
+
4
+ from tenacity import retry, stop_after_attempt, wait_random_exponential
5
+
6
+ from arguments import parse_args
7
+ from utility.authentication_utils import get_openai_client
8
+ from .base import IntelligenceBackend
9
+ from ..message import SYSTEM_NAME, Message
10
+
11
+ args = parse_args()
12
+
13
+ client = get_openai_client(client_type=args.openai_client_type)
14
+
15
+ OPENAI_CLIENT_TYPE = args.openai_client_type
16
+
17
+ # Default config follows the OpenAI playground
18
+ DEFAULT_TEMPERATURE = 1.0
19
+ DEFAULT_MAX_TOKENS = 4096
20
+
21
+ # Check https://platform.openai.com/docs/models for more models
22
+
23
+ DEFAULT_MODEL = "gpt-4o"
24
+
25
+ END_OF_MESSAGE = "<EOS>" # End of message token specified by us not OpenAI
26
+ STOP = ("<|endoftext|>", END_OF_MESSAGE) # End of sentence token
27
+ BASE_PROMPT = f"The messages always end with the token {END_OF_MESSAGE}."
28
+
29
+
30
+ class OpenAIChat(IntelligenceBackend):
31
+ """Interface to a ChatGPT-style model with system/user/assistant role separation."""
32
+
33
+ stateful = False
34
+ type_name = "openai-chat"
35
+
36
+ def __init__(
37
+ self,
38
+ temperature: float = DEFAULT_TEMPERATURE,
39
+ max_tokens: int = DEFAULT_MAX_TOKENS,
40
+ model: str = DEFAULT_MODEL,
41
+ merge_other_agents_as_one_user: bool = True,
42
+ **kwargs,
43
+ ):
44
+ """
45
+ Instantiate the OpenAIChat backend.
46
+
47
+ Args:
48
+ temperature: the temperature of the sampling
49
+ max_tokens: the maximum number of tokens to sample
50
+ model: the model to use
51
+ merge_other_agents_as_one_user: whether to merge messages from other agents as one user message
52
+ """
53
+ super().__init__(
54
+ temperature=temperature,
55
+ max_tokens=max_tokens,
56
+ model=model,
57
+ merge_other_agents_as_one_user=merge_other_agents_as_one_user,
58
+ **kwargs,
59
+ )
60
+
61
+ self.temperature = temperature
62
+ self.max_tokens = max_tokens
63
+ self.model = model
64
+ self.merge_other_agent_as_user = merge_other_agents_as_one_user
65
+
66
+ @retry(stop=stop_after_attempt(6), wait=wait_random_exponential(min=1, max=60))
67
+ def _get_response(self, messages):
68
+ # Refer to https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/switching-endpoints for how to
69
+ # make API calls
70
+
71
+ if OPENAI_CLIENT_TYPE == "openai":
72
+ completion = client.chat.completions.create(
73
+ model=self.model,
74
+ messages=messages,
75
+ temperature=self.temperature,
76
+ max_tokens=self.max_tokens,
77
+ stop=STOP,
78
+ )
79
+
80
+ elif OPENAI_CLIENT_TYPE == "azure_openai":
81
+ completion = client.chat.completions.create(
82
+ model=self.model,
83
+ messages=messages,
84
+ temperature=self.temperature,
85
+ max_tokens=self.max_tokens,
86
+ stop=STOP,
87
+ )
88
+
89
+ else:
90
+ raise NotImplementedError
91
+
92
+ response = completion.choices[0].message.content
93
+ response = response.strip()
94
+ return response
95
+
96
+ def query(
97
+ self,
98
+ agent_name: str,
99
+ role_desc: str,
100
+ history_messages: List[Message],
101
+ global_prompt: str = None,
102
+ request_msg: Message = None,
103
+ *args,
104
+ **kwargs,
105
+ ) -> str:
106
+ """
107
+ Format the input and call the ChatGPT/GPT-4 API.
108
+
109
+ Args:
110
+ agent_name: the name of the agent
111
+ role_desc: the description of the role of the agent
112
+ global_prompt: the global prompt prepended to the system prompt, if provided
113
+ history_messages: the history of the conversation, or the observation for the agent
114
+ request_msg: the request from the system to guide the agent's next response
115
+ """
116
+
117
+ # Merge the role description and the global prompt as the system prompt for the agent
118
+ if global_prompt: # Prepend the global prompt if it exists
119
+ system_prompt = f"You are a helpful assistant.\n{global_prompt.strip()}\n{BASE_PROMPT}\n\nYour name is {agent_name}.\n\nYour role: {role_desc}"
120
+ else:
121
+ system_prompt = f"You are a helpful assistant. Your name is {agent_name}.\n\nYour role: {role_desc}\n\n{BASE_PROMPT}"
122
+
123
+ all_messages = [(SYSTEM_NAME, system_prompt)]
124
+ for msg in history_messages:
125
+ if msg.agent_name == SYSTEM_NAME:
126
+ all_messages.append((SYSTEM_NAME, msg.content))
127
+ else: # non-system messages are suffixed with the end of message token
128
+ all_messages.append((msg.agent_name, f"{msg.content}{END_OF_MESSAGE}"))
129
+
130
+ if request_msg:
131
+ all_messages.append((SYSTEM_NAME, request_msg.content))
132
+ else: # The default request message reminds the agent of its role and instructs it to speak
133
+ all_messages.append(
134
+ (SYSTEM_NAME, f"Now you speak, {agent_name}.{END_OF_MESSAGE}")
135
+ )
136
+
137
+ messages = []
138
+ for i, msg in enumerate(all_messages):
139
+ if i == 0:
140
+ assert (
141
+ msg[0] == SYSTEM_NAME
142
+ ) # The first message should be from the system
143
+ messages.append({"role": "system", "content": msg[1]})
144
+ else:
145
+ if msg[0] == agent_name:
146
+ messages.append({"role": "assistant", "content": msg[1]})
147
+ else:
148
+ if messages[-1]["role"] == "user": # last message is from user
149
+ if self.merge_other_agent_as_user:
150
+ messages[-1][
151
+ "content"
152
+ ] = f"{messages[-1]['content']}\n\n[{msg[0]}]: {msg[1]}"
153
+ else:
154
+ messages.append(
155
+ {"role": "user", "content": f"[{msg[0]}]: {msg[1]}"}
156
+ )
157
+ elif (
158
+ messages[-1]["role"] == "assistant"
159
+ ): # consecutive assistant messages
160
+ # Merge the assistant messages
161
+ messages[-1]["content"] = f"{messages[-1]['content']}\n{msg[1]}"
162
+ elif messages[-1]["role"] == "system":
163
+ messages.append(
164
+ {"role": "user", "content": f"[{msg[0]}]: {msg[1]}"}
165
+ )
166
+ else:
167
+ raise ValueError(f"Invalid role: {messages[-1]['role']}")
168
+
169
+ response = self._get_response(messages, *args, **kwargs)
170
+
171
+ # Remove the agent name if the response starts with it
172
+ response = re.sub(r"^\s*\[.*?\]:", "", response).strip()
173
+ response = re.sub(
174
+ rf"^\s*{re.escape(agent_name)}\s*:", "", response
175
+ ).strip()
176
+
177
+ # Remove the trailing end-of-message token
178
+ response = re.sub(rf"{END_OF_MESSAGE}$", "", response).strip()
179
+
180
+ return response
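The role-assignment loop in `query` can be factored into a standalone helper. Below is a slightly simplified sketch (the name `to_chat_messages` is illustrative, and unlike the loop above, a non-agent message that follows an assistant turn starts a new user message here rather than being merged into the assistant content):

```python
from typing import List, Tuple


def to_chat_messages(all_messages: List[Tuple[str, str]], agent_name: str,
                     system_name: str = "System", merge_users: bool = True) -> list:
    """Convert (speaker, content) pairs into OpenAI-style chat messages.

    The first pair must be the system prompt; the agent's own turns become
    "assistant" messages, and all other speakers become "user" messages,
    with consecutive user turns optionally merged into one message.
    """
    messages = []
    for i, (speaker, content) in enumerate(all_messages):
        if i == 0:
            assert speaker == system_name  # first message must be the system prompt
            messages.append({"role": "system", "content": content})
        elif speaker == agent_name:
            messages.append({"role": "assistant", "content": content})
        elif messages[-1]["role"] == "user" and merge_users:
            # Merge consecutive non-agent turns into a single user message
            messages[-1]["content"] += f"\n\n[{speaker}]: {content}"
        else:
            messages.append({"role": "user", "content": f"[{speaker}]: {content}"})
    return messages
```

Merging other agents into one user turn keeps the transcript compatible with APIs that expect strictly alternating user/assistant roles.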
agentreview/config.py ADDED
@@ -0,0 +1,143 @@
1
+ import copy
2
+ import json
3
+
4
+ from .utils import AttributedDict
5
+
6
+
7
+ class Config(AttributedDict):
8
+ """
9
+ Config class to manage the configuration of the games.
10
+
11
+ The class has a few useful methods to load and save the config.
12
+ """
13
+
14
+ # convert dict to Config recursively
15
+ def __init__(self, *args, **kwargs):
16
+ super().__init__(*args, **kwargs)
17
+ for key, value in self.items():
18
+
19
+ # Try to convert the value (the "metadata" field) to dict if applicable
20
+ try:
21
+ value = dict(eval(value))
22
+ except Exception:
23
+ pass
24
+
25
+ if isinstance(value, dict):
26
+ self[key] = init_config(value) # convert dict to Config recursively
27
+ # convert list of dict to list of Config recursively
28
+ elif isinstance(value, list) and len(value) > 0:
29
+ self[key] = [
30
+ init_config(item) if isinstance(item, dict) else item
31
+ for item in value
32
+ ]
33
+
34
+ def save(self, path: str):
35
+ # save config to file
36
+ with open(path, "w") as f:
37
+ json.dump(self, f, indent=4)
38
+
39
+ @classmethod
40
+ def load(cls, path: str):
41
+ # load config from file
42
+ with open(path) as f:
43
+ config = json.load(f)
44
+ return cls(config)
45
+
46
+ def deepcopy(self):
47
+ # get the config class so that subclasses can be copied in the correct class
48
+ config_class = self.__class__
49
+ # make a deep copy of the config
50
+ return config_class(copy.deepcopy(self))
51
+
52
+
53
+ class Configurable:
54
+ """Configurable is an interface for classes that can be initialized with a config."""
55
+
56
+ def __init__(self, **kwargs):
57
+ self._config_dict = kwargs
58
+
59
+ @classmethod
60
+ def from_config(cls, config: Config):
61
+ return cls(**config)
62
+
63
+ def to_config(self) -> Config:
64
+ # Convert the _config_dict to Config
65
+ return Config(**self._config_dict)
66
+
67
+ def save_config(self, path: str):
68
+ self.to_config().save(path)
69
+
70
+
71
+ class EnvironmentConfig(Config):
72
+ """EnvironmentConfig contains a env_type field to indicate the name of the environment."""
73
+
74
+ def __init__(self, *args, **kwargs):
75
+ super().__init__(*args, **kwargs)
76
+ # check if the env_type field is specified
77
+ if "env_type" not in self:
78
+ raise ValueError("The env_type field is not specified")
79
+
80
+
81
+ class BackendConfig(Config):
82
+ """BackendConfig contains a backend_type field to indicate the name of the backend."""
83
+
84
+ def __init__(self, *args, **kwargs):
85
+ super().__init__(*args, **kwargs)
86
+ # check if the backend_type field is specified
87
+ if "backend_type" not in self:
88
+ raise ValueError("The backend_type field is not specified")
89
+
90
+
91
+ class AgentConfig(Config):
92
+ """AgentConfig contains role_desc and backend fields."""
93
+
94
+ def __init__(self, *args, **kwargs):
95
+ super().__init__(*args, **kwargs)
96
+ # check if the role_desc field is specified
97
+ if "role_desc" not in self:
98
+ raise ValueError("The role_desc field is not specified")
99
+ # check if the backend field is specified
100
+ if "backend" not in self:
101
+ raise ValueError("The backend field is not specified")
102
+ # Make sure the backend field is a BackendConfig
103
+ if not isinstance(self["backend"], BackendConfig):
104
+ raise ValueError("The backend field must be a BackendConfig")
105
+
106
+
107
+ class ArenaConfig(Config):
108
+ """ArenaConfig contains a list of AgentConfig."""
109
+
110
+ def __init__(self, *args, **kwargs):
111
+ super().__init__(*args, **kwargs)
112
+ # check if the players field is specified and it is List[AgentConfig]
113
+ if "players" not in self:
114
+ raise ValueError("The players field is not specified")
115
+ if not isinstance(self["players"], list):
116
+ raise ValueError("The players field must be a list")
117
+ for player in self["players"]:
118
+ if not isinstance(player, AgentConfig):
119
+ raise ValueError("The players field must be a list of AgentConfig")
120
+
121
+ # check if environment field is specified and it is EnvironmentConfig
122
+ if "environment" not in self:
123
+ raise ValueError("The environment field is not specified")
124
+ if not isinstance(self["environment"], EnvironmentConfig):
125
+ raise ValueError("The environment field must be an EnvironmentConfig")
126
+
127
+
128
+ # Initialize with different config class depending on whether the config is for environment or backend
129
+ def init_config(config: dict):
130
+ if not isinstance(config, dict):
131
+ raise ValueError("The config must be a dict")
132
+
133
+ # check if the config is for environment or backend
134
+ if "env_type" in config:
135
+ return EnvironmentConfig(config)
136
+ elif "backend_type" in config:
137
+ return BackendConfig(config)
138
+ elif "role_desc" in config:
139
+ return AgentConfig(config)
140
+ elif "players" in config:
141
+ return ArenaConfig(config)
142
+ else:
143
+ return Config(config)
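`init_config` dispatches on whichever marker field is present, in a fixed order. A self-contained sketch of that dispatch rule, with strings standing in for the Config subclasses:

```python
def config_kind(config: dict) -> str:
    """Classify a config dict the way init_config does, by its marker field."""
    if not isinstance(config, dict):
        raise ValueError("The config must be a dict")
    if "env_type" in config:
        return "EnvironmentConfig"
    elif "backend_type" in config:
        return "BackendConfig"
    elif "role_desc" in config:
        return "AgentConfig"
    elif "players" in config:
        return "ArenaConfig"
    return "Config"
```

Note that an agent config carries `role_desc` at the top level while its backend's `backend_type` sits one level deeper, so the nested dicts are classified independently during the recursive conversion in `Config.__init__`.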
agentreview/database.py ADDED
@@ -0,0 +1,136 @@
1
+ """
2
+ Datastore module for chat_arena.
3
+
4
+ This module provides utilities for storing messages and game results in a database.
5
+ Currently, it supports Supabase.
6
+ """
7
+ import json
8
+ import os
9
+ import uuid
10
+ from typing import List
11
+
12
+ from .arena import Arena
13
+ from .message import Message
14
+
15
+ # Attempt importing Supabase
16
+ try:
17
+ import supabase
18
+
19
+ # Get the Supabase URL and secret key from environment variables
20
+ SUPABASE_URL = os.environ.get("SUPABASE_URL", "")
21
+ SUPABASE_SECRET_KEY = os.environ.get("SUPABASE_SECRET_KEY", "")
22
+ assert SUPABASE_URL and SUPABASE_SECRET_KEY
23
+ except Exception:
24
+ supabase_available = False
25
+ else:
26
+ supabase_available = True
27
+
28
+
29
+ # Store the messages into the Supabase database
30
+ class SupabaseDB:
31
+ def __init__(self):
32
+ assert supabase_available and SUPABASE_URL and SUPABASE_SECRET_KEY
33
+ supabase_client = supabase.create_client(SUPABASE_URL, SUPABASE_SECRET_KEY)
34
+ self.client = supabase_client
35
+
36
+ # Save Arena state to Supabase
37
+ def save_arena(self, arena: Arena):
38
+ # Save the environment config
39
+ self._save_environment(arena)
40
+
41
+ # Save the player configs
42
+ self._save_player_configs(arena)
43
+
44
+ # Save the messages
45
+ self.save_messages(arena)
46
+
47
+ # Save the environment config of the arena
48
+ def _save_environment(self, arena: Arena):
49
+ env = arena.environment
50
+ env_config = env.to_config()
51
+ moderator_config = env_config.pop("moderator", None)
52
+
53
+ arena_row = {
54
+ "arena_id": str(arena.uuid),
55
+ "global_prompt": arena.global_prompt,
56
+ "env_type": env_config["env_type"],
57
+ "env_config": json.dumps(env_config),
58
+ }
59
+ self.client.table("Arena").insert(arena_row).execute()
60
+
61
+ # Get the moderator config
62
+ if moderator_config:
63
+ moderator_row = {
64
+ "moderator_id": str(
65
+ uuid.uuid5(arena.uuid, json.dumps(moderator_config))
66
+ ),
67
+ "arena_id": str(arena.uuid),
68
+ "role_desc": moderator_config["role_desc"],
69
+ "terminal_condition": moderator_config["terminal_condition"],
70
+ "backend_type": moderator_config["backend"]["backend_type"],
71
+ "temperature": moderator_config["backend"]["temperature"],
72
+ "max_tokens": moderator_config["backend"]["max_tokens"],
73
+ }
74
+ self.client.table("Moderator").insert(moderator_row).execute()
75
+
76
+ # Save the player configs of the arena
77
+ def _save_player_configs(self, arena: Arena):
78
+ player_rows = []
79
+ for player in arena.players:
80
+ player_config = player.to_config()
81
+ player_row = {
82
+ "player_id": str(uuid.uuid5(arena.uuid, json.dumps(player_config))),
83
+ "arena_id": str(arena.uuid),
84
+ "name": player.name,
85
+ "role_desc": player_config["role_desc"],
86
+ "backend_type": player_config["backend"]["backend_type"],
87
+ "temperature": player_config["backend"].get("temperature", None),
88
+ "max_tokens": player_config["backend"].get("max_tokens", None),
89
+ }
90
+ player_rows.append(player_row)
91
+
92
+ self.client.table("Player").insert(player_rows).execute()
93
+
94
+ # Save the messages
95
+ def save_messages(self, arena: Arena, messages: List[Message] = None):
96
+ if messages is None:
97
+ messages = arena.environment.get_observation()
98
+
99
+ # Filter messages that are already logged
100
+ messages = [msg for msg in messages if not msg.logged]
101
+
102
+ message_rows = []
103
+ for message in messages:
104
+ message_row = {
105
+ "message_id": str(uuid.uuid5(arena.uuid, message.msg_hash)),
106
+ "arena_id": str(arena.uuid),
107
+ "agent_name": message.agent_name,
108
+ "content": message.content,
109
+ "turn": message.turn,
110
+ "timestamp": str(message.timestamp),
111
+ "msg_type": message.msg_type,
112
+ "visible_to": json.dumps(message.visible_to),
113
+ }
114
+ message_rows.append(message_row)
115
+
116
+ self.client.table("Message").insert(message_rows).execute()
117
+
118
+ # Mark the messages as logged
119
+ for message in messages:
120
+ message.logged = True
121
+
122
+
123
+ # Log the arena results into the Supabase database
124
+ def log_arena(arena: Arena, database=None):
125
+ if database is None:
126
+ pass
127
+ else:
128
+ database.save_arena(arena)
129
+
130
+
131
+ # Log the messages into the Supabase database
132
+ def log_messages(arena: Arena, messages: List[Message], database=None):
133
+ if database is None:
134
+ pass
135
+ else:
136
+ database.save_messages(arena, messages)
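`save_messages` filters on each message's `logged` flag and flips it after insertion, so repeated calls never write duplicate rows. A minimal sketch of that idempotent pattern with a stand-in message type (`FakeMessage` and `save_unlogged` are illustrative names, not part of the package):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeMessage:
    content: str
    logged: bool = False


def save_unlogged(messages: List[FakeMessage], sink: list) -> int:
    """Insert only not-yet-logged messages into sink, then mark them logged."""
    unlogged = [m for m in messages if not m.logged]
    sink.extend(m.content for m in unlogged)
    for m in unlogged:
        m.logged = True
    return len(unlogged)
```

Because the flag lives on the message object, the same list can be passed on every turn and only the new tail is persisted.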
agentreview/dataset/__init__.py ADDED
File without changes
agentreview/dataset/download_openreview_paper.py ADDED
@@ -0,0 +1,136 @@
1
+ """
2
+ Download all papers from one year of the ICLR conference using the OpenReview API.
3
+
4
+ This script downloads all paper PDFs and their corresponding metadata
5
+ from the ICLR 2023 conference using the OpenReview API.
6
+
7
+ Alternative methods to download can be found in this
8
+ [colab notebook](https://colab.research.google.com/drive/1vXXNxn8lnO3j1dgoidjybbKIN0DW0Bt2),
9
+ though they are not used here.
10
+ """
11
+
12
+ import glob
13
+ import json
14
+ import os
15
+ import time
16
+ import requests
17
+
18
+ from arguments import parse_args
19
+
20
+ try:
21
+ import openreview
22
+ except ImportError:
23
+ raise ImportError("Please install openreview package using `pip install openreview-py`")
24
+
25
+ def download_papers():
26
+ """Downloads all papers from ICLR 2023 using OpenReview API.
27
+
28
+ This function authenticates with the OpenReview API using environment
29
+ variables for the username and password. It then iterates through the
30
+ available papers, downloads the PDF, and saves the corresponding metadata
31
+ (in JSON format) in the specified directories.
32
+
33
+ Raises:
34
+ AssertionError: If the OPENREVIEW_USERNAME or OPENREVIEW_PASSWORD environment
35
+ variables are not set.
36
+ AssertionError: If the conference argument is not for ICLR.
37
+ """
38
+
39
+ args = parse_args()
40
+
41
+ openreview_username = os.environ.get("OPENREVIEW_USERNAME")
42
+ openreview_password = os.environ.get("OPENREVIEW_PASSWORD")
43
+
44
+ assert openreview_username is not None, (
45
+ "Please set your OpenReview username through the OPENREVIEW_USERNAME environment variable."
46
+ )
47
+ assert openreview_password is not None, (
48
+ "Please set your OpenReview password through the OPENREVIEW_PASSWORD environment variable."
49
+ )
50
+
51
+ client = openreview.Client(
52
+ baseurl='https://api.openreview.net',
53
+ username=openreview_username,
54
+ password=openreview_password
55
+ )
56
+
57
+ page_size = 1000
58
+ offset = 0
59
+ papers_directory = os.path.join(args.data_dir, args.conference, "paper")
60
+ notes_directory = os.path.join(args.data_dir, args.conference, "notes")
61
+
62
+ assert "ICLR" in args.conference, "Only works for ICLR conferences!"
63
+ year = int(args.conference.split("ICLR")[-1]) # Only works for ICLR currently
64
+ ids = []
65
+
66
+ # Create directories if they don't exist
67
+ for path in [papers_directory, notes_directory]:
68
+ os.makedirs(path, exist_ok=True)
69
+
70
+ while True:
71
+ # Fetch submissions with pagination
72
+ notes = client.get_notes(
73
+ invitation=f'ICLR.cc/{year}/Conference/-/Blind_Submission',
74
+ details='all',
75
+ offset=offset,
76
+ limit=page_size
77
+ )
78
+
79
+ if not notes:
80
+ break # Exit if no more notes are available
81
+
82
+ # Get existing paper IDs to avoid re-downloading
83
+ existing_papers = glob.glob(f"{papers_directory}/*.pdf")
84
+ existing_paper_ids = {int(os.path.basename(paper).split(".pdf")[0]) for paper in existing_papers}
85
+
86
+ for note in notes:
87
+ paper_id = note.number
88
+ paper_path = os.path.join(papers_directory, f"{paper_id}.pdf")
89
+ note_path = os.path.join(notes_directory, f"{paper_id}.json")
90
+
91
+ # Skip existing papers
92
+ if paper_id in existing_paper_ids:
93
+ print(f"Paper {paper_id} already downloaded.")
94
+ continue
95
+
96
+ print(f"Title: {note.content.get('title', 'N/A')}")
97
+ print(f"Abstract: {note.content.get('abstract', 'N/A')}")
98
+ print(f"TL;DR: {note.content.get('TL;DR', 'N/A')}")
99
+ pdf_link = f"https://openreview.net/pdf?id={note.id}"
100
+ print(f"PDF Link: {pdf_link}")
101
+
102
+ # Attempt to download the paper PDF, retry if fails
103
+ tries = 0
104
+ while tries < 10:
105
+ try:
106
+ response = requests.get(pdf_link, timeout=60)
107
+
108
+ if response.status_code == 200:
109
+
110
+ with open(paper_path, "wb") as pdf_file:
111
+ pdf_file.write(response.content)
112
+
113
+ print(f"PDF downloaded successfully as {paper_path}")
114
+
115
+ # Save metadata as JSON, which contains the reviews, rebuttals, and decisions.
116
+ with open(note_path, "w") as note_file:
117
+ json.dump(note.to_json(), note_file, indent=2)
118
+
119
+ break
120
+
121
+ else:
122
+ print(f"Attempt {tries + 1} failed. Status code: {response.status_code}")
123
+ if response.status_code == 429: # Too many requests
124
+ print("Too many requests. Sleeping for 10 seconds.")
125
+ time.sleep(10)
126
+
127
+ except Exception as e:
128
+ print(f"Attempt {tries + 1} failed with error: {e}")
129
+
130
+ tries += 1
131
+
132
+ offset += page_size
133
+
134
+
135
+ if __name__ == "__main__":
136
+ download_papers()
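The retry loop above can be factored into a reusable helper. This is an illustrative refactor, not part of the script; `fetch` stands in for the `requests.get` call and returns a `(status_code, content)` pair:

```python
import time


def download_with_retry(fetch, max_tries: int = 10, sleep=time.sleep):
    """Call fetch() until it returns status 200 or attempts run out.

    Mirrors the loop in download_papers: a 429 status (too many requests)
    triggers a 10-second pause before the next attempt; other failures and
    exceptions simply retry. Returns the content, or None if all tries fail.
    """
    for attempt in range(max_tries):
        try:
            status, content = fetch()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed with error: {e}")
            continue
        if status == 200:
            return content
        print(f"Attempt {attempt + 1} failed. Status code: {status}")
        if status == 429:
            sleep(10)
    return None
```

Injecting `sleep` as a parameter keeps the backoff testable without real delays.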
agentreview/dataset/process_submissions.py ADDED
@@ -0,0 +1,113 @@
1
+ """
2
+ Process and classify ICLR submissions using OpenReview API.
3
+
4
+ This script processes ICLR submissions, classifies them into subdirectories
5
+ based on decisions, extracts paper content into JSON format, and checks the
6
+ validity of the processed papers.
7
+
8
+ Its main entry point is:
+ - categorize_ICLR_submissions_into_subdirectories: Classifies papers into
+ directories based on their review decisions.
14
+ """
15
+
16
+ import os
17
+ import sys
18
+ import traceback
19
+ from collections import Counter
20
+
21
+ from tqdm import tqdm
22
+
23
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
24
+
25
+ import const
26
+ from arguments import parse_args
27
+ from utility.utils import print_colored
28
+
29
+ decision_map = {
30
+ # ICLR 2023
31
+ "Reject": "Reject",
32
+ "Accept: poster": "Accept-poster",
33
+ "Accept: notable-top-25%": "Accept-notable-top-25",
34
+ "Accept: notable-top-5%": "Accept-notable-top-5",
35
+
36
+ # ICLR 2022
37
+ "Accept (Poster)": "Accept-poster",
38
+ "Accept (Oral)": "Accept-oral",
39
+ "Accept (Spotlight)": "Accept-spotlight",
40
+
41
+ # ICLR 2021
42
+ "Significant concerns (Do not publish)": "Significant-concerns",
43
+ "Concerns raised (can publish with adjustment)": "Concerns-raised",
44
+
45
+ # ICLR 2020
46
+ "Accept (Talk)": "Accept-oral", # We assume this signifies an oral presentation
47
+
48
+ # ICLR 2018
49
+ "Invite to Workshop Track": "Reject"
50
+ }
51
+
52
+
53
+ def categorize_ICLR_submissions_into_subdirectories():
54
+ """Classifies ICLR submissions into subdirectories based on review decisions.
55
+
56
+ This function iterates through the review notes and identifies the decision
57
+ (recommendation or final decision) for each submission. It then moves the
58
+ notes and their corresponding papers into directories based on the decision.
59
+
60
+ Raises:
61
+ AssertionError: If the line containing the decision does not have the
62
+ expected format.
63
+ """
64
+ note_dir = f"data/{args.conference}/notes"
65
+ paper_dir = f"data/{args.conference}/paper"
66
+
67
+ for note in os.listdir(note_dir):
68
+ print(note)
69
+
70
+ # Skip directories or irrelevant files
71
+ if os.path.isdir(os.path.join(note_dir, note)) or ".DS_Store" in note:
72
+ continue
73
+
74
+ note_path = os.path.join(note_dir, note)
75
+ lines = open(note_path, "r").readlines()
76
+ decision = None
77
+
78
+ for line in tqdm(lines):
79
+ if "\"recommendation\"" in line:
80
+ assert Counter(line)["\""] == 4, "Unexpected format in recommendation line."
81
+ print(line)
82
+ decision = line.split("\"recommendation\"")[1].split("\"")[1]
83
+ break
84
+
85
+ elif "\"decision\"" in line:
86
+ assert Counter(line)["\""] == 4, "Unexpected format in decision line."
87
+ print(line)
88
+ try:
89
+ decision = line.split("\"decision\"")[1].split("\"")[1]
90
+ break
91
+ except Exception:
92
+ traceback.print_exc()
93
+ print_colored(line, 'red')
94
+
95
+ if decision is None:
96
+ # Possibly withdrawn papers
97
+ print_colored(f"Could not find decision for {note}", "red")
98
+ continue
99
+
100
+ os.makedirs(os.path.join(note_dir, decision_map[decision]), exist_ok=True)
101
+ os.makedirs(os.path.join(paper_dir, decision_map[decision]), exist_ok=True)
102
+ os.rename(note_path, os.path.join(note_dir, decision_map[decision], note))
103
+
104
+ paper_id = int(note.split(".json")[0])
105
+ paper_path = os.path.join(paper_dir, f"{paper_id}.pdf")
106
+ os.rename(paper_path, os.path.join(paper_dir, decision_map[decision], f"{paper_id}.pdf"))
107
+
108
+
109
+ if __name__ == "__main__":
110
+ args = parse_args()
111
+
112
+ # Classify submissions into subdirectories based on their review decisions
113
+ categorize_ICLR_submissions_into_subdirectories()
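The line-by-line scan above assumes each decision sits on a single line containing exactly four quote characters, which is brittle. Since the notes are stored as JSON, a hypothetical alternative is to parse the file and search the tree; `find_decision` below is illustrative, not part of the script:

```python
import json
from typing import Optional


def find_decision(note_json: str, decision_map: dict) -> Optional[str]:
    """Map a note's recommendation/decision to its target directory name."""

    def walk(obj):
        # Depth-first search for the first "recommendation" or "decision" string
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key in ("recommendation", "decision") and isinstance(value, str):
                    return value
                found = walk(value)
                if found is not None:
                    return found
        elif isinstance(obj, list):
            for item in obj:
                found = walk(item)
                if found is not None:
                    return found
        return None

    decision = walk(json.loads(note_json))
    return decision_map.get(decision) if decision is not None else None
```

Parsing the JSON also avoids the assertion failures the string matching raises when a decision line contains extra quotes.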
agentreview/environments/__init__.py ADDED
@@ -0,0 +1,25 @@
1
+ from ..config import EnvironmentConfig
2
+ from .base import Environment, TimeStep
3
+ from .conversation import Conversation, ModeratedConversation
4
+ from .paper_review import PaperReview
5
+ from .paper_decision import PaperDecision
6
+
7
+ ALL_ENVIRONMENTS = [
8
+ Conversation,
9
+ ModeratedConversation,
10
+ PaperReview,
11
+ PaperDecision,
12
+ ]
13
+
14
+ ENV_REGISTRY = {env.type_name: env for env in ALL_ENVIRONMENTS}
15
+
16
+
17
+ # Load an environment from a config dictionary
18
+ def load_environment(config: EnvironmentConfig):
19
+ try:
20
+ env_cls = ENV_REGISTRY[config["env_type"]]
21
+ except KeyError:
22
+ raise ValueError(f"Unknown environment type: {config['env_type']}")
23
+
24
+ env = env_cls.from_config(config)
25
+ return env
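The registry maps each environment's `type_name` to its class, so configs can name environments declaratively. A self-contained sketch of the same lookup pattern with toy classes (all names here are illustrative):

```python
class Env:
    type_name = None


class Review(Env):
    type_name = "paper_review"


class Decision(Env):
    type_name = "paper_decision"


# Build the registry once from the known environment classes
REGISTRY = {env.type_name: env for env in (Review, Decision)}


def load(config: dict) -> Env:
    """Instantiate the environment class named by config['env_type']."""
    try:
        env_cls = REGISTRY[config["env_type"]]
    except KeyError:
        raise ValueError(f"Unknown environment type: {config['env_type']}")
    return env_cls()
```

Adding a new environment then only requires defining the class and listing it in the registry; no call sites change.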
agentreview/environments/base.py ADDED
@@ -0,0 +1,188 @@
1
+ from abc import abstractmethod
2
+ from dataclasses import dataclass
3
+ from typing import Dict, List
+
+ from ..config import Configurable, EnvironmentConfig
+ from ..message import Message
+ from ..utils import AttributedDict
+
+
+ @dataclass
+ class TimeStep(AttributedDict):
+     """
+     Represents a single step in time within the simulation.
+
+     It includes observation, reward, and terminal state.
+
+     Attributes:
+         observation (List[Message]): A list of messages (observations) for the current timestep.
+         reward (Dict[str, float]): A dictionary with player names as keys and corresponding rewards as values.
+         terminal (bool): A boolean indicating whether the current state is terminal (end of episode).
+     """
+
+     observation: List[Message]
+     reward: Dict[str, float]
+     terminal: bool
+
+
+ class Environment(Configurable):
+     """
+     Abstract class representing an environment.
+
+     It defines the necessary methods any environment must implement.
+
+     Inherits from:
+         Configurable: A custom class that provides methods to handle configuration settings.
+
+     Attributes:
+         type_name (str): Type of the environment, typically set to the lower case of the class name.
+
+     Note:
+         Subclasses should override and implement the abstract methods defined here.
+     """
+
+     type_name = None
+     phase_index = 0
+     task = None
+
+     @abstractmethod
+     def __init__(self, player_names: List[str], **kwargs):
+         """
+         Initialize the Environment.
+
+         Parameters:
+             player_names (List[str]): Names of the players in the environment.
+         """
+         super().__init__(
+             player_names=player_names, **kwargs
+         )  # registers the arguments with Configurable
+         self.player_names = player_names
+
+     def __init_subclass__(cls, **kwargs):
+         """
+         Automatically called when a subclass is being initialized.
+
+         Here it is used to fill in a default `type_name` (the lower-cased class name)
+         when the subclass does not define one.
+         """
+         for required in ("type_name",):
+             if getattr(cls, required) is None:
+                 cls.type_name = cls.__name__.lower()
+
+         return super().__init_subclass__(**kwargs)
+
+     @abstractmethod
+     def reset(self):
+         """
+         Reset the environment to its initial state.
+
+         Note:
+             This method must be implemented by subclasses.
+         """
+         pass
+
+     def to_config(self) -> EnvironmentConfig:
+         self._config_dict["env_type"] = self.type_name
+         return EnvironmentConfig(**self._config_dict)
+
+     @property
+     def num_players(self) -> int:
+         """Get the number of players."""
+         return len(self.player_names)
+
+     @abstractmethod
+     def get_next_player(self) -> str:
+         """
+         Return the name of the next player.
+
+         Note:
+             This method must be implemented by subclasses.
+
+         Returns:
+             str: The name of the next player.
+         """
+         pass
+
+     @abstractmethod
+     def get_observation(self, player_name=None) -> List[Message]:
+         """
+         Return observation for a given player.
+
+         Note:
+             This method must be implemented by subclasses.
+
+         Parameters:
+             player_name (str, optional): The name of the player for whom to get the observation.
+
+         Returns:
+             List[Message]: The observation for the player in the form of a list of messages.
+         """
+         pass
+
+     @abstractmethod
+     def print(self):
+         """Print the environment state."""
+         pass
+
+     @abstractmethod
+     def step(self, player_name: str, action: str) -> TimeStep:
+         """
+         Execute a step in the environment given an action from a player.
+
+         Note:
+             This method must be implemented by subclasses.
+
+         Parameters:
+             player_name (str): The name of the player.
+             action (str): The action that the player wants to take.
+
+         Returns:
+             TimeStep: An object of the TimeStep class containing the observation, reward, and terminal state.
+         """
+         pass
+
+     @abstractmethod
+     def check_action(self, action: str, player_name: str) -> bool:
+         """
+         Check whether a given action is valid for a player.
+
+         Note:
+             Subclasses should override this; the default implementation accepts any action.
+
+         Parameters:
+             action (str): The action to be checked.
+             player_name (str): The name of the player.
+
+         Returns:
+             bool: True if the action is valid, False otherwise.
+         """
+         return True
+
+     @abstractmethod
+     def is_terminal(self) -> bool:
+         """
+         Check whether the environment is in a terminal state (end of episode).
+
+         Note:
+             This method must be implemented by subclasses.
+
+         Returns:
+             bool: True if the environment is in a terminal state, False otherwise.
+         """
+         pass
+
+     def get_zero_rewards(self) -> Dict[str, float]:
+         """
+         Return a dictionary with all player names as keys and zero as reward.
+
+         Returns:
+             Dict[str, float]: A dictionary of players and their rewards (all zero).
+         """
+         return {player_name: 0.0 for player_name in self.player_names}
+
+     def get_one_rewards(self) -> Dict[str, float]:
+         """
+         Return a dictionary with all player names as keys and one as reward.
+
+         Returns:
+             Dict[str, float]: A dictionary of players and their rewards (all one).
+         """
+         return {player_name: 1.0 for player_name in self.player_names}
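To make the `Environment` contract above concrete, here is a minimal, self-contained sketch of a round-robin environment. It deliberately does not import the real `Environment`/`TimeStep` classes (a stand-in `TimeStep` dataclass is defined so the snippet runs on its own); the class and player names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TimeStep:  # stand-in for agentreview.environments.base.TimeStep
    observation: List[str]
    reward: Dict[str, float]
    terminal: bool


class EchoEnvironment:
    """Round-robin toy environment: each player speaks once; one full round ends the episode."""

    def __init__(self, player_names: List[str]):
        self.player_names = player_names
        self._next_player_index = 0
        self._history: List[str] = []

    def get_next_player(self) -> str:
        return self.player_names[self._next_player_index]

    def step(self, player_name: str, action: str) -> TimeStep:
        self._history.append(f"{player_name}: {action}")
        self._next_player_index = (self._next_player_index + 1) % len(self.player_names)
        # Back at the first player means a full round has completed
        terminal = self._next_player_index == 0
        reward = {name: 0.0 for name in self.player_names}
        return TimeStep(observation=list(self._history), reward=reward, terminal=terminal)


env = EchoEnvironment(["Reviewer 1", "AC"])
ts1 = env.step(env.get_next_player(), "I lean accept.")
ts2 = env.step(env.get_next_player(), "Noted.")
```

The `TimeStep` returned by each `step` carries the full visible history, a zero reward for every player, and the terminal flag, mirroring the shape of the real API.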
agentreview/environments/conversation.py ADDED
@@ -0,0 +1,198 @@
+ from typing import List, Union
+
+ from ..agent import SIGNAL_END_OF_CONVERSATION, Moderator
+ from ..config import AgentConfig, EnvironmentConfig
+ from ..message import Message, MessagePool
+ from .base import Environment, TimeStep
+
+
+ class Conversation(Environment):
+     """
+     Turn-based fully observable conversation environment.
+
+     Next speaker order is either parallel or round-robin.
+     """
+
+     type_name = "conversation"
+
+     def __init__(self, player_names: List[str], parallel: bool = False, **kwargs):
+         super().__init__(player_names=player_names, parallel=parallel, **kwargs)
+
+         self.parallel = parallel
+
+         # The "state" of the environment is maintained by the message pool
+         self.message_pool = MessagePool()
+
+         self._current_turn = 0
+         self._next_player_index = 0
+         self._phase_index = 0  # backing field for the `phase_index` property
+
+     def reset(self):
+         self._current_turn = 0
+         self._next_player_index = 0
+         self.message_pool.reset()
+
+         init_timestep = TimeStep(
+             observation=[], reward=self.get_zero_rewards(), terminal=False
+         )
+         return init_timestep
+
+     @property
+     def phase_index(self):
+         return self._phase_index
+
+     @phase_index.setter
+     def phase_index(self, value):
+         self._phase_index = value
+
+     def to_config(self) -> EnvironmentConfig:
+         return EnvironmentConfig(
+             env_type=self.type_name,
+             player_names=self.player_names,
+             parallel=self.parallel,
+         )
+
+     def print(self):
+         self.message_pool.print()
+
+     def get_next_player(self) -> str:
+         """Get the next player."""
+         return self.player_names[self._next_player_index]
+
+     def get_observation(self, player_name=None) -> List[Message]:
+         """Get observation for the player."""
+         if player_name is None:
+             return self.message_pool.get_all_messages()
+         else:
+             return self.message_pool.get_visible_messages(
+                 player_name, turn=self._current_turn
+             )
+
+     def is_terminal(self) -> bool:
+         """Check if the conversation is over."""
+         # The conversation is over once the last message is the end-of-conversation signal
+         last_message = self.message_pool.last_message
+         if last_message is None:
+             return False
+         return last_message.content.startswith(SIGNAL_END_OF_CONVERSATION)
+
+     def step(self, player_name: str, action: str) -> TimeStep:
+         """
+         Step function that is called by the arena.
+
+         Args:
+             player_name: the name of the player that takes the action
+             action: the action that the agent wants to take
+         """
+         message = Message(
+             agent_name=player_name, content=action, turn=self._current_turn
+         )
+         self.message_pool.append_message(message)
+
+         # Update the counters
+         if not self.parallel or self._next_player_index == 0:
+             self._current_turn += 1
+         self._next_player_index = (self._next_player_index + 1) % self.num_players
+
+         timestep = TimeStep(
+             observation=self.get_observation(),
+             reward=self.get_zero_rewards(),
+             terminal=self.is_terminal(),
+         )  # Return all the messages
+         return timestep
+
+
+ class ModeratedConversation(Conversation):
+     """
+     Turn-based fully observable conversation environment.
+
+     Next speaker order is either parallel or round-robin.
+     Moderator is a special agent that can see all messages and can decide whether the conversation is over.
+     """
+
+     type_name = "moderated_conversation"
+
+     def __init__(
+         self,
+         player_names: List[str],
+         moderator: Union[Moderator, AgentConfig],
+         parallel: bool = False,
+         moderator_visibility="all",
+         moderator_period=None,
+         **kwargs,
+     ):
+         super().__init__(player_names=player_names, parallel=parallel, **kwargs)
+
+         if isinstance(moderator, AgentConfig):
+             moderator_config = moderator
+             moderator = Moderator.from_config(moderator_config)
+         elif not isinstance(moderator, Moderator):
+             raise ValueError(
+                 "moderator must be either an AgentConfig or a Moderator instance."
+             )
+
+         self.moderator = moderator
+         self.moderator_visibility = moderator_visibility
+         if moderator_period is None:
+             if parallel:
+                 self.moderator_period = "round"
+             else:
+                 self.moderator_period = "turn"
+         else:
+             self.moderator_period = moderator_period
+
+     def to_config(self) -> EnvironmentConfig:
+         # This environment contains some special config arguments that need to be handled specially
+         return EnvironmentConfig(
+             env_type=self.type_name,
+             player_names=self.player_names,
+             parallel=self.parallel,
+             moderator=self.moderator.to_config(),
+             moderator_visibility=self.moderator_visibility,
+             moderator_period=self.moderator_period,
+         )
+
+     def step(self, player_name: str, action: str) -> TimeStep:
+         """
+         Step function that is called by the arena.
+
+         Args:
+             player_name: the name of the player that takes the action
+             action: the action that the agent wants to take
+         """
+         message = Message(
+             agent_name=player_name, content=action, turn=self._current_turn
+         )
+         self.message_pool.append_message(message)
+
+         # Round-robin order for the next player
+         self._next_player_index = (self._next_player_index + 1) % self.num_players
+
+         if self.moderator_period == "turn" or (
+             self.moderator_period == "round" and self._next_player_index == 0
+         ):
+             # Moderator's turn
+             moderator_history = self.message_pool.get_all_messages()
+             moderator_response = self.moderator(moderator_history)
+             moderator_message = Message(
+                 agent_name=self.moderator.name,
+                 content=moderator_response,
+                 turn=self._current_turn,
+                 visible_to=self.moderator_visibility,
+             )
+             self.message_pool.append_message(moderator_message)
+             terminal = (
+                 self.moderator.is_terminal(moderator_history) or self.is_terminal()
+             )
+         else:
+             terminal = self.is_terminal()
+
+         # Update the counters
+         if not self.parallel or self._next_player_index == 0:
+             self._current_turn += 1
+
+         timestep = TimeStep(
+             observation=self.get_observation(),
+             reward=self.get_zero_rewards(),
+             terminal=terminal,
+         )  # Return all the messages
+         return timestep
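The moderator scheduling in `ModeratedConversation.step` boils down to one predicate: with `moderator_period == "turn"` the moderator speaks after every message, while with `"round"` it speaks only when the next-player index wraps back to 0 (a full round has completed). A standalone sketch of that check (the function name is ours, not part of the codebase):

```python
def moderator_speaks(moderator_period: str, next_player_index: int) -> bool:
    """Mirror of the condition in ModeratedConversation.step.

    "turn"  -> the moderator comments after every single message.
    "round" -> the moderator comments only when the speaker index has
               wrapped to 0, i.e. after each full round of players.
    """
    return moderator_period == "turn" or (
        moderator_period == "round" and next_player_index == 0
    )


# With three players, "round" fires only once per three messages:
round_pattern = [moderator_speaks("round", (i + 1) % 3) for i in range(3)]
```

Note that `step` increments `_next_player_index` before this check, which is why the wrap-to-zero test marks the end of a round.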
agentreview/environments/paper_decision.py ADDED
@@ -0,0 +1,161 @@
+ import logging
+ import traceback
+ from typing import List
+
+ from agentreview.environments import Conversation
+ from .base import TimeStep
+ from ..message import Message, MessagePool
+
+ logger = logging.getLogger(__name__)
+
+
+ class PaperDecision(Conversation):
+     """
+     Area chairs make decisions based on the meta reviews.
+     """
+
+     type_name = "paper_decision"
+
+     def __init__(self,
+                  player_names: List[str],
+                  experiment_setting: dict,
+                  paper_ids: List[int] = None,
+                  metareviews: List[str] = None,
+                  parallel: bool = False,
+                  **kwargs):
+         """
+         Args:
+             paper_ids (List[int]): ids of the papers under discussion, such as [917]
+             metareviews (List[str]): meta reviews of the papers, written by the ACs
+         """
+
+         # Skip `Conversation.__init__` and initialize from its parent class (`Environment`)
+         super(Conversation, self).__init__(player_names=player_names, parallel=parallel, **kwargs)
+
+         self.paper_ids = paper_ids
+         self.metareviews = metareviews
+         self.parallel = parallel
+         self.experiment_setting = experiment_setting
+         self.ac_scoring_method = kwargs.get("ac_scoring_method")
+         # The "state" of the environment is maintained by the message pool
+         self.message_pool = MessagePool()
+
+         self.ac_decisions = None
+
+         self._current_turn = 0
+         self._next_player_index = 0
+         self.phase_index = 5  # "ACs make decision based on meta review" is the last phase (Phase 5)
+
+         self._phases = None
+
+     @property
+     def phases(self):
+         if self._phases is None:
+             self._phases = {
+                 5: {
+                     "name": "ac_make_decisions",
+                     'speaking_order': ["AC"]
+                 },
+             }
+         return self._phases
+
+     def step(self, player_name: str, action: str) -> TimeStep:
+         """
+         Step function that is called by the arena.
+
+         Args:
+             player_name: the name of the player that takes the action
+             action: the action that the agent wants to take
+         """
+         message = Message(
+             agent_name=player_name, content=action, turn=self._current_turn
+         )
+         self.message_pool.append_message(message)
+
+         speaking_order = self.phases[self.phase_index]["speaking_order"]
+
+         logger.info(f"Phase {self.phase_index}: {self.phases[self.phase_index]['name']} "
+                     f"| Player {self._next_player_index}: {speaking_order[self._next_player_index]}")
+
+         # Reached the end of the speaking order. Move to the next phase.
+         if self._next_player_index == len(speaking_order) - 1:
+             self._next_player_index = 0
+             logger.info(f"Phase {self.phase_index}: end of the speaking order. Move to Phase {self.phase_index + 1}.")
+             self.phase_index += 1
+             self._current_turn += 1
+         else:
+             self._next_player_index += 1
+
+         timestep = TimeStep(
+             observation=self.get_observation(),
+             reward=self.get_zero_rewards(),
+             terminal=self.is_terminal(),
+         )  # Return all the messages
+
+         return timestep
+
+     def check_action(self, action: str, player_name: str) -> bool:
+         """Check if the action is valid."""
+
+         if player_name.startswith("AC"):
+             try:
+                 self.ac_decisions = self.parse_ac_decisions(action)
+             except Exception:
+                 traceback.print_exc()
+                 return False
+
+             if not isinstance(self.ac_decisions, dict):
+                 return False
+
+         return True
+
+     @property
+     def ac_decisions(self):
+         return self._ac_decisions
+
+     @ac_decisions.setter
+     def ac_decisions(self, value):
+         self._ac_decisions = value
+
+     def parse_ac_decisions(self, action: str):
+         """
+         Parse the decisions made by the ACs.
+         """
+
+         lines = action.split("\n")
+
+         paper2rating = {}
+
+         paper_id, rank = None, None
+
+         for line in lines:
+             if line.lower().startswith("paper id:"):
+                 paper_id = int(line.split(":")[1].split('(')[0].strip())
+             elif self.ac_scoring_method == "ranking" and line.lower().startswith("willingness to accept:"):
+                 rank = int(line.split(":")[1].strip())
+             elif self.ac_scoring_method == "recommendation" and line.lower().startswith("decision"):
+                 rank = line.split(":")[1].strip()
+
+             if paper_id is not None and rank is not None:
+                 if paper_id in paper2rating:
+                     raise ValueError(f"Paper {paper_id} is assigned a rank twice.")
+
+                 paper2rating[paper_id] = rank
+                 paper_id, rank = None, None
+
+         return paper2rating
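For reference, the input format that `parse_ac_decisions` expects can be illustrated with a simplified, standalone version hard-wired to the `"ranking"` scoring method (the method dispatch and class plumbing of the real code are omitted):

```python
def parse_rankings(action: str) -> dict:
    """Parse alternating 'Paper ID: <id>' / 'Willingness to accept: <rank>' lines.

    Simplified mirror of PaperDecision.parse_ac_decisions with
    ac_scoring_method == "ranking".
    """
    paper2rating = {}
    paper_id, rank = None, None
    for line in action.split("\n"):
        low = line.lower()
        if low.startswith("paper id:"):
            # Tolerate trailing parentheticals, e.g. "Paper ID: 917 (borderline)"
            paper_id = int(line.split(":")[1].split("(")[0].strip())
        elif low.startswith("willingness to accept:"):
            rank = int(line.split(":")[1].strip())
        if paper_id is not None and rank is not None:
            if paper_id in paper2rating:
                raise ValueError(f"Paper {paper_id} is assigned a rank twice.")
            paper2rating[paper_id] = rank
            paper_id, rank = None, None
    return paper2rating


decisions = parse_rankings(
    "Paper ID: 917\nWillingness to accept: 3\nPaper ID: 42\nWillingness to accept: 1"
)
```

A duplicated paper id raises `ValueError`, matching the guard in the class version.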
agentreview/environments/paper_review.py ADDED
@@ -0,0 +1,217 @@
+ import json
+ import logging
+ import os.path as osp
+ from typing import List
+
+ from agentreview.environments import Conversation
+ from utility.utils import get_rebuttal_dir
+ from .base import TimeStep
+ from ..message import Message
+ from ..paper_review_message import PaperReviewMessagePool
+
+ logger = logging.getLogger(__name__)
+
+
+ class PaperReview(Conversation):
+     """
+     Discussion between reviewers and area chairs.
+
+     There are several phases in the reviewing process:
+         reviewer_write_reviews: reviewers write their reviews based on the paper content.
+         author_reviewer_discussion: an author responds to comments from the reviewers.
+         reviewer_ac_discussion: reviewers and an area chair discuss the paper.
+         ac_discussion: an area chair makes the final decision.
+     """
+
+     type_name = "paper_review"
+
+     def __init__(self, player_names: List[str], paper_id: int, paper_decision: str, experiment_setting: dict, args,
+                  parallel: bool = False, **kwargs):
+         """
+         Args:
+             paper_id (int): the id of the paper, such as 917
+             paper_decision (str): the decision of the paper, such as "Accept: notable-top-25%"
+         """
+
+         # Skip `Conversation.__init__` and initialize from its parent class (`Environment`)
+         super(Conversation, self).__init__(player_names=player_names, parallel=parallel, **kwargs)
+         self.args = args
+         self.paper_id = paper_id
+         self.paper_decision = paper_decision
+         self.parallel = parallel
+         self.experiment_setting = experiment_setting
+         self.player_to_test = experiment_setting.get('player_to_test', None)
+         self.task = kwargs.get("task")
+         self.experiment_name = args.experiment_name
+
+         # The "state" of the environment is maintained by the message pool
+         self.message_pool = PaperReviewMessagePool(experiment_setting)
+
+         self.phase_index = 0
+         self._phases = None
+
+     @property
+     def phases(self):
+         if self._phases is not None:
+             return self._phases
+
+         reviewer_names = [name for name in self.player_names if name.startswith("Reviewer")]
+
+         num_reviewers = len(reviewer_names)
+
+         reviewer_names = [f"Reviewer {i}" for i in range(1, num_reviewers + 1)]
+
+         self._phases = {
+             # In phase 0, no LLM-based agents are called.
+             0: {
+                 "name": "paper_extraction",
+                 'speaking_order': ["Paper Extractor"],
+             },
+
+             1: {
+                 "name": 'reviewer_write_reviews',
+                 'speaking_order': reviewer_names
+             },
+
+             # The author responds to each reviewer's review
+             2: {
+                 'name': 'author_reviewer_discussion',
+                 'speaking_order': ["Author" for _ in reviewer_names],
+             },
+
+             3: {
+                 'name': 'reviewer_ac_discussion',
+                 'speaking_order': ["AC"] + reviewer_names,
+             },
+
+             4: {
+                 'name': 'ac_write_metareviews',
+                 'speaking_order': ["AC"]
+             },
+             5: {
+                 'name': 'ac_makes_decisions',
+                 'speaking_order': ["AC"]
+             },
+         }
+
+         return self._phases
+
+     @phases.setter
+     def phases(self, value):
+         self._phases = value
+
+     def reset(self):
+         self._current_phase = "review"
+         self.phase_index = 0
+         return super().reset()
+
+     def load_message_history_from_cache(self):
+         if self.phase_index == 0:
+             logger.info("Loading message history from the BASELINE experiment")
+
+             full_paper_discussion_path = get_rebuttal_dir(paper_id=self.paper_id,
+                                                           experiment_name="BASELINE",
+                                                           model_name=self.args.model_name,
+                                                           conference=self.args.conference)
+
+             messages = json.load(open(osp.join(full_paper_discussion_path, f"{self.paper_id}.json"), 'r',
+                                       encoding='utf-8'))['messages']
+
+             num_messages_from_AC = 0
+
+             for msg in messages:
+                 # We have already extracted contents from the paper.
+                 if msg['agent_name'] == "Paper Extractor":
+                     continue
+
+                 # Encountering the 2nd message from the AC. Stop loading messages.
+                 if msg['agent_name'] == "AC" and num_messages_from_AC == 1:
+                     break
+
+                 if msg['agent_name'] == "AC":
+                     num_messages_from_AC += 1
+
+                 message = Message(**msg)
+                 self.message_pool.append_message(message)
+
+             num_unique_reviewers = len(
+                 {msg['agent_name'] for msg in messages if msg['agent_name'].startswith("Reviewer")})
+
+             assert num_unique_reviewers == self.args.num_reviewers_per_paper
+
+             self.phase_index = 4
+
+     def step(self, player_name: str, action: str) -> TimeStep:
+         """
+         Step function that is called by the arena.
+
+         Args:
+             player_name: the name of the player that takes the action
+             action: the action that the agent wants to take
+         """
+         message = Message(
+             agent_name=player_name, content=action, turn=self._current_turn
+         )
+         self.message_pool.append_message(message)
+
+         speaking_order = self.phases[self.phase_index]["speaking_order"]
+
+         logger.info(f"Phase {self.phase_index}: {self.phases[self.phase_index]['name']} "
+                     f"| Player {self._next_player_index}: {speaking_order[self._next_player_index]}")
+
+         terminal = self.is_terminal()
+
+         # Reached the end of the speaking order. Move to the next phase.
+         if self._next_player_index == len(speaking_order) - 1:
+             self._next_player_index = 0
+
+             if self.phase_index == 4:
+                 terminal = True
+                 logger.info(
+                     "Finishing the simulation for Phase I - IV. Please run `python run_paper_decision_cli.py` for "
+                     "Phase V (AC makes decisions).")
+             else:
+                 logger.info(f"Phase {self.phase_index}: end of the speaking order. Move to Phase {self.phase_index + 1}.")
+                 self.phase_index += 1
+                 self._current_turn += 1
+         else:
+             self._next_player_index += 1
+
+         timestep = TimeStep(
+             observation=self.get_observation(),
+             reward=self.get_zero_rewards(),
+             terminal=terminal,
+         )  # Return all the messages
+
+         return timestep
+
+     def get_next_player(self) -> str:
+         """Get the next player in the current phase."""
+         speaking_order = self.phases[self.phase_index]["speaking_order"]
+         next_player = speaking_order[self._next_player_index]
+         return next_player
+
+     def get_observation(self, player_name=None) -> List[Message]:
+         """Get observation for the player."""
+         if player_name is None:
+             return self.message_pool.get_all_messages()
+         else:
+             return self.message_pool.get_visible_messages_for_paper_review(
+                 player_name, phase_index=self.phase_index, next_player_idx=self._next_player_index,
+                 player_names=self.player_names
+             )
agentreview/experiment_config.py ADDED
@@ -0,0 +1,244 @@
+ """
+ Experiment settings.
+
+ BASELINE: the default setting that all other settings are compared against.
+ """
+
+ baseline_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "BASELINE",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ benign_Rx1_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "benign",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ malicious_Rx1_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "malicious",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ unknowledgeable_Rx1_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "unknowledgeable",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ knowledgeable_Rx1_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "knowledgeable",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+
+ responsible_Rx1_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "responsible",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ irresponsible_Rx1_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "irresponsible",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ conformist_ACx1_setting = {
+     "AC": [
+         "conformist"
+     ],
+
+     "reviewer": [
+         "BASELINE",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ authoritarian_ACx1_setting = {
+     "AC": [
+         "authoritarian"
+     ],
+
+     "reviewer": [
+         "BASELINE",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+ inclusive_ACx1_setting = {
+     "AC": [
+         "inclusive"
+     ],
+
+     "reviewer": [
+         "BASELINE",
+         "BASELINE",
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": ['reviewer', 'ac'],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+
+ no_numeric_ratings_setting = {
+     "AC": [
+         "BASELINE"
+     ],
+
+     "reviewer": [
+         "BASELINE"
+     ],
+
+     "author": [
+         "BASELINE"
+     ],
+     "global_settings": {
+         "provides_numeric_rating": [],
+         "persons_aware_of_authors_identities": []
+     }
+ }
+
+
+ # All experimental settings.
+ # Customize your own by adding new settings to this dict.
+ all_settings = {
+     "BASELINE": baseline_setting,
+     "benign_Rx1": benign_Rx1_setting,
+     "malicious_Rx1": malicious_Rx1_setting,
+     "knowledgeable_Rx1_setting": knowledgeable_Rx1_setting,
+     "unknowledgeable_Rx1_setting": unknowledgeable_Rx1_setting,
+     "responsible_Rx1_setting": responsible_Rx1_setting,
+     "irresponsible_Rx1_setting": irresponsible_Rx1_setting,
+     "conformist_ACx1": conformist_ACx1_setting,
+     "authoritarian_ACx1": authoritarian_ACx1_setting,
+     "inclusive_ACx1": inclusive_ACx1_setting,
+     "no_numeric_ratings": no_numeric_ratings_setting,
+ }
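Each setting above is a plain dict with `AC` / `reviewer` / `author` role lists plus a `global_settings` block. A hypothetical helper (not part of the repository) that sanity-checks this shape before a new setting is registered in `all_settings` might look like:

```python
def validate_setting(setting: dict) -> list:
    """Collect problems in an experiment-setting dict; empty list means the shape is valid.

    Hypothetical convenience check, mirroring the structure used by the
    settings in experiment_config.py.
    """
    problems = []
    # Every role must be present with a non-empty list of persona labels
    for role in ("AC", "reviewer", "author"):
        if not setting.get(role):
            problems.append(f"missing or empty role list: {role}")
    # global_settings must carry both expected keys
    gs = setting.get("global_settings", {})
    for key in ("provides_numeric_rating", "persons_aware_of_authors_identities"):
        if key not in gs:
            problems.append(f"global_settings missing key: {key}")
    return problems


baseline = {
    "AC": ["BASELINE"],
    "reviewer": ["BASELINE", "BASELINE", "BASELINE"],
    "author": ["BASELINE"],
    "global_settings": {
        "provides_numeric_rating": ["reviewer", "ac"],
        "persons_aware_of_authors_identities": [],
    },
}
```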
agentreview/message.py ADDED
@@ -0,0 +1,150 @@
1
+ import hashlib
2
+ import time
3
+ from dataclasses import dataclass
4
+ from typing import List, Union
5
+ from uuid import uuid1
6
+
7
+ # Preserved roles
8
+ SYSTEM_NAME = "System"
9
+ MODERATOR_NAME = "Moderator"
10
+
11
+
12
+ def _hash(input: str):
13
+ """
14
+ Helper function that generates a SHA256 hash of a given input string.
15
+
16
+ Parameters:
17
+ input (str): The input string to be hashed.
18
+
19
+ Returns:
20
+ str: The SHA256 hash of the input string.
21
+ """
22
+ hex_dig = hashlib.sha256(input.encode()).hexdigest()
23
+ return hex_dig
24
+
25
+
26
+ @dataclass
27
+ class Message:
28
+ """
29
+ Represents a message in the chatArena environment.
30
+
31
+ Attributes:
32
+ agent_name (str): Name of the agent who sent the message.
33
+ content (str): Content of the message.
34
+ turn (int): The turn at which the message was sent.
35
+ timestamp (int): Wall time at which the message was sent. Defaults to current time in nanoseconds.
36
+ visible_to (Union[str, List[str]]): The receivers of the message. Can be a single agent, multiple agents, or 'all'. Defaults to 'all'.
37
+ msg_type (str): Type of the message, e.g., 'text'. Defaults to 'text'.
38
+ logged (bool): Whether the message is logged in the database. Defaults to False.
39
+ """
40
+
41
+ agent_name: str
42
+ content: str
43
+ turn: int
44
+ timestamp: int = time.time_ns()
45
+ visible_to: Union[str, List[str]] = "all"
46
+ msg_type: str = "text"
47
+ logged: bool = False # Whether the message is logged in the database
48
+
49
+ @property
50
+ def msg_hash(self):
51
+ # Generate a unique message id given the content, timestamp and role
52
+ return _hash(
53
+ f"agent: {self.agent_name}\ncontent: {self.content}\ntimestamp: {str(self.timestamp)}\nturn: {self.turn}\nmsg_type: {self.msg_type}"
54
+ )
55
+
56
+
57
+ class MessagePool:
58
+ """
59
+ A pool to manage the messages in the chatArena environment.
60
+
61
+    The pool is essentially a list of messages, and it allows a unified treatment of message visibility:
+    multiple players can act in the same turn (as in rock-paper-scissors), and agents can only see the
+    messages that 1) were sent before the current turn, and 2) are visible to their role.
+    """
+
+    def __init__(self):
+        """Initialize the MessagePool with a unique conversation ID."""
+        self.conversation_id = str(uuid1())
+        self._messages: List[Message] = []
+        self._last_message_idx = 0
+
+    def reset(self):
+        """Clear the message pool."""
+        self._messages = []
+
+    def append_message(self, message: Message):
+        """
+        Append a message to the pool.
+
+        Parameters:
+            message (Message): The message to be added to the pool.
+        """
+        self._messages.append(message)
+
+    def print(self):
+        """Print all the messages in the pool."""
+        for message in self._messages:
+            print(f"[{message.agent_name}->{message.visible_to}]: {message.content}")
+
+    @property
+    def last_turn(self):
+        """
+        Get the turn of the last message in the pool.
+
+        Returns:
+            int: The turn of the last message, or 0 if the pool is empty.
+        """
+        if len(self._messages) == 0:
+            return 0
+        else:
+            return self._messages[-1].turn
+
+    @property
+    def last_message(self):
+        """
+        Get the last message in the pool.
+
+        Returns:
+            Message: The last message, or None if the pool is empty.
+        """
+        if len(self._messages) == 0:
+            return None
+        else:
+            return self._messages[-1]
+
+    def get_all_messages(self) -> List[Message]:
+        """
+        Get all the messages in the pool.
+
+        Returns:
+            List[Message]: A list of all messages.
+        """
+        return self._messages
+
+    def get_visible_messages(self, agent_name, turn: int) -> List[Message]:
+        """
+        Get all the messages that are visible to a given agent before a specified turn.
+
+        Parameters:
+            agent_name (str): The name of the agent.
+            turn (int): The specified turn.
+
+        Returns:
+            List[Message]: A list of visible messages.
+        """
+
+        # Get the messages before the current turn
+        prev_messages = [message for message in self._messages if message.turn < turn]
+
+        visible_messages = []
+        for message in prev_messages:
+            if (
+                message.visible_to == "all"
+                or agent_name in message.visible_to
+                or agent_name == "Moderator"
+            ):
+                visible_messages.append(message)
+        return visible_messages
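For illustration, the visibility rule above can be sketched standalone. `Msg` below is a hypothetical stand-in for the project's `Message` class, not its actual API; the filtering logic mirrors `get_visible_messages`:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Msg:  # hypothetical stand-in for agentreview's Message
    agent_name: str
    content: str
    turn: int
    visible_to: Union[str, List[str]] = "all"


def get_visible_messages(messages: List[Msg], agent_name: str, turn: int) -> List[Msg]:
    # Same rule as MessagePool.get_visible_messages: keep messages sent before
    # `turn` that are public, addressed to the agent, or viewed by the Moderator.
    prev = [m for m in messages if m.turn < turn]
    return [m for m in prev
            if m.visible_to == "all" or agent_name in m.visible_to or agent_name == "Moderator"]


pool = [
    Msg("Reviewer 1", "Strong accept", turn=0, visible_to=["AC"]),
    Msg("AC", "Please discuss", turn=1, visible_to="all"),
    Msg("Reviewer 2", "Weak reject", turn=2, visible_to=["AC"]),
]

print([m.content for m in get_visible_messages(pool, "Reviewer 2", turn=2)])  # ['Please discuss']
print(len(get_visible_messages(pool, "Moderator", turn=3)))                   # 3
```

Note that the `"all"` check must come first: when `visible_to` is the string `"all"`, a substring test like `agent_name in m.visible_to` would be meaningless.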
agentreview/paper_processor.py ADDED
@@ -0,0 +1,163 @@
+ """
+ Read papers from a PDF file and extract the title, abstract, figure and table captions, and main content. These
+ functions work best with ICLR / NeurIPS papers.
+ """
+
+ from io import StringIO
+
+ from pdfminer.converter import TextConverter
+ from pdfminer.layout import LAParams
+ from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+ from pdfminer.pdfpage import PDFPage
+
+
+ def extract_text_from_pdf(path: str) -> str:
+     """Extract text from a PDF file.
+
+     Args:
+         path (str): A string specifying the path to the PDF file.
+
+     Returns:
+         A string containing the extracted text from the PDF.
+     """
+
+     with open(path, 'rb') as file_handle:
+         # Initialize a PDF resource manager to store shared resources.
+         resource_manager = PDFResourceManager()
+
+         # Set up a StringIO instance to capture the extracted text.
+         text_output = StringIO()
+
+         # Create a TextConverter to convert PDF pages to text.
+         converter = TextConverter(resource_manager, text_output, laparams=LAParams())
+
+         # Initialize a PDF page interpreter.
+         interpreter = PDFPageInterpreter(resource_manager, converter)
+
+         # Process each page in the PDF.
+         for page in PDFPage.get_pages(file_handle, caching=True, check_extractable=True):
+             interpreter.process_page(page)
+
+         # Retrieve the extracted text and close the StringIO instance.
+         extracted_text = text_output.getvalue()
+         text_output.close()
+
+         # Finalize the converter.
+         converter.close()
+
+     # Replace form feed characters with newlines.
+     extracted_text = extracted_text.replace('\x0c', '\n')
+
+     return extracted_text
+
+
+ def convert_text_into_dict(text: str) -> dict:
+     """Convert the extracted text into a dictionary.
+
+     Args:
+         text (str): The extracted text from the PDF.
+
+     Returns:
+         A dict containing the extracted fields from the paper.
+     """
+
+     lines = text.split('\n')
+
+     # Filter out running headers such as "Under review ..." or "Published as ..."
+     filtered_lines = [line for line in lines if not (line.startswith('Under review') or
+                                                      line.startswith('Published as') or
+                                                      line.startswith('Paper under double-blind review'))]
+
+     # Remove the empty lines before the title
+     while filtered_lines[0].strip() == "":
+         filtered_lines.pop(0)
+
+     # Get the title
+     title = ""
+     while filtered_lines[0] != "":
+         title += filtered_lines.pop(0) + ' '
+
+     title = title.strip().capitalize()
+
+     # Remove the author information between the title and the abstract
+     while filtered_lines[0].lower() != "abstract":
+         filtered_lines.pop(0)
+     filtered_lines.pop(0)
+
+     # Get the abstract
+     abstract = ""
+     while filtered_lines[0].lower() != "introduction":
+         abstract += filtered_lines.pop(0) + ' '
+
+     main_content = ""
+
+     figures_captions = []
+     tables_captions = []
+
+     while filtered_lines != [] and not filtered_lines[0].lower().startswith("references"):
+         figure_caption = ""
+         table_caption = ""
+
+         if filtered_lines[0].lower().startswith("figure"):
+             while not filtered_lines[0] == "":
+                 figure_caption += filtered_lines.pop(0) + ' '
+
+         # Compare against the lowercased line; "Table" with a capital T would never match
+         elif filtered_lines[0].lower().startswith("table"):
+             while not filtered_lines[0] == "":
+                 table_caption += filtered_lines.pop(0) + ' '
+
+         else:
+             main_content += filtered_lines.pop(0) + ' '
+
+         if figure_caption != "":
+             figures_captions.append(figure_caption)
+
+         if table_caption != "":
+             tables_captions.append(table_caption)
+
+     figures_captions = "\n".join(figures_captions) + "\n" + "\n".join(tables_captions)
+
+     # Skip to the first section title in the appendix
+     # Example section title: "A ENVIRONMENT DETAILS"
+     while filtered_lines != [] and not (filtered_lines[0].isupper() and filtered_lines[0][0] == "A"):
+         filtered_lines.pop(0)
+
+     appendix = ""
+
+     while filtered_lines != []:
+         appendix += filtered_lines.pop(0) + ' '
+
+     paper = {
+         "Title": title.strip(),
+         "Abstract": abstract.strip(),
+         "Figures/Tables Captions": figures_captions.strip(),
+         "Main Content": main_content.strip(),
+         "Appendix": appendix.strip(),
+     }
+
+     return paper
+
+
+ if __name__ == "__main__":
+     from utility.authentication_utils import read_and_set_openai_key
+     from agentreview.review import get_lm_review
+
+     read_and_set_openai_key()
+
+     path = "data/rejected/6359.pdf"
+     text = extract_text_from_pdf(path)
+
+     parsed_paper = convert_text_into_dict(text)
+
+     review_generated = get_lm_review(parsed_paper)
+
+     print(review_generated["review_generated"])
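The destructive `pop(0)` parsing pattern used by `convert_text_into_dict` can be demonstrated on a small synthetic page layout (title lines, blank line, author block, "Abstract" marker, abstract lines, "Introduction" marker). This is a minimal sketch of the pattern, not the full function:

```python
# Synthetic lines mimicking the layout convert_text_into_dict expects
lines = [
    "A Study of",
    "Toy Parsers",
    "",
    "Anonymous authors",
    "Abstract",
    "We parse PDFs line by line.",
    "Introduction",
]

# Consume the title up to the first blank line (same pop(0) pattern as the function)
title = ""
while lines[0] != "":
    title += lines.pop(0) + " "
title = title.strip().capitalize()

# Skip author lines until the "Abstract" marker, then drop the marker itself
while lines[0].lower() != "abstract":
    lines.pop(0)
lines.pop(0)

# Collect abstract lines until the "Introduction" marker
abstract = ""
while lines[0].lower() != "introduction":
    abstract += lines.pop(0) + " "

print(title)             # A study of toy parsers
print(abstract.strip())  # We parse PDFs line by line.
```

Note the side effect of `str.capitalize()`: it lowercases everything after the first character, which is why the multi-line title comes out as "A study of toy parsers".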
agentreview/paper_review_arena.py ADDED
@@ -0,0 +1,185 @@
+ import csv
+ import json
+ import logging
+ from typing import Union
+
+ from agentreview.arena import Arena, TooManyInvalidActions
+ from agentreview.role_descriptions import get_reviewer_description
+ from utility.utils import format_metareviews
+ from .agent import Player
+ from .config import ArenaConfig
+ from .environments import TimeStep, load_environment
+ from .paper_review_player import PaperExtractorPlayer, AreaChair, Reviewer
+
+ logger = logging.getLogger(__name__)
+
+
+ class PaperReviewArena(Arena):
+     """Arena for the paper review environment."""
+
+     # PaperReviewArena.from_config
+     @classmethod
+     def from_config(cls, config: Union[str, ArenaConfig]):
+         """Create an arena from a config."""
+         # If config is a path, load the config
+         if isinstance(config, str):
+             config = ArenaConfig.load(config)
+
+         global_prompt = config.get("global_prompt", None)
+
+         # Create the players
+         players = []
+         for player_config in config.players:
+             # Add the global prompt to the player config
+             if global_prompt is not None:
+                 player_config["global_prompt"] = global_prompt
+
+             if player_config['name'].startswith("Paper Extractor"):
+                 player = PaperExtractorPlayer.from_config(player_config)
+
+             elif player_config['name'].startswith("AC"):
+                 player = AreaChair.from_config(player_config)
+
+             elif player_config['name'].startswith("Reviewer"):
+                 player = Reviewer.from_config(player_config)
+
+             else:
+                 player = Player.from_config(player_config)
+             players.append(player)
+
+         # Check that the player names are unique
+         player_names = [player.name for player in players]
+         assert len(player_names) == len(
+             set(player_names)
+         ), f"Player names must be unique, current players: {', '.join(player_names)}"
+
+         # Create the environment
+         config.environment["player_names"] = player_names  # add the player names to the environment config
+         env = load_environment(config.environment)
+
+         return cls(players, env, global_prompt=global_prompt)
+
+     # PaperReviewArena.step()
+     def step(self) -> TimeStep:
+         """Take a step in the game: one player takes an action and the environment updates."""
+
+         # if self.environment.phase_index > 4 and self.args.task == "paper_review":
+         #     logger.info("Finishing the simulation for Phase I - IV. Please run `python run_paper_decision_cli.py` "
+         #                 "for Phase V (AC makes decisions).")
+         #     return
+         #
+         # elif self.environment.phase_index > 5 and self.args.task == "paper_decision":
+         #     logger.info("Finishing the simulation for Phase V (AC makes decisions).")
+         #     return
+
+         player_name = self.environment.get_next_player()
+         player = self.name_to_player[player_name]  # get the player object
+         observation = self.environment.get_observation(player_name)  # get the observation for the player
+
+         timestep = None
+
+         # Try to take an action for a few times
+         for _ in range(self.invalid_actions_retry):
+
+             # Update the reviewer description for the rebuttal phase
+             if self.environment.phase_index == 3 and player.name.startswith("Reviewer"):
+                 logging.info("Update reviewers' role_desc for Phase 3 (reviewer_ac_discussion)")
+
+                 # Reviewer names are 1-indexed ("Reviewer 1"), so subtract 1 to index into the list
+                 reviewer_index = int(player.name.split("Reviewer ")[1])
+                 player.role_desc = get_reviewer_description(phase="reviewer_ac_discussion",
+                                                             **self.environment.experiment_setting["players"][
+                                                                 'Reviewer'][reviewer_index - 1])
+
+             elif self.environment.phase_index == 5:  # Phase 5: AC makes decisions
+                 player.role_desc += format_metareviews(self.environment.metareviews, self.environment.paper_ids)
+
+             action = player(observation)  # take an action
+
+             if self.environment.check_action(action, player_name):  # action is valid
+                 timestep = self.environment.step(player_name, action)  # update the environment
+                 break
+             else:  # action is invalid
+                 logging.warning(f"{player_name} made an invalid action {action}")
+                 continue
+
+         # If the player made invalid actions too many times, terminate the game
+         if timestep is None:
+             warning_msg = (f"{player_name} has made invalid actions for {self.invalid_actions_retry} times. "
+                            f"Terminating the game.")
+             logging.warning(warning_msg)
+             raise TooManyInvalidActions(warning_msg)
+
+         return timestep
+
+     def save_history(self, path: str):
+         """
+         Save the history of the game to a file.
+
+         Supports csv and json formats.
+         """
+         messages = self.environment.get_observation()
+         message_rows = []
+
+         if path.endswith(".csv"):
+             header = [
+                 "agent_name",
+                 "content",
+                 "turn",
+                 "timestamp",
+                 "visible_to",
+                 "msg_type",
+             ]
+             for message in messages:
+                 message_row = [
+                     message.agent_name,
+                     message.content,
+                     message.turn,
+                     str(message.timestamp),
+                     message.visible_to,
+                     message.msg_type,
+                 ]
+                 message_rows.append(message_row)
+
+             with open(path, "w") as f:
+                 writer = csv.writer(f)
+                 writer.writerow(header)
+                 writer.writerows(message_rows)
+         elif path.endswith(".json"):
+             for message in messages:
+                 message_row = {
+                     "agent_name": message.agent_name,
+                     "content": message.content,
+                     "turn": message.turn,
+                     "timestamp": str(message.timestamp),
+                     "visible_to": message.visible_to,
+                     "msg_type": message.msg_type,
+                 }
+                 message_rows.append(message_row)
+
+             with open(path, "w") as f:
+                 json.dump({
+                     "experiment_setting": self.environment.experiment_setting,
+                     "messages": message_rows,
+                 }, f, indent=2)
+         else:
+             raise ValueError("Invalid file format")
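The JSON branch of `save_history` can be sketched standalone. `Msg` is a hypothetical stand-in for the framework's `Message`; the row dictionary has the same shape as the one built above:

```python
import json
from dataclasses import dataclass


@dataclass
class Msg:  # hypothetical stand-in for the framework's Message
    agent_name: str
    content: str
    turn: int
    timestamp: int
    visible_to: str = "all"
    msg_type: str = "text"


messages = [
    Msg("Reviewer 1", "Overall rating: 5", turn=1, timestamp=1),
    Msg("Author", "Thanks for the feedback", turn=2, timestamp=2),
]

# Same row shape as PaperReviewArena.save_history's JSON branch
history = {
    "experiment_setting": {"id": "demo"},
    "messages": [
        {
            "agent_name": m.agent_name,
            "content": m.content,
            "turn": m.turn,
            "timestamp": str(m.timestamp),
            "visible_to": m.visible_to,
            "msg_type": m.msg_type,
        }
        for m in messages
    ],
}

serialized = json.dumps(history, indent=2)
print(json.loads(serialized)["messages"][0]["agent_name"])  # Reviewer 1
```

Casting `timestamp` to `str` keeps the output JSON-serializable even when timestamps are `datetime` objects, which `json.dump` cannot encode directly.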
agentreview/paper_review_message.py ADDED
@@ -0,0 +1,104 @@
+ import logging
+ from typing import List
+
+ from agentreview.message import MessagePool, Message
+
+
+ class PaperReviewMessagePool(MessagePool):
+     """
+     A pool to manage the messages in the paper review environment.
+     """
+
+     def __init__(self, experiment_setting: dict):
+         super().__init__()
+         self.experiment_setting = experiment_setting
+
+     def get_visible_messages_for_paper_review(self, agent_name, phase_index: int,
+                                               next_player_idx: int, player_names: List[str]) \
+             -> List[Message]:
+         """
+         Get all the messages that are visible to a given agent in a given phase.
+
+         Parameters:
+             agent_name (str): The name of the agent.
+             phase_index (int): The specified phase in the paper reviewing process.
+             next_player_idx (int): The index of the next player.
+             player_names (List[str]): The names of all players.
+
+         Returns:
+             List[Message]: A list of visible messages.
+         """
+
+         reviewer_names = sorted([name for name in player_names if name.startswith("Reviewer")])
+
+         prev_messages = self._messages
+
+         if phase_index in [0, 1]:
+             visible_messages = [message for message in prev_messages if message.agent_name == "Paper Extractor"]
+
+         elif phase_index == 2:
+             visible_messages = []
+
+             for message in prev_messages:
+                 # The author can see the paper content and each reviewer's review
+                 if message.agent_name == "Paper Extractor" or message.agent_name == reviewer_names[next_player_idx]:
+                     visible_messages.append(message)
+
+         elif phase_index == 3:
+             # Use any(...) here; a bare list comprehension is always truthy
+             if any(agent_name.startswith(prefix) for prefix in ["AC", "Reviewer", "Paper Extractor"]):
+                 # Both area chairs and reviewers can see all the reviews and rebuttals
+                 visible_messages = prev_messages
+             else:
+                 # Authors cannot see the reviewer-AC discussion
+                 visible_messages = []
+
+         elif phase_index == 4:
+             if agent_name.startswith("AC"):
+                 area_chair_type = self.experiment_setting['players']['AC'][0]["area_chair_type"]
+
+                 # 'BASELINE' means we do not specify the area chair's characteristics in the config file
+                 if area_chair_type in ["inclusive", "BASELINE"]:
+                     # An inclusive area chair can see all the reviews and rebuttals
+                     visible_messages = prev_messages
+
+                 elif area_chair_type == "conformist":
+                     # A conformist area chair only looks at the authors' and reviewers' messages
+                     visible_messages = []
+                     for message in prev_messages:
+                         if message.agent_name.startswith("Author") or message.agent_name.startswith("Reviewer"):
+                             visible_messages.append(message)
+
+                 elif area_chair_type == "authoritarian":
+                     # An authoritarian area chair ignores the authors' and reviewers' messages
+                     visible_messages = []
+                     for message in prev_messages:
+                         if not (message.agent_name.startswith("Author") or message.agent_name.startswith("Reviewer")):
+                             visible_messages.append(message)
+
+                 else:
+                     raise ValueError(f"Unknown area chair type: {area_chair_type}.")
+
+         else:
+             visible_messages = []
+             for message in prev_messages:
+                 if (
+                     message.visible_to == "all"
+                     or agent_name in message.visible_to
+                     or agent_name == "Moderator"
+                 ):
+                     visible_messages.append(message)
+
+         logging.info(f"Phase {phase_index}: {agent_name} sees {len(visible_messages)} messages from "
+                      f"{','.join([message.agent_name for message in visible_messages]) if visible_messages else 'None'}")
+
+         return visible_messages
agentreview/paper_review_player.py ADDED
@@ -0,0 +1,120 @@
+ import logging
+ import os
+ from pathlib import Path
+ from typing import List, Union
+
+ from llama_index.readers.file.docs import PDFReader
+
+ from agentreview.agent import Player
+ from .backends import IntelligenceBackend
+ from .config import BackendConfig
+ from .message import Message
+
+
+ class AreaChair(Player):
+
+     def __init__(
+         self,
+         name: str,
+         role_desc: str,
+         env_type: str,
+         backend: Union[BackendConfig, IntelligenceBackend],
+         global_prompt: str = None,
+         **kwargs,
+     ):
+         super().__init__(name, role_desc, backend, global_prompt, **kwargs)
+         self.env_type = env_type
+         self.role_desc = role_desc
+
+     def act(self, observation: List[Message]) -> str:
+         # The author has just finished their rebuttals (so the last speaker is an Author).
+         # The AC asks each reviewer to update their reviews.
+         if self.env_type == "paper_review":
+             if len(observation) > 0 and observation[-1].agent_name.startswith("Author"):
+                 return "Dear reviewers, please update your reviews based on the author's rebuttals."
+             else:
+                 return super().act(observation)
+
+         elif self.env_type == "paper_decision":
+             return super().act(observation)
+
+         else:
+             raise ValueError(f"Unknown env_type: {self.env_type}")
+
+
+ class Reviewer(Player):
+
+     def __init__(
+         self,
+         name: str,
+         role_desc: str,
+         backend: Union[BackendConfig, IntelligenceBackend],
+         global_prompt: str = None,
+         **kwargs,
+     ):
+         super().__init__(name, role_desc, backend, global_prompt, **kwargs)
+
+     def act(self, observation: List[Message]) -> str:
+         return super().act(observation)
+
+
+ class PaperExtractorPlayer(Player):
+     """A player solely for extracting contents from a paper.
+
+     No API calls are made by this player.
+     """
+
+     def __init__(
+         self,
+         name: str,
+         role_desc: str,
+         paper_id: int,
+         paper_decision: str,
+         conference: str,
+         backend: Union[BackendConfig, IntelligenceBackend],
+         global_prompt: str = None,
+         **kwargs,
+     ):
+         super().__init__(name, role_desc, backend, global_prompt, **kwargs)
+         self.paper_id = paper_id
+         self.paper_decision = paper_decision
+         self.conference: str = conference
+
+     def act(self, observation: List[Message]) -> str:
+         """
+         Take an action based on the observation (generate a response), which can later be parsed to actual
+         actions that affect the game dynamics.
+
+         Parameters:
+             observation (List[Message]): The messages that the player has observed from the environment.
+
+         Returns:
+             str: The action (response) of the player.
+         """
+         logging.info(f"Loading {self.conference} paper {self.paper_id} ({self.paper_decision}) ...")
+
+         loader = PDFReader()
+         document_path = Path(os.path.join(self.args.data_dir, self.conference, "paper", self.paper_decision,
+                                           f"{self.paper_id}.pdf"))
+         documents = loader.load_data(file=document_path)
+
+         # Concatenate pages until the word budget (args.max_num_words) is reached
+         num_words = 0
+         main_contents = "Contents of this paper:\n\n"
+         reached_word_limit = False
+
+         for doc in documents:
+             words = doc.text.split(' ')
+             if len(words) + num_words > self.args.max_num_words:
+                 words = words[:self.args.max_num_words - num_words]
+                 reached_word_limit = True
+             num_words += len(words)
+             main_contents += " ".join(words) + ' '
+             if reached_word_limit:
+                 break
+
+         return main_contents
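The word-budget truncation loop in `PaperExtractorPlayer.act` can be sketched standalone on plain strings (no llama-index documents); `truncate_to_word_budget` is an illustrative helper, not part of the project:

```python
def truncate_to_word_budget(pages, max_num_words):
    # Same concatenate-then-truncate loop as PaperExtractorPlayer.act:
    # accumulate page words until the budget is hit, then cut and stop.
    num_words = 0
    contents = []
    for page in pages:
        words = page.split(' ')
        if num_words + len(words) > max_num_words:
            words = words[:max_num_words - num_words]
            contents.append(' '.join(words))
            break
        num_words += len(words)
        contents.append(' '.join(words))
    return ' '.join(contents)


pages = ["one two three", "four five six", "seven eight"]
print(truncate_to_word_budget(pages, 5))  # one two three four five
```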
agentreview/paper_review_settings.py ADDED
@@ -0,0 +1,114 @@
+ default_reviewer_setting = {
+     "is_benign": None,
+     "is_knowledgeable": None,
+     "is_responsible": None,
+     "provides_numeric_rating": True,
+ }
+
+
+ def get_experiment_settings(setting: dict):
+     """
+     Generate experiment settings based on provided configurations for area chairs (AC) and reviewers.
+
+     Args:
+         setting (dict): A dictionary containing configuration for AC, reviewers, authors, and global settings.
+
+     Returns:
+         dict: Experiment settings including players (Paper Extractor, AC, Author, Reviewer)
+             and global settings.
+     """
+
+     experiment_setting = {
+         "id": None,
+         "players": {
+
+             # Paper Extractor is a special player that extracts a paper from the dataset.
+             # Its constructor does not take any arguments.
+             "Paper Extractor": [{}],
+
+             # Assume there is only one area chair (AC) in the experiment.
+             "AC": [get_ac_setting_from_ac_type(ac_type) for ac_type in setting['AC']],
+
+             # Author role with default configuration.
+             "Author": [{}],
+
+             # Reviewer settings are generated based on the reviewer types provided in the settings.
+             "Reviewer": [get_reviewer_setting_from_reviewer_type(reviewer_type)
+                          for reviewer_type in setting['reviewer']],
+         },
+         "global_settings": setting['global_settings']
+     }
+
+     return experiment_setting
+
+
+ def get_reviewer_setting_from_reviewer_type(reviewer_type: str):
+     """
+     Map a reviewer type (e.g., 'benign', 'malicious') to a reviewer setting dictionary.
+
+     Args:
+         reviewer_type (str): The type of reviewer (e.g., 'benign', 'malicious', 'knowledgeable').
+
+     Returns:
+         dict: A dictionary representing the reviewer's attributes, such as is_benign, is_knowledgeable,
+             is_responsible, or whether they know the authors (e.g., 'famous', 'unfamous').
+
+     Raises:
+         ValueError: If an unknown reviewer type is provided.
+     """
+     reviewer_setting = {
+         "is_benign": None,
+         "is_knowledgeable": None,
+         "is_responsible": None
+     }
+
+     # Intention
+     if reviewer_type == "benign":
+         reviewer_setting["is_benign"] = True
+     elif reviewer_type == "malicious":
+         reviewer_setting["is_benign"] = False
+
+     # Knowledgeability
+     elif reviewer_type == "knowledgeable":
+         reviewer_setting["is_knowledgeable"] = True
+     elif reviewer_type == "unknowledgeable":
+         reviewer_setting["is_knowledgeable"] = False
+
+     # Commitment
+     elif reviewer_type == "responsible":
+         reviewer_setting["is_responsible"] = True
+     elif reviewer_type == "irresponsible":
+         reviewer_setting["is_responsible"] = False
+
+     elif reviewer_type == "BASELINE":
+         pass
+
+     elif reviewer_type == "authors_are_famous":
+         reviewer_setting["knows_authors"] = "famous"
+
+     elif reviewer_type == "authors_are_unfamous":
+         reviewer_setting["knows_authors"] = "unfamous"
+
+     else:
+         raise ValueError(f"Unknown reviewer type: {reviewer_type}")
+
+     return reviewer_setting
+
+
+ def get_ac_setting_from_ac_type(ac_type: str):
+     """
+     Generate the area chair (AC) settings based on the type of AC.
+
+     Args:
+         ac_type (str): The type of area chair (e.g., 'inclusive', 'conformist', 'authoritarian').
+
+     Returns:
+         dict: A dictionary containing the area chair type.
+     """
+
+     ac_setting = {
+         "area_chair_type": ac_type
+     }
+
+     return ac_setting
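The type-to-setting mapping above can be condensed into a table-driven sketch. `reviewer_setting` below is an illustrative re-sketch of `get_reviewer_setting_from_reviewer_type`, not the project's function, and it covers only a subset of the types:

```python
def reviewer_setting(reviewer_type: str) -> dict:
    # Table-driven re-sketch of get_reviewer_setting_from_reviewer_type
    setting = {"is_benign": None, "is_knowledgeable": None, "is_responsible": None}
    flags = {
        "benign": ("is_benign", True), "malicious": ("is_benign", False),
        "knowledgeable": ("is_knowledgeable", True), "unknowledgeable": ("is_knowledgeable", False),
        "responsible": ("is_responsible", True), "irresponsible": ("is_responsible", False),
    }
    if reviewer_type in flags:
        key, value = flags[reviewer_type]
        setting[key] = value
    elif reviewer_type == "authors_are_famous":
        setting["knows_authors"] = "famous"
    elif reviewer_type != "BASELINE":
        raise ValueError(f"Unknown reviewer type: {reviewer_type}")
    return setting


print(reviewer_setting("malicious")["is_benign"])               # False
print(reviewer_setting("authors_are_famous")["knows_authors"])  # famous
```

Each type sets exactly one attribute, so composite reviewer personas are expressed by the list of types passed in the experiment settings rather than by a single type string.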
agentreview/role_descriptions.py ADDED
@@ -0,0 +1,515 @@
+ import os
+ import sys
+
+ import numpy as np
+
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+
+ import const
+ from agentreview.config import AgentConfig
+
+ PLAYER_BACKEND = {
+     "backend_type": "openai-chat",
+     "temperature": 0.9,
+     "max_tokens": 4096
+ }
+
+ # Archived. If we use this rubric, the scores given by the reviewers are too high.
+ RUBRICS_v1 = ("Rubrics: 10 for strong accept (top 5% of accepted papers), "
+               "8 for accept (top 50% of accepted papers), "
+               "6 for borderline accept, "
+               "5 for borderline reject, "
+               "3 for reject, and 1 for strong reject. ")
+
+ SCORE_CALCULATION_v1 = {
+     10: "This study is among the top 0.5% of all papers",
+     8: "This study is one of the most thorough I have seen. It changed my thinking on this topic. I would fight for it to be accepted",
+     6: "This study provides sufficient support for all of its claims/arguments. Some extra experiments are needed, but not essential. The method is highly original and generalizable to various fields. It deepens the understanding of some phenomena or lowers the barriers to an existing research direction",
+     5: "This study provides sufficient support for its major claims/arguments; some minor points may need extra support or details. The method is moderately original and generalizable to various relevant fields. The work it describes is not particularly interesting and/or novel, so it will not be a big loss if people don't see it in this conference",
+     3: "Some of the main claims/arguments are not sufficiently supported; there are major technical/methodological problems. The proposed method is somewhat original and generalizable to various relevant fields. I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise",
+     1: "This study is not yet sufficiently thorough to warrant publication or is not relevant to the conference. This paper makes marginal contributions"
+ }
+
+ # Used from 2024.1.23 onwards, as SCORE_CALCULATION_v1 is too harsh
+ SCORE_CALCULATION = {
+     10: "This study is among the top 2% of all papers. It is one of the most thorough I have seen. It changed my "
+         "thinking on this topic. I would fight for it to be accepted",
+     8: "This study is among the top 10% of all papers. It provides sufficient support for all of its "
+        "claims/arguments. Some extra experiments are needed, but not essential. The method is highly original and "
+        "generalizable to various fields. It deepens the understanding of some phenomena or lowers the barriers to "
+        "an existing research direction",
+     6: "This study provides sufficient support for its major claims/arguments; some minor points may need extra support or details. The method is moderately original and generalizable to various relevant fields. The work it describes is not particularly interesting and/or novel, so it will not be a big loss if people don't see it in this conference",
+     5: "Some of the main claims/arguments are not sufficiently supported; there are major technical/methodological "
+        "problems. The proposed method is somewhat original and generalizable to various relevant fields. I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise",
+     3: "This paper makes marginal contributions",
+     1: "This study is not yet sufficiently thorough to warrant publication or is not relevant to the conference"
+ }
+
+ RUBRICS_v1 = ("Rubrics: "
+               f"10 for strong accept ({SCORE_CALCULATION[10]}); "
+               f"8 for accept ({SCORE_CALCULATION[8]}); "
+               f"6 for borderline accept ({SCORE_CALCULATION[6]}); "
+               f"5 for borderline reject ({SCORE_CALCULATION[5]}); "
+               f"3 for reject ({SCORE_CALCULATION[3]}); "
+               f"1 for strong reject ({SCORE_CALCULATION[1]}); ")
+
+ INSTRUCTIONS_FOR_FAMOUS_AUTHORS = ("You know that the authors of the paper are from a very famous lab and have "
+                                    "several publications in this field. Be sure to consider that when writing the "
+                                    "paper reviews. \n\n")
+
+ RUBRICS = (f"* 10: {SCORE_CALCULATION[10]};\n"
+            f"* 8: {SCORE_CALCULATION[8]};\n"
+            f"* 6: {SCORE_CALCULATION[6]};\n"
+            f"* 5: {SCORE_CALCULATION[5]};\n"
+            f"* 3: {SCORE_CALCULATION[3]};\n"
+            f"* 1: {SCORE_CALCULATION[1]}. ")
+
+ # Try to lower the scores
+ SCORE_CONTROL = ("This is a very rigorous top-tier conference. "
+                  "Most papers get scores <=5 before the rebuttal. ")
+
+ # Need to explain this
+ EXPLANATION_FOR_NOT_UPDATING_MANUSCRIPT = ("Note: Do not mention that the authors did not update the manuscripts "
+                                            "and do not penalize them for not revising their papers. They cannot do "
+                                            "it now. Just assume they have revised their manuscripts according to "
+                                            "their rebuttals.")
+
+
+ def get_instructions_for_overall_scores(author_type: str) -> str:
+     instruction = "Do not write any reasons. "
+
+     if author_type not in ["famous"]:
+         instruction += ("Do not assign scores of 7 or higher before the rebuttal unless the paper "
+                         "demonstrates exceptional originality and "
+                         "significantly advances the state-of-the-art in machine learning. ")
+     instruction += "Intermediary integer scores such as 9, 7, 4, and 2 are allowed. "
+
+     return instruction
+
+
+ def get_reviewer_description(is_benign: bool = None, is_knowledgeable: bool = None, is_responsible: bool = None,
+                              provides_numeric_rating: bool = True, knows_authors: bool = False,
+                              phase: str = "reviewer_write_reviews"):
+     assert phase in ["reviewer_write_reviews", 'reviewer_ac_discussion']
+     assert provides_numeric_rating in [True, False]
+     bio = ("You are a reviewer. You write peer reviews of academic papers by evaluating their technical "
+            "quality, originality, and clarity. ")
+
+     # The authors' identities (e.g., a famous lab) are known to the reviewer
+     if knows_authors:
+         bio += "\n\n" + INSTRUCTIONS_FOR_FAMOUS_AUTHORS
+     else:
+         bio += f"{SCORE_CONTROL}\n\n"
+
+     bio += "## Review Guidelines\n"
+
+     if phase in ["reviewer_write_reviews"]:
+         guideline = "Write a peer review using the following format:\n\n"
+         guideline += "```\n"
+         if provides_numeric_rating:
+             # Pass the author type ("famous" or ""), not the boolean knows_authors flag
+             guideline += (f"Overall rating: ... "
+                           f"# {get_instructions_for_overall_scores('famous' if knows_authors else '')}\n\n")
+
+         """
+         # Review formats used in most ICLR conferences
+         guideline += "Summary: ... # Provide a brief summary of the paper, such as its main contributions.\n\n"
+         guideline += "Strengths: ... # Give a list of strengths for the paper.\n\n"
+         guideline += "Weaknesses: ...<EOS> # Give a list of weaknesses and questions for the paper.\n\n"
+         """
+
+         # Review formats used in [Stanford's Nature Submission](https://arxiv.org/abs/2310.01783)
+         guideline += "Significance and novelty: ... \n\n"
+         guideline += "Reasons for acceptance: ... # List 4 key reasons. \n\n"
+         guideline += ("Reasons for rejection: ... # List 4 key reasons. For each of the 4 key reasons, use **>=2 "
+                       "sub bullet points** to further clarify and support your arguments in painstaking detail \n\n")
+         guideline += "Suggestions for improvement: ... <EOS> # List 4 key suggestions \n\n"
+
+     elif phase in ["reviewer_ac_discussion"]:
+         guideline = "Based on the authors' responses, write an updated paper review in the reviewer-AC discussion."
+
+         if provides_numeric_rating:
+             guideline += (
+                 "Decrease your score if the authors fail to address your or other reviewers' concerns, or "
+                 "provide very vague responses. Increase your score only if the authors have addressed all your and "
+                 "other reviewers' concerns, and have comprehensively described how they plan to update the "
+                 f"manuscript. Keep the score unchanged otherwise. {EXPLANATION_FOR_NOT_UPDATING_MANUSCRIPT}"
+                 "\n\n## Format for the updated review\n\n```\n")
+             guideline += ("Overall rating: ... # Provide an updated overall rating using an integer from 1 to 10. "
+                           "Do not penalize the authors for not updating their manuscripts. They cannot revise their "
+                           "manuscripts now.")
+         else:
+             guideline += "\n\n```\n"
+
+         guideline += (f"Summary: ... <EOS> # "
+                       f"{'Provide a justification for your updated score. ' if provides_numeric_rating else ''}"
+                       "Comment on whether the author has addressed your questions and concerns. Note that authors "
+                       "cannot revise their manuscripts now.\n")
+
+     else:
+         raise ValueError(f"Invalid phase for a reviewer: {phase}")
+
+     bio += f"{guideline}```\n\n"
+
+     if not all([x is None for x in [is_benign, is_knowledgeable, is_responsible]]):
171
+ bio += "## Your Biography\n"
172
+
173
+ # Knowledgeability
174
+ desc_knowledgeable_reviewer = (
175
+ "You are knowledgeable, with a strong background and a PhD degree in the subject areas "
176
+ "related to this paper. "
177
+ "You possess the expertise necessary to scrutinize "
178
+ "and provide insightful feedback to this paper.")
179
+
180
+ desc_unknowledgeable_reviewer = (
181
+ "You are not knowledgeable and do not have strong background in the subject areas related to "
182
+ "this paper.")
183
+
184
+ if is_knowledgeable is not None:
185
+ if is_knowledgeable:
186
+ desc = desc_knowledgeable_reviewer
187
+ else:
188
+ desc = desc_unknowledgeable_reviewer
189
+
190
+ bio += f"Knowledgeability: {desc}\n\n"
191
+
192
+ # Responsible vs. lazy
193
+
194
+ desc_responsible_reviewer = ("As a responsible reviewer, you highly responsibly write paper reviews and actively "
195
+ "participate in reviewer-AC discussions. "
196
+ "You meticulously assess a research "
197
+ "paper's "
198
+ "technical accuracy, innovation, and relevance. You thoroughly read the paper, "
199
+ "critically analyze the methodologies, and carefully consider the paper's "
200
+ "contribution to the field. ")
201
+
202
+ desc_lazy_reviewer = ("As a lazy reviewer, your reviews tend to be superficial and hastily done. You do not like "
203
+ "to discuss in the reviewer-AC discussion. "
204
+ "Your assessments might overlook critical details, lack depth in analysis, "
205
+ "fail to recognize novel contributions, "
206
+ "or offer generic feedback that does little to advance the paper's quality.")
207
+
208
+ if is_responsible is not None:
209
+
210
+ if is_responsible:
211
+ desc = desc_responsible_reviewer
212
+ else:
213
+ desc = desc_lazy_reviewer
214
+
215
+ bio += f"Responsibility: {desc}\n\n"
216
+
217
+ # Benign (Good) vs. Malicious
218
+ desc_benign_reviewer = ("As a benign reviewer, your approach to reviewing is guided by a genuine intention "
219
+ "to aid authors in enhancing their work. You provide detailed, constructive feedback, "
220
+ "aimed at both validating robust research and guiding authors to refine and improve their work. "
221
+ "You are also critical of technical flaws in the paper. ")
222
+
223
+ desc_malicious_reviewer = ("As a mean reviewer, your reviewing style is often harsh and overly critical, "
224
+ "with a tendency towards negative bias. Your reviews may focus excessively on "
225
+ "faults, sometimes overlooking the paper's merits. Your feedback can be discouraging, "
226
+ "offering minimal guidance for improvement, and often aims more at rejection than constructive critique. ")
227
+
228
+ if is_benign is not None:
229
+
230
+ if is_benign:
231
+ desc = desc_benign_reviewer
232
+ else:
233
+ desc = desc_malicious_reviewer
234
+
235
+ bio += f"Intention: {desc}\n\n"
236
+
237
+ if provides_numeric_rating:
238
+ bio += f"## Rubrics for Overall Rating\n\n{RUBRICS}"
239
+
240
+ return bio
241
+
242
+
243
+ def get_author_description() -> str:
244
+ bio = ("You are an author. You write research papers and submit them to conferences. During the rebuttal phase, "
245
+ "you carefully read the reviews from the reviewers and respond to each of them.\n\n")
246
+
247
+ bio += "## Author Guidelines\n"
248
+
249
+ bio += "Write a response to the reviews using the following format:\n\n"
250
+ bio += "```\n"
251
+ bio += ("Response: ... # Provide a brief response to each review. Address each question and weakness mentioned "
252
+ "by the reviewer. No need to respond to the strengths they mentioned. \n\n")
253
+
254
+ return bio
255
+
256
+
257
+ def get_ac_description(area_chair_type: str, phase: str, scoring_method: str, num_papers_per_area_chair: int,
258
+ knows_authors: bool = False, **kwargs) -> (
259
+ str):
260
+ """
261
+ Note: We assume that the AC definitely provides a score so that the papers can be compared
262
+ Args:
263
+ phase (str): The phase of the conference. Must be either "reviewer_ac_discussion" or "ac_write_metareviews".
264
+ scoring_method (str): The method used by the area chair to make the final decision. Must be either of
265
+ "recommendation": directly make a recommendation (e.g. "Accept", "Reject") for each paper
266
+ "ranking": rank the papers using your willingness to accept
267
+
268
+ """
269
+
270
+ acceptance_rate = kwargs.get('acceptance_rate', 0.32)
271
+ bio = "You are a very knowledgeable and experienced area chair in a top-tier machine learning conference. "
272
+
273
+ if phase == "ac_write_metareviews":
274
+ bio += ("You evaluate the reviews provided by reviewers and write metareviews. Later, you will decide which "
275
+ "paper gets accepted or rejected based on your metareviews. ")
276
+
277
+ elif phase == "ac_make_decisions":
278
+ bio += "Based on the metareviews you wrote previously, you decide if a paper is accepted or rejected. "
279
+
280
+ # The authors' famous identities are known to the AC
281
+ if knows_authors:
282
+ bio += INSTRUCTIONS_FOR_FAMOUS_AUTHORS + SCORE_CONTROL
283
+
284
+ bio += "\n\n## Area Chair Guidelines\n"
285
+
286
+ if phase == "ac_write_metareviews":
287
+
288
+ guideline = "Write a metareview using the following format:\n\n"
289
+ guideline += "```\n"
290
+ guideline += (
291
+ f"Score: ... # Provide a score for the paper in the range from 1 to 10. {get_instructions_for_overall_scores(knows_authors)}Fractions such as "
292
+ "6.5 is allowed.\n\n")
293
+ guideline += ("Summary: ... <EOS> # Provide a summary of the paper based on the paper contents (if provided), "
294
+ "reviewers' "
295
+ "reviews and discussions (if provided), authors' rebuttal, and your own expertise. "
296
+ f"{EXPLANATION_FOR_NOT_UPDATING_MANUSCRIPT}\n")
297
+
298
+ bio += guideline
299
+
300
+ bio += "```\n\n"
301
+
302
+ elif phase == "ac_make_decisions":
303
+ max_num_accepted_papers = int(np.floor(num_papers_per_area_chair * acceptance_rate))
304
+
305
+ # The area chair usually accept more papers than s/he should
306
+ # So we use a ranking approach
307
+
308
+ if scoring_method == "recommendation":
309
+ num_rejected_papers = int(num_papers_per_area_chair)
310
+ CONTROL_NUM_ACCEPTED_PAPERS = (f"You must accept around "
311
+ f"{max_num_accepted_papers} out of {num_papers_per_area_chair} papers, "
312
+ # f"so around {num_rejected_papers - max_num_accepted_papers} papers should "
313
+ # f"have a decision of 'Reject'. "
314
+ # f"You should maintain the high criteria of this conference. "
315
+ # f"'5' is borderline reject."
316
+ )
317
+ guideline = (f"Carefully decide if a paper is accepted or rejected using the metareview. Use the following "
318
+ f"format ")
319
+ guideline += f"({CONTROL_NUM_ACCEPTED_PAPERS})"
320
+ guideline += f":\n\n"
321
+
322
+ guideline += "```\n"
323
+ guideline += ("Paper ID: ... # Provide the first paper ID. \n"
324
+ "Decision: ... # Provide a decision for the paper. Must be one of "
325
+ "'Reject' and 'Accept'.\n"
326
+ # "Reasons: ... # Provide a short justification for your decision, maximum 3 sentences. \n"
327
+ "Paper ID: ... # Provide the second paper ID. \n"
328
+ f"... # Likewise\n")
329
+ guideline += "```\n\n"
330
+
331
+ bio += guideline
332
+
333
+ elif scoring_method == "ranking":
334
+
335
+ # The area chair usually accept more papers than s/he should
336
+ # So we use this ranking approach
337
+
338
+ guideline = (f"Rank the papers from the paper you are most willing to accept to the least willing to "
339
+ f"accept. '1' indicates "
340
+ f"the paper "
341
+ f"you are most "
342
+ f"willing to accept. "
343
+ f"Use this format:\n\n")
344
+ guideline += "```\n"
345
+ guideline += "Paper ID: 1 # The paper ID you most want to accept.\n"
346
+ guideline += "Willingness to accept: 1 # This integer must be unique for each paper. \n"
347
+ guideline += "Paper ID: ... # The second paper ID you most want to accept .. \n...\n"
348
+ guideline += "Willingness to accept: 2 \n"
349
+ guideline += "...\n```\n\n"
350
+
351
+ bio += guideline
352
+
353
+ else:
354
+ raise NotImplementedError(f"Unknown scoring method: {scoring_method}")
355
+
356
+
357
+ else:
358
+ raise ValueError(f"Invalid phase for an area chair: {phase}")
359
+
360
+ if phase == "ac_write_metareviews":
361
+ bio += f"## Rubrics for Overall Rating\n\n{RUBRICS}\n\n"
362
+
363
+ desc_inclusive_ac = ("You are an inclusive area chair. You tend to hear from all reviewers' opinions and combine "
364
+ "them with your own judgments to make the final decision.")
365
+
366
+ desc_conformist_ac = ("You are a conformist area chair who perfunctorily handle area chair duties. You "
367
+ "mostly follow "
368
+ "the reviewers' suggestions to write your metareview, score the paper, and decide whether "
369
+ "to accept a paper.")
370
+
371
+ desc_authoritarian_ac = ("You are an authoritarian area chair. You tend to read the paper on your own, follow your "
372
+ "own "
373
+ "judgment and mostly ignore "
374
+ "the reviewers' opinions.")
375
+
376
+ desc = ""
377
+
378
+ if phase == "ac_write_metareviews":
379
+
380
+ if area_chair_type == "inclusive":
381
+ desc = desc_inclusive_ac
382
+ elif area_chair_type == "conformist":
383
+ desc = desc_conformist_ac
384
+ elif area_chair_type == "authoritarian":
385
+ desc = desc_authoritarian_ac
386
+ elif area_chair_type == "BASELINE":
387
+ desc = ""
388
+
389
+ elif phase == "ac_make_decisions":
390
+ # We do not introduce different types of ACs in the decision phase
391
+ desc = ""
392
+
393
+ else:
394
+ raise ValueError(f"Invalid area chair type: {area_chair_type}. Choose from {','.join(const.AREA_CHAIR_TYPES)}.")
395
+
396
+ if desc != "":
397
+ bio += f"## Your Biography\n{desc}\n\n"
398
+
399
+ return bio
400
+
401
+
402
+ def get_reviewer_player_config(reviewer_index: int, is_benign: bool, is_knowledgeable: bool, is_responsible: bool,
403
+ global_settings: dict) -> dict:
404
+ """
405
+
406
+ Get a Player object that represents a reviewer.
407
+
408
+ Args:
409
+ reviewer_index:
410
+ is_benign (bool): If the reviewer has good intention and provides constructive feedback. If None, we do not add this field to the bio.
411
+ is_knowledgeable (bool): If the reviewer is knowledgeable and has a strong background in the subject areas related
412
+ to the paper. If None, we do not add this field to the bio.
413
+ is_responsible (bool): If the reviewer is responsible and provides detailed feedback.
414
+ provides_numeric_rating (bool): If the reviewer provides an overall rating (e.g. accept, weak accept) to the
415
+ paper. If None, we do not add this field to the bio.
416
+ knows_authors (str): The type of the authors of the paper under review. Must be one of "famous",
417
+ "unfamous", None (Default. Author type is unknown)
418
+
419
+ Return
420
+ player (dict): A player object that represents the reviewer.
421
+
422
+ """
423
+
424
+ knows_authors = "reviewer" in global_settings['persons_aware_of_authors_identities']
425
+ provides_numeric_rating = "reviewer" in global_settings['provides_numeric_rating']
426
+
427
+ reviewer = {
428
+ "name": f"Reviewer {reviewer_index}",
429
+ "role_desc": get_reviewer_description(is_benign, is_knowledgeable, is_responsible, provides_numeric_rating,
430
+ knows_authors),
431
+ # "role_desc": get_reviewer_description(is_benign, is_knowledgeable, is_responsible, provides_numeric_rating),
432
+ "backend": PLAYER_BACKEND,
433
+ "metadata": {
434
+ "is_benign": is_benign,
435
+ "is_knowledgeable": is_knowledgeable,
436
+ "is_responsible": is_responsible,
437
+ "knows_authors": knows_authors,
438
+ }
439
+ }
440
+
441
+ return AgentConfig(**reviewer)
442
+
443
+
444
+ def get_author_config() -> dict:
445
+ author = {
446
+ "name": f"Author",
447
+ "role_desc": get_author_description(),
448
+ "backend": PLAYER_BACKEND
449
+ }
450
+
451
+ return AgentConfig(**author)
452
+
453
+
454
+ def get_paper_extractor_config(**kwargs) -> dict:
455
+ max_tokens = kwargs.pop('max_tokens', 2048)
456
+
457
+ paper_extractor = {
458
+ "name": f"Paper Extractor",
459
+ "role_desc": "This is a player that only extracts content from the paper. No API calls are made",
460
+ "backend": {
461
+ "backend_type": "dummy",
462
+ # "temperature": 0.,
463
+ "max_tokens": max_tokens,
464
+ },
465
+ }
466
+
467
+ return AgentConfig(**paper_extractor)
468
+
469
+
470
+ def get_ac_config(**kwargs) -> dict:
471
+ """
472
+
473
+ Get a Player object that represents an area chair.
474
+
475
+ Args:
476
+ index_ac (int):
477
+ is_benign (bool): If the reviewer has good intention and provides constructive feedback.
478
+ is_knowledgeable: If the reviewer is knowledgeable and has a strong background in the subject areas related
479
+ to the paper.
480
+ is_responsible (bool): If the reviewer is responsible and provides detailed feedback.
481
+ provides_numeric_rating (bool): If the reviewer provides an overall rating (e.g. accept, weak accept) to the
482
+ paper.
483
+
484
+ scoring_method (str): Scoring method for the area chair.
485
+
486
+ Return
487
+ player (dict): A player object that represents the area chair.
488
+
489
+ """
490
+
491
+ env_type = kwargs.pop('env_type')
492
+ global_settings = kwargs.get('global_settings', {})
493
+
494
+ if env_type == "paper_review":
495
+ phase = "ac_write_metareviews"
496
+
497
+ elif env_type == "paper_decision":
498
+ phase = "ac_make_decisions"
499
+
500
+ else:
501
+ raise NotImplementedError
502
+
503
+ kwargs['phase'] = phase
504
+ kwargs['knows_authors'] = "ac" in global_settings['persons_aware_of_authors_identities']
505
+
506
+ area_chair = {
507
+ "name": "AC", # We assume there is only 1 AC for now
508
+ "role_desc": get_ac_description(**kwargs),
509
+ "backend": {'backend_type': 'openai-chat',
510
+ 'temperature': 0.0, # make the AC decision deterministic
511
+ 'max_tokens': 4096},
512
+ "env_type": env_type,
513
+ }
514
+
515
+ return AgentConfig(**area_chair)
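To illustrate how these factory functions compose, here is a minimal, self-contained sketch of the reviewer-config path. `AgentConfig` and `PLAYER_BACKEND` live elsewhere in the package, so they are stubbed here; this is an illustration under those assumptions, not the package's real implementation.

```python
# Minimal stand-ins for AgentConfig and PLAYER_BACKEND (defined elsewhere in
# the package), stubbed here so the sketch runs standalone.
PLAYER_BACKEND = {"backend_type": "openai-chat", "temperature": 0.7, "max_tokens": 4096}


class AgentConfig(dict):
    """Stand-in: the real AgentConfig is a dict-like configuration object."""


def get_reviewer_player_config(reviewer_index, is_benign, global_settings):
    # Mirrors the real helper: the global settings decide what the reviewer knows.
    knows_authors = "reviewer" in global_settings["persons_aware_of_authors_identities"]
    return AgentConfig(
        name=f"Reviewer {reviewer_index}",
        backend=PLAYER_BACKEND,
        metadata={"is_benign": is_benign, "knows_authors": knows_authors},
    )


settings = {"persons_aware_of_authors_identities": ["ac"]}  # only the AC knows the authors
cfg = get_reviewer_player_config(1, is_benign=False, global_settings=settings)
print(cfg["name"], cfg["metadata"]["knows_authors"])  # → Reviewer 1 False
```

The key design point is that per-reviewer traits (`is_benign`, etc.) are passed explicitly, while paper- and experiment-level knowledge (author identity, numeric ratings) is derived from the shared `global_settings` dict.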
agentreview/ui/__init__.py ADDED
File without changes
agentreview/ui/cli.py ADDED
@@ -0,0 +1,269 @@
1
+ import logging
+ import os
+ import os.path as osp
+ from typing import Union
+ 
+ from colorama import Fore
+ from colorama import Style as CRStyle
+ from prompt_toolkit import prompt
+ from prompt_toolkit.completion import WordCompleter
+ from prompt_toolkit.styles import Style
+ from rich.console import Console
+ 
+ from utility.utils import get_rebuttal_dir, load_gpt4_generated_ac_decisions, \
+     save_gpt4_generated_ac_decisions
+ from ..arena import Arena, TooManyInvalidActions
+ from ..backends.human import HumanBackendError
+ from ..environments import PaperReview, PaperDecision
+ 
+ # Get the ASCII art from https://patorjk.com/software/taag/#p=display&f=Big&t=Chat%20Arena
+ ASCII_ART = r"""
+ _ _____ _
+ /\ | | | __ \ (_)
+ / \ __ _ ___ _ __ | |_| |__) |_____ ___ _____ __
+ / /\ \ / _` |/ _ \ '_ \| __| _ // _ \ \ / / |/ _ \ \ /\ / /
+ / ____ \ (_| | __/ | | | |_| | \ \ __/\ V /| | __/\ V V /
+ /_/ \_\__, |\___|_| |_|\__|_| \_\___| \_/ |_|\___| \_/\_/
+ __/ |
+ |___/
+ """
+ 
+ color_dict = {
+     "red": Fore.RED,
+     "green": Fore.GREEN,
+ 
+     "blue": Fore.BLUE,  # Paper Extractor
+     "light_red": Fore.LIGHTRED_EX,  # AC
+     "light_green": Fore.LIGHTGREEN_EX,  # Author
+     "yellow": Fore.YELLOW,  # R1
+     "magenta": Fore.MAGENTA,  # R2
+     "cyan": Fore.CYAN,
+     "white": Fore.WHITE,
+     "black": Fore.BLACK,
+ 
+     "light_yellow": Fore.LIGHTYELLOW_EX,
+     "light_blue": Fore.LIGHTBLUE_EX,
+     "light_magenta": Fore.LIGHTMAGENTA_EX,
+     "light_cyan": Fore.LIGHTCYAN_EX,
+     "light_white": Fore.LIGHTWHITE_EX,
+     "light_black": Fore.LIGHTBLACK_EX,
+ }
+ 
+ # Colors assigned to players; skip low-contrast colors and those reserved for System/Moderator
+ visible_colors = [
+     color
+     for color in color_dict
+     if color not in ["black", "white", "red", "green"] and "grey" not in color
+ ]
+ 
+ MAX_STEPS = 20  # We should not need this parameter for paper reviews anyway
+ 
+ # Set logging level to ERROR
+ logging.getLogger().setLevel(logging.ERROR)
+ 
+ 
+ class ArenaCLI:
+     """The CLI user interface for AgentReview."""
+ 
+     def __init__(self, arena: Arena):
+         self.arena = arena
+         self.args = arena.args
+ 
+     def launch(self, max_steps: int = None, interactive: bool = True):
+         """Run the CLI."""
+ 
+         if not interactive and max_steps is None:
+             max_steps = MAX_STEPS
+ 
+         args = self.args
+ 
+         console = Console()
+         # Print the ASCII art banner
+         console.print(ASCII_ART, style="bold dark_orange3")
+         timestep = self.arena.reset()
+         console.print("🎓AgentReview Initialized!", style="bold green")
+ 
+         env: Union[PaperReview, PaperDecision] = self.arena.environment
+         players = self.arena.players
+ 
+         env_desc = self.arena.global_prompt
+         num_players = env.num_players
+         player_colors = visible_colors[:num_players]  # assign a distinct color to each player
+         name_to_color = dict(zip(env.player_names, player_colors))
+ 
+         # System and Moderator messages are printed in red
+         name_to_color["System"] = "red"
+         name_to_color["Moderator"] = "red"
+ 
+         console.print(
+             f"[bold green underline]Environment ({env.type_name}) description:[/]\n{env_desc}"
+         )
+ 
+         # Log each player's name, backend type, and role description
+         for player in players:
+             player_name_str = f"[{player.name} ({player.backend.type_name})] Role Description:"
+             logging.info(color_dict[name_to_color[player.name]] + player_name_str + CRStyle.RESET_ALL)
+             logging.info(color_dict[name_to_color[player.name]] + player.role_desc + CRStyle.RESET_ALL)
+ 
+         console.print(Fore.GREEN + "\n========= Arena Start! ==========\n" + CRStyle.RESET_ALL)
+ 
+         step = 0
+         while not timestep.terminal:
+             if env.type_name == "paper_review":
+                 if env.phase_index > 4:
+                     break
+             elif env.type_name == "paper_decision":
+                 # Phase 5: AC makes decisions
+                 if env.phase_index > 5:
+                     break
+             else:
+                 raise NotImplementedError(f"Unknown environment type: {env.type_name}")
+ 
+             if interactive:
+                 command = prompt(
+                     [("class:command", "command (n/r/q/s/h) > ")],
+                     style=Style.from_dict({"command": "blue"}),
+                     completer=WordCompleter(
+                         [
+                             "next", "n",
+                             "reset", "r",
+                             "exit", "quit", "q",
+                             "help", "h",
+                             "save", "s",
+                         ]
+                     ),
+                 )
+                 command = command.strip()
+ 
+                 if command in ("help", "h"):
+                     console.print("Available commands:")
+                     console.print("  [bold]next or n or <Enter>[/]: next step")
+                     console.print("  [bold]exit or quit or q[/]: exit the game")
+                     console.print("  [bold]help or h[/]: print this message")
+                     console.print("  [bold]reset or r[/]: reset the game")
+                     console.print("  [bold]save or s[/]: save the history to file")
+                     continue
+                 elif command in ("exit", "quit", "q"):
+                     break
+                 elif command in ("reset", "r"):
+                     timestep = self.arena.reset()
+                     console.print(
+                         "\n========= Arena Reset! ==========\n", style="bold green"
+                     )
+                     continue
+                 elif command in ("next", "n", ""):
+                     pass
+                 elif command in ("save", "s"):
+                     # Prompt for the file path
+                     file_path = prompt(
+                         [("class:command", "save file path > ")],
+                         style=Style.from_dict({"command": "blue"}),
+                     )
+                     file_path = file_path.strip()
+                     # Save the history to file
+                     self.arena.save_history(file_path)
+                     console.print(f"History saved to {file_path}", style="bold green")
+                 else:
+                     console.print(f"Invalid command: {command}", style="bold red")
+                     continue
+ 
+             try:
+                 timestep = self.arena.step()
+             except HumanBackendError as e:
+                 # Handle human input and recover with the game update
+                 human_player_name = env.get_next_player()
+                 if interactive:
+                     human_input = prompt(
+                         [
+                             (
+                                 "class:user_prompt",
+                                 f"Type your input for {human_player_name}: ",
+                             )
+                         ],
+                         style=Style.from_dict({"user_prompt": "ansicyan underline"}),
+                     )
+                     timestep = env.step(human_player_name, human_input)
+                 else:
+                     raise e  # cannot recover from this error in non-interactive mode
+             except TooManyInvalidActions as e:
+                 print(Fore.RED + f"Too many invalid actions: {e}" + CRStyle.RESET_ALL)
+                 break
+ 
+             # The messages that are not yet logged
+             messages = [msg for msg in env.get_observation() if not msg.logged]
+ 
+             # Print the new messages
+             for msg in messages:
+                 message_str = f"[{msg.agent_name}->{msg.visible_to}]: {msg.content}"
+                 console.print(color_dict[name_to_color[msg.agent_name]] + message_str + CRStyle.RESET_ALL)
+                 msg.logged = True
+ 
+             step += 1
+             if max_steps is not None and step >= max_steps:
+                 break
+ 
+         console.print("\n========= Arena Ended! ==========\n", style="bold red")
+ 
+         if env.type_name == "paper_review":
+             paper_id = self.arena.environment.paper_id
+             rebuttal_dir = get_rebuttal_dir(output_dir=self.args.output_dir,
+                                             paper_id=paper_id,
+                                             experiment_name=self.args.experiment_name,
+                                             model_name=self.args.model_name,
+                                             conference=self.args.conference)
+ 
+             os.makedirs(rebuttal_dir, exist_ok=True)
+ 
+             path_review_history = f"{rebuttal_dir}/{paper_id}.json"
+ 
+             if osp.exists(path_review_history):
+                 raise Exception(f"History already exists ({path_review_history}). There must be something wrong "
+                                 f"with the path used to save the history.")
+ 
+             self.arena.save_history(path_review_history)
+ 
+         elif env.type_name == "paper_decision":
+             ac_decisions = load_gpt4_generated_ac_decisions(output_dir=args.output_dir,
+                                                             conference=args.conference,
+                                                             model_name=args.model_name,
+                                                             ac_scoring_method=args.ac_scoring_method,
+                                                             experiment_name=args.experiment_name,
+                                                             num_papers_per_area_chair=args.num_papers_per_area_chair)
+ 
+             ac_decisions += [env.ac_decisions]
+ 
+             save_gpt4_generated_ac_decisions(ac_decisions,
+                                              output_dir=args.output_dir,
+                                              conference=args.conference,
+                                              model_name=args.model_name,
+                                              ac_scoring_method=args.ac_scoring_method,
+                                              experiment_name=args.experiment_name)
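The CLI's color bookkeeping above (filter out low-contrast colors, then zip the remainder onto player names in order) can be sketched without `colorama` or a running arena. This is a stdlib-only illustration; the player names and ANSI codes are assumptions for the demo, not taken from the commit.

```python
# Mirror of the CLI's color assignment: filter low-contrast colors, then zip
# the remaining colors onto player names in insertion order.
color_dict = {
    "red": "\x1b[31m", "green": "\x1b[32m", "blue": "\x1b[34m",
    "light_red": "\x1b[91m", "light_green": "\x1b[92m",
    "yellow": "\x1b[33m", "magenta": "\x1b[35m", "cyan": "\x1b[36m",
    "white": "\x1b[37m", "black": "\x1b[30m",
}

visible_colors = [
    c for c in color_dict
    if c not in ["black", "white", "red", "green"] and "grey" not in c
]

player_names = ["Paper Extractor", "AC", "Author", "Reviewer 1"]  # hypothetical roster
name_to_color = dict(zip(player_names, visible_colors[:len(player_names)]))
# System and Moderator messages are always red in the CLI
name_to_color["System"] = "red"
name_to_color["Moderator"] = "red"
print(name_to_color["AC"])  # → light_red
```

Because Python dicts preserve insertion order, the roles line up with the inline comments in `color_dict` (Paper Extractor is blue, the AC is light red, the author light green, and so on).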
agentreview/utils.py ADDED
@@ -0,0 +1,116 @@
1
+ import json
+ import re
+ 
+ 
+ def is_json(myjson):
+     """
+     Checks whether a given string is a valid JSON.
+ 
+     Parameters:
+         myjson (str): The string to be checked.
+ 
+     Returns:
+         bool: True if the string is a valid JSON, False otherwise.
+     """
+     try:
+         _ = json.loads(myjson)
+     except ValueError:
+         return False
+     return True
+ 
+ 
+ def is_json_inside(text):
+     """
+     Checks whether a given string contains valid JSON(s).
+ 
+     Parameters:
+         text (str): The string to be checked.
+ 
+     Returns:
+         bool: True if the string contains valid JSON(s), False otherwise.
+     """
+     text = re.sub(r"\s+", " ", text)
+     matches = re.findall(r"\{.*?\}", text)
+     for match in matches:
+         if is_json(match):
+             return True
+     return False
+ 
+ 
+ def extract_jsons(text):
+     """
+     Extracts all valid JSON objects from a given string.
+ 
+     Parameters:
+         text (str): The string from which JSON objects are to be extracted.
+ 
+     Returns:
+         List[Dict]: A list of all extracted JSON objects.
+     """
+     text = re.sub(r"\s+", " ", text)
+     matches = re.findall(r"\{.*?\}", text)
+     parsed_jsons = []
+     for match in matches:
+         try:
+             json_object = json.loads(match)
+             parsed_jsons.append(json_object)
+         except ValueError:
+             pass
+     return parsed_jsons
+ 
+ 
+ def extract_code(text):
+     """
+     Extracts all code blocks encapsulated by '```' from a given string.
+ 
+     Parameters:
+         text (str): The string from which code blocks are to be extracted.
+ 
+     Returns:
+         List[str]: A list of all extracted code blocks.
+     """
+     text = re.sub("```python", "```", text)
+     matches = re.findall(r"```(.*?)```", text, re.DOTALL)
+     parsed_codes = []
+     for match in matches:
+         parsed_codes.append(match)
+     return parsed_codes
+ 
+ 
+ class AttributedDict(dict):
+     """
+     A dictionary class whose keys are automatically set as attributes of the class.
+ 
+     The dictionary is serializable to JSON.
+ 
+     Inherits from:
+         dict: Built-in dictionary class in Python.
+ 
+     Note:
+         This class provides attribute-style access to dictionary keys, meaning you can use dot notation
+         (like `my_dict.my_key`) in addition to the traditional bracket notation (`my_dict['my_key']`).
+     """
+ 
+     def __init__(self, *args, **kwargs):
+         super().__init__(*args, **kwargs)
+ 
+     def __setattr__(self, key, value):
+         self[key] = value
+ 
+     def __getattr__(self, key):
+         if key in self:
+             return self[key]
+         raise AttributeError
+ 
+     def __delattr__(self, key):
+         del self[key]
+ 
+     # check that the key is a string when adding the key
+     def __setitem__(self, key, value):
+         if not isinstance(key, str):
+             raise ValueError("The key must be a string")
+         super().__setitem__(key, value)
+ 
+     def update(self, *args, **kwargs):
+         for key, value in dict(*args, **kwargs).items():
+             self[key] = value
@@ -0,0 +1,156 @@
1
+ import argparse
2
+ import logging
3
+ import os
4
+ import sys
5
+
6
+ def parse_args():
7
+ parser = argparse.ArgumentParser(description="Argument parser for configuring OpenAI API and experiment settings")
8
+
9
+ # Authentication details for OpenAI API
10
+ parser.add_argument(
11
+ "--openai_key", type=str, default=None, help="API key to authenticate with OpenAI. Can be set via this argument or through the OPENAI_API_KEY environment variable."
12
+ )
13
+
14
+ parser.add_argument(
15
+ "--deployment", type=str, default=None, help="For Azure OpenAI: the deployment name to be used when calling the API."
16
+ )
17
+
18
+ parser.add_argument(
19
+ "--openai_client_type", type=str, default="openai", choices=["openai", "azure_openai"],
20
+ help="Specify the OpenAI client type to use: 'openai' for standard OpenAI API or 'azure_openai' for Azure-hosted OpenAI services."
21
+ )
22
+
23
+ parser.add_argument(
24
+ "--endpoint", type=str, default=None, help="For Azure OpenAI: custom endpoint to access the API. Should be in the format 'https://<your-endpoint>.openai.azure.com'."
25
+ )
26
+
27
+
28
+ parser.add_argument(
29
+ "--api_version", type=str, default="2023-03-15-preview", help="API version to be used for making requests. Required for Azure OpenAI clients."
30
+ )
31
+
32
+ # Experiment configuration
33
+ parser.add_argument(
34
+ "--ac_scoring_method", type=str, default="ranking", choices=["recommendation", "ranking"],
35
+ help="Specifies the scoring method used by the Area Chair (AC) to evaluate papers: 'recommendation' or 'ranking'."
36
+ )
37
+
38
+ parser.add_argument(
39
+ "--conference", type=str, default="ICLR2023", help="Conference name where the papers are being evaluated, e.g., 'ICLR2023'."
40
+ )
41
+
42
+ parser.add_argument(
43
+ "--num_reviewers_per_paper", type=int, default=3, help="The number of reviewers assigned to each paper."
44
+ )
45
+
46
+ parser.add_argument(
47
+ "--experiment_name",
48
+ type=str, default=None, required=False,
49
+ choices=[
50
+ "BASELINE", "benign_Rx1", "malicious_Rx1", "malicious_Rx2", "malicious_Rx3", "unknowledgeable_Rx1",
51
+ "knowledgeable_Rx1", "responsible_Rx1", "irresponsible_Rx1", "irresponsible_Rx2", "irresponsible_Rx3",
52
+ "inclusive_ACx1", "authoritarian_ACx1", "conformist_ACx1", "no_numeric_ratings"],
53
+ help="Specifies the name of the experiment to run. Choose from predefined experiment types based on the reviewer and AC behavior or experiment configuration."
54
+ )
55
+
56
+ parser.add_argument(
57
+ "--ignore_missing_metareviews", action="store_true", help="If set, missing metareviews are ignored, allowing the experiment to continue without them."
58
+ )
59
+
60
+ parser.add_argument(
61
+ "--overwrite", action="store_true", help="If set, existing results or output files will be overwritten without prompting."
62
+ )
63
+
64
+ parser.add_argument(
65
+ "--num_papers_per_area_chair", type=int, default=10, help="The number of papers each area chair is assigned for evaluation."
66
+ )
67
+
68
+ # Model configuration
69
+ parser.add_argument(
70
+ "--model_name", type=str, default="gpt-4o", choices=["gpt-4", "gpt-4o", "gpt-35-turbo"],
71
+ help="Specifies which GPT model to use: 'gpt-4' for the standard GPT-4 model, 'gpt-35-turbo' for a "
72
+ "cost-effective alternative, or 'gpt-4o' for larger context support."
73
+ )
74
+
75
+ # Output directories
76
+ parser.add_argument(
77
+ "--output_dir", type=str, default="outputs", help="Directory where results, logs, and outputs will be stored."
78
+ )
79
+
80
+ # Paper configuration
81
+ parser.add_argument(
82
+ "--max_num_words", type=int, default=16384, help="Maximum number of words in the paper."
83
+ )
84
+
85
+ parser.add_argument(
86
+ "--visual_dir", type=str, default="outputs/visual", help="Directory where visualization files (such as graphs and plots) will be stored."
87
+ )
88
+
89
+ # System configuration
90
+ parser.add_argument(
91
+ "--device", type=str, default='cuda', help="The device to be used for processing (e.g., 'cuda' for GPU acceleration or 'cpu' for standard processing)."
92
+ )
93
+
94
+ parser.add_argument(
95
+ "--data_dir", type=str, default='data', help="Directory where input data (e.g., papers) are stored."
96
+ )
97
+
98
+
99
+ parser.add_argument(
100
+ "--acceptance_rate", type=float, default=0.32,
101
+ help="Percentage of papers to accept. We use 0.32, the average acceptance rate for ICLR 2020-2023."
102
+ )
103
+
104
+ args = parser.parse_args()
105
+
106
+ # Ensure necessary directories exist
107
+ os.makedirs(args.visual_dir, exist_ok=True)
108
+ os.makedirs(args.output_dir, exist_ok=True)
109
+
110
+ # Set 'player_to_test' based on experiment name
111
+ if args.experiment_name is None:
112
+ args.player_to_test = None
113
+ elif "Rx" in args.experiment_name:
114
+ args.player_to_test = "Reviewer"
115
+ elif "ACx" in args.experiment_name:
116
+ args.player_to_test = "Area Chair"
117
+ elif "no_rebuttal" in args.experiment_name or "no_overall_score" in args.experiment_name:
118
+ args.player_to_test = "Review Mechanism"
119
+
120
+ # Sanity checks for authentication
121
+ print("Running sanity checks for the arguments...")
122
+
123
+ if args.openai_client_type == "openai":
124
+ if os.environ.get('OPENAI_API_KEY') is None:
125
+ assert isinstance(args.openai_key, str), ("Please specify the `--openai_key` argument OR set the "
126
+ "OPENAI_API_KEY environment variable.")
127
+ os.environ['OPENAI_API_KEY'] = args.openai_key
128
+
129
+ if args.openai_client_type == "azure_openai":
130
+ if os.environ.get('AZURE_OPENAI_KEY') is None:
131
+ assert isinstance(args.openai_key, str), ("Please specify the `--openai_key` argument OR set the "
132
+ "AZURE_OPENAI_KEY environment variable.")
133
+ os.environ['AZURE_OPENAI_KEY'] = args.openai_key
134
+
135
+ if os.environ.get('AZURE_DEPLOYMENT') is None:
136
+ assert isinstance(args.deployment, str), ("Please specify the `--deployment` argument OR set the "
137
+ "AZURE_DEPLOYMENT environment variable.")
138
+ os.environ['AZURE_DEPLOYMENT'] = args.deployment
139
+
140
+ if os.environ.get('AZURE_ENDPOINT') is None:
141
+ assert isinstance(args.endpoint, str), ("Please specify the `--endpoint` argument OR set the "
142
+ "AZURE_ENDPOINT environment variable.")
143
+ endpoint = args.endpoint
144
+ else:
145
+ endpoint = os.environ.get('AZURE_ENDPOINT')
146
+
147
+ if not endpoint.startswith("https://"):
148
+ endpoint = f"https://{endpoint}.openai.azure.com"
149
+ os.environ['AZURE_ENDPOINT'] = endpoint
150
+
151
+ if os.environ.get('OPENAI_API_VERSION') is None:
152
+ assert isinstance(args.api_version, str), ("Please specify the `--api_version` argument OR set the "
153
+ "OPENAI_API_VERSION environment variable.")
154
+ os.environ['OPENAI_API_VERSION'] = args.api_version
155
+
156
+ return args
const.py ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+
3
+ """
4
+ Note
5
+ - ICLR 2021 has a category for "Significant-concerns"
6
+ - ICLR 2023 categorizes the papers as "Accept-notable-top-5", "Accept-notable-top-25", "Accept-poster", and "Reject"
7
+ """
8
+ PAPER_DECISIONS = ["Reject", "Accept-oral", "Accept-spotlight", "Accept-poster"]
9
+ PAPER_DECISIONS_ICLR2019 = ["Accept-oral", "Accept-poster", "Reject"]
10
+
11
+ AREA_CHAIR_TYPES = ['inclusive', 'conformist', 'authoritarian', 'BASELINE']
12
+
13
+ # These are papers that contain potentially sensitive content. GPT-4 refused to generate reviews for these papers.
14
+ FILTERED_PAPER_IDS = {
15
+ "ICLR2020": [],
16
+ "ICLR2021": [],
17
+ "ICLR2022": [186, 200, 270],
18
+ "ICLR2023": []
19
+ }
20
+
21
+ ALL_REVIEW_PHASES = ["reviewer_write_reviews", "author_reviewer_discussion", "reviewer_ac_discussion", "ac_discussion"]
22
+
23
+
24
+ EXPERIMENT_NAME2REVIEWER_TYPES = {
25
+ "BASELINE": "BASELINE",
26
+ "knowledgeable_Rx1": "knowledgeable",
27
+ "unknowledgeable_Rx1": "unknowledgeable",
28
+ "irresponsible_Rx1": "irresponsible",
29
+ "irresponsible_Rx2": "irresponsible",
30
+ "irresponsible_Rx3": "irresponsible",
31
+ "responsible_Rx1": "responsible",
32
+ "malicious_Rx1": "malicious",
33
+ "malicious_Rx2": "malicious",
34
+ "malicious_Rx3": "malicious",
35
+ "benign_Rx1": "benign",
36
+ "inclusive_ACx1": "BASELINE",
37
+ "authoritarian_ACx1": "BASELINE",
38
+ "conformist_ACx1": "BASELINE",
39
+ "authors_are_famous_Rx1": "authors_are_famous",
40
+ "authors_are_famous_Rx2": "authors_are_famous",
41
+ "authors_are_famous_Rx3": "authors_are_famous",
42
+ "authors_are_famous_Rx1_no_rebuttal": "authors_are_famous",
43
+ "authors_are_famous_Rx2_no_rebuttal": "authors_are_famous",
44
+ "authors_are_famous_Rx3_no_rebuttal": "authors_are_famous",
45
+ "no_rebuttal": "BASELINE",
46
+ "no_overall_score": "BASELINE",
47
+ }
48
+
49
+
50
+ year2paper_ids = {
51
+ "ICLR2018": [45, 47, 59, 76, 229, 254, 372, 415, 447, 517, 543, 544, 562, 596, 615, 639] +
52
+ [1, 2, 7, 10, 16, 26, 33, 51, 60, 61, 65,
53
+ 67, 69, 72, 73, 77, 84, 88, 94, 99, 104,
54
+ 117, 121, 124, 131, 132, 134, 136, 143, 147, 148, 149, 155, 162, 164, 166, 168, 169, 171,
55
+ 175, 178, 179, 189, 196, 201, 203, 204, 205] +
56
+ [3, 4, 6, 8, 9, 11, 12, 13, 15, 17, 18, 19, 20, 21, 24, 25, 27, 28, 30, 31, 32, 34, 36, 37, 39, 40,
57
+ 41, 42, 43, 44, 46, 52, 53, 54, 55, 56, 58, 63, 66, 68, 71, 74, 75, 78, 80, 83, 85, 87, 89,
58
+ 91, 92, 93, 95, 96, 97, 100, 101, 102, 103, 105, 106, 107, 108, 109, 110, 111, 113, 114, 115,
59
+ 116, 118, 120, 122, 123, 125, 127, 128, 129, 133, 135, 138, 141, 142, 144, 153, 154, 156, 157, 158,
60
+ 159, 161, 163, 170, 172, 173, 174, 176, 177, 180, 181, 182, 184, 185, 186, 187, 190, 191, 193, 194,
61
+ 197, 200, 206, 207, 209, 211, 213, 214, 218, 219, 221, 222, 223, 225, 226, 230, 234, 237, 238, 241,
62
+ 243, 244, 247, 248, 253, 255, 256, 257, 258, 259, 266, 268, 271, 272, 273, 275, 276, 278, 283,
63
+ 286],
64
+
65
+
66
+ "ICLR2019": [1, 26, 119, 220, 231, 507, 563, 566, 574, 632, 654, 709, 734, 780, 835, 917] + [4, 27, 33, 39, 40,
67
+ 51, 57, 67, 70, 72,
68
+ 73, 76, 77, 82, 87,
69
+ 98, 99, 100, 106,
70
+ 108, 109, 110, 111,
71
+ 113, 114, 116, 123,
72
+ 129, 130, 143, 146,
73
+ 147, 150, 155, 177,
74
+ 184, 187, 190, 194,
75
+ 201, 202, 203, 205,
76
+ 211, 213, 222, 237,
77
+ 238] + [2, 3, 6, 8,
78
+ 9, 11, 12,
79
+ 13, 14, 15,
80
+ 16, 17, 18,
81
+ 19, 20, 21,
82
+ 22, 23, 24,
83
+ 28, 29, 32,
84
+ 35, 36, 37,
85
+ 38, 41, 42,
86
+ 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 58, 59, 60, 61, 62, 63, 65, 66, 68, 69, 71, 74, 75, 78, 79, 80, 83, 84, 85, 86, 89, 90, 91, 92, 93, 94, 95, 96, 97, 101, 102, 104, 105, 107, 112, 115, 117, 118, 120, 122, 124, 125, 127, 128, 131, 132, 133, 134, 135, 136, 137, 139, 140, 141, 142, 144, 145, 148, 149, 152, 153, 154, 158, 160, 161, 162, 163, 164, 165, 167, 168, 171, 172, 173, 174, 178, 179, 180, 181, 182, 185, 186, 189, 191, 192, 193, 195, 196, 197, 198, 204, 206, 208, 209, 210, 214, 216, 217, 218, 221, 223, 224, 225, 226, 228, 229, 230, 233],
87
+
88
+
89
+
90
+ "ICLR2020": [2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20, 21, 23, 27, 28, 31, 32, 33, 34, 35, 37, 38,
91
+ 41, 42, 43, 46, 48, 49, 52, 56, 58, 60, 62, 63, 66, 67, 68, 74, 75, 76, 79, 80, 82, 83, 84, 85, 86,
92
+ 87, 90, 94, 97, 101, 105, 106, 107, 109, 110, 111, 115, 116, 118, 119, 120, 121, 122, 123, 124, 126,
93
+ 1, 16, 18, 22, 24, 29, 40, 45, 50, 53, 55, 57, 61, 70, 73, 77, 91, 93, 95, 103, 108, 112, 113, 117,
94
+ 125, 132, 5, 44, 47, 54, 71, 88, 69, 78, 102, 221],
95
+ "ICLR2021": [140, 218, 294, 332, 362, 420, 1, 5, 12, 20, 28, 52, 67, 69, 75, 102, 103, 110, 126, 135, 138, 147, 149, 151, 160, 170, 174, 181, 182, 190, 44, 3, 4, 10, 11, 14, 17, 18, 21, 22, 24, 27, 33, 36, 38, 45, 47, 49, 59, 70, 72, 73, 78, 79, 82, 84, 86, 88, 89, 91, 92, 98, 100, 101, 104, 105, 106, 107, 108, 111, 114, 120, 123, 124, 125, 130, 131, 133, 137, 141, 142, 143, 154, 155, 156, 157, 159, 161, 162, 164, 166, 167, 168, 171, 172, 176, 193, 194, 81, 94, 95, 146, 177, 179, 184, 186],
96
+ "ICLR2022": [86, 154, 208, 222, 224, 284, 9, 10, 11, 12, 14, 25, 30, 31, 39, 42, 45, 56, 68, 73, 80, 88, 89, 90, 96, 101, 102, 104, 109, 1, 4, 6, 27, 36, 43, 47, 61, 62, 63, 65, 67, 69, 81, 82, 95, 98, 99, 100, 103, 105, 106, 108, 115, 120, 121, 122, 130, 134, 142, 143, 144, 145, 152, 153, 157, 159, 168, 173, 174, 175, 176, 179, 180, 186, 187, 193, 194, 197, 200, 201, 205, 210, 216, 226, 229, 233, 234, 235, 236, 239, 248, 261, 262, 263, 264, 269, 270, 271, 112, 113, 34, 64, 158, 172, 277, 280, 283, 286],
97
+ "ICLR2023": [210, 219, 1759, 1774, 9, 11, 12, 33, 54, 55, 61, 70, 79, 86, 88, 90, 97, 116, 128, 129, 143, 152, 160, 168, 174, 177, 181, 193, 1647, 1651, 1666, 1670, 1673, 1675, 1677, 1678, 1680, 1683, 1692, 1698, 1703, 1709, 1716, 1720, 1723, 1727, 1728, 1742, 1743, 1752, 1754, 1760, 113, 156, 214, 220, 317, 318, 1657, 1686, 1740, 1762, 1783, 1817, 2, 3, 6, 8, 17, 18, 24, 25, 29, 30, 45, 62, 77, 80, 82, 84, 89, 96, 104, 105, 107, 108, 118, 119, 120, 122, 130, 131, 133, 139, 141, 145, 146, 149, 150, 151, 153, 158, 161, 163, 164, 169, 175, 178, 179, 198, 200, 206, 207, 211, 212, 225, 226, 231, 235, 236, 237, 245, 246, 249, 252, 253, 255, 257, 258, 259, 264, 265, 266, 275, 1645, 1649, 1655, 1658, 1663, 1664, 1665, 1672, 1679, 1682, 1685, 1695, 1697, 1701, 1704, 1706, 1708, 1710, 1712, 1713, 1715, 1722, 1726, 1729, 1731, 1734, 1736, 1738, 1739, 1741, 1744, 1745, 1749, 1750, 1755, 1756, 1758, 1761, 1764, 1767, 1772, 1773, 1778, 1779, 1780, 1786, 1788, 1790, 1791, 1795, 1796, 1797, 1800, 1802, 1803, 1805, 1810, 1812, 1813, 1821, 1822, 1827, 1829, 1833, 1840, 1845, 1851, 1856],
98
+ "ICLR2024": [39, 247, 289, 400, 489, 742, 749] + [62, 78, 159, 161, 170, 192, 198, 215, 219, 335, 344, 386, 427, 432, 448, 451, 461, 472, 485, 536, 546, 559, 573, 577, 597] + [5, 9, 11, 19, 20, 30, 31, 32, 40, 49, 52, 53, 54, 56, 61, 66, 67, 73, 74, 77, 85, 87, 100, 104, 114, 116, 124, 130, 133, 138, 145, 151, 153, 156, 165, 166, 172, 181, 183, 187, 195, 204, 212, 221, 224, 230, 237, 243, 248, 257, 258, 259, 263, 272, 278, 287, 288, 291, 292, 298, 300, 302, 304, 306, 308, 318, 320, 321, 324, 325, 326, 327, 331, 332, 334, 336, 338, 340, 345, 349, 350, 356, 357, 358, 360] + [1, 2, 12, 14, 24, 26, 33, 35, 36, 41, 42, 44, 50, 51, 55, 57, 59, 70, 72, 75, 76, 81, 89, 90, 93,
99
+ 94, 97, 99, 101, 105, 110, 111, 112, 117, 119, 120, 125, 128, 129, 131, 134, 135, 140, 148, 150, 157, 158, 163, 167, 173, 175, 177, 182, 185, 186, 188, 189, 197, 202, 207, 209, 210, 214, 216, 226, 231, 234, 236, 238, 239, 241, 244, 245, 249, 260, 262, 264, 265, 271, 276, 277, 279, 281, 282, 284, 286, 290, 294, 295, 301, 303, 307, 309, 313, 315, 319, 322, 333, 337, 339, 342, 354, 363, 364, 369, 373, 374, 375, 377, 378, 381, 382, 385, 388, 398, 399, 401, 407, 412, 413, 415, 416, 417, 420, 421, 422, 426, 428, 436, 437, 444, 446, 449, 453, 454, 463, 464, 469, 478, 480, 487, 490, 496, 498, 501, 502, 504, 506, 513, 516, 517, 518, 520, 521, 523, 524, 525, 537, 541, 545, 551, 552, 554, 555, 558, 562, 563, 574, 575, 579, 581, 584, 588, 595, 596, 598, 607, 608, 615, 622, 624, 625, 627, 629, 630, 634, 636, 641, 645, 647, 648, 651, 652, 654, 655, 662, 667, 668, 671, 672, 673, 681, 682, 685, 689, 690, 691, 697, 698, 701]
100
+ }
docs/devdoc/design.md ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Key Design Choices
2
+ In this document, we will discuss the key concepts and design choices of ChatArena.
3
+ We expect this will be helpful particularly for developers who want to contribute to ChatArena or build their own environments.
4
+
5
+ ## Agent Environment Cycle
6
+ ChatArena generally follows the design principles of OpenAI Gym [1] and PettingZoo [2]. Any agent interacts with the environment and other agents through the agent environment cycle.
7
+ For every single cycle,
8
+ 1. the agent observes the environment
9
+ 2. the agent outputs an action
10
+ 3. the environment makes a state transition given the action
11
+
12
+ As an optional feature, in each cycle, the environment can also compute a scalar reward for every single agent, along with a terminal signal for the environment.
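The three steps above can be sketched as a minimal loop. The class and method names here are illustrative placeholders, not the exact ChatArena API:

```python
# A toy agent-environment cycle: observe -> act -> state transition,
# with an optional reward and terminal signal per step.
# All names here are illustrative, not the actual ChatArena classes.

class EchoAgent:
    """A toy agent that reports how many messages it has observed."""
    def __init__(self, name):
        self.name = name

    def act(self, observation):
        return f"{self.name} saw {len(observation)} message(s)"

class ToyEnvironment:
    """A toy environment that stores actions as messages and ends after 3 turns."""
    def __init__(self, player_names):
        self.player_names = player_names
        self.messages = []

    def get_observation(self, player_name):
        return list(self.messages)                   # 1. the agent observes

    def step(self, player_name, action):
        self.messages.append((player_name, action))  # 3. state transition
        reward = 0.0                                 # optional scalar reward
        terminal = len(self.messages) >= 3           # optional terminal signal
        return reward, terminal

agents = [EchoAgent("Alice"), EchoAgent("Bob")]
env = ToyEnvironment([a.name for a in agents])
done = False
while not done:
    for agent in agents:
        obs = env.get_observation(agent.name)
        action = agent.act(obs)                      # 2. the agent outputs an action
        _, done = env.step(agent.name, action)
        if done:
            break
```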
13
+
14
+ [1] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba: OpenAI Gym. CoRR abs/1606.01540 (2016)
15
+
16
+ [2] Justin K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis S. Santos, Clemens Dieffendahl, Caroline Horsch, Rodrigo Perez-Vicente, Niall L. Williams, Yashas Lokesh, Praveen Ravi: PettingZoo: Gym for Multi-Agent Reinforcement Learning. NeurIPS 2021: 15032-15043
17
+
18
+ ### Actions
19
+
20
+ In the current version of ChatArena, all the actions are represented as plain text. More structured text outputs, like json or code, can be generated by prompting the LLM to do so.
21
+ We provide simple utilities to extract json and code (with markdown syntax), which should cover common use cases but can break for intentionally crafted edge cases.
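As an illustration, a minimal extraction utility in this spirit (a sketch, not ChatArena's actual helper) can pull fenced blocks out of an LLM reply with a regex:

```python
import json
import re

def extract_code_blocks(text, language=None):
    """Extract fenced code blocks (```lang ... ```) from markdown-style text."""
    blocks = re.findall(r"```(\w*)\n(.*?)```", text, flags=re.DOTALL)
    return [code.strip() for lang, code in blocks
            if language is None or lang == language]

def extract_json(text):
    """Parse the first valid JSON code block found in the text, if any."""
    for block in extract_code_blocks(text, language="json"):
        try:
            return json.loads(block)
        except json.JSONDecodeError:
            continue
    return None

reply = 'Here is my move:\n```json\n{"action": "vote", "target": "Bob"}\n```'
print(extract_json(reply))  # → {'action': 'vote', 'target': 'Bob'}
```

As noted above, a regex like this covers common outputs but can break on intentionally crafted edge cases, such as nested or unbalanced fences.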
22
+
23
+ ### Observations
24
+
25
+ An observation is a list of messages, each with a sender and content. The sender can be any agent in the environment or the environment's built-in moderator. The content is again plain text.
26
+
27
+ ## Message Pool and Visibility Control
28
+
29
+ In ChatArena, agents cannot directly talk to each other but exchange information with a [message pool](https://github.com/chatarena/chatarena/blob/main/chatarena/message.py) as a proxy. The message pool is a utility abstraction that can serve as a part of the game state.
30
+
31
+ When an agent takes an action, a message can be created and appended to the message pool. In the message pool, each message will have a receiver, which can be decided by the environment dynamics (game rules) or by the agent itself. The environment itself can also create messages under the name of the moderator which can provide other state information or extra instructions given the current state.
32
+
33
+ To render an observation, the message pool will collect all the messages that are visible to the agent and return a list of these messages.
34
+
35
+ In particular, some environments require parallel moves, say, rock-paper-scissors, where an agent should not see the moves of other agents in the same turn. The message pool also supports this mechanism: one can specify the "current turn", and messages from later turns are ignored when rendering observations.
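As a sketch of this visibility-and-turn mechanism (the `Message` fields and pool methods below are illustrative, not the real `MessagePool` API):

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    content: str
    turn: int
    visible_to: list = field(default_factory=lambda: ["all"])

class SimpleMessagePool:
    """A toy message pool with receiver-based visibility and turn filtering."""
    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)

    def get_visible_messages(self, player_name, current_turn):
        """Messages visible to a player, ignoring turns at or after
        current_turn so parallel moves in the same turn stay hidden."""
        return [m for m in self._messages
                if m.turn < current_turn
                and ("all" in m.visible_to or player_name in m.visible_to)]

pool = SimpleMessagePool()
pool.append(Message("Moderator", "Round 1: make your move", turn=0))
pool.append(Message("Alice", "rock", turn=1, visible_to=["Moderator"]))
pool.append(Message("Bob", "paper", turn=1, visible_to=["Moderator"]))

# During turn 1, Bob cannot see Alice's simultaneous move:
obs = pool.get_visible_messages("Bob", current_turn=1)
print([m.content for m in obs])  # → ['Round 1: make your move']
```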
36
+
37
+ ## Intelligence Backends
38
+
39
+ In ChatArena, each agent will usually be powered by a language backend. These backends can be LLM APIs (say, from [OpenAI](https://github.com/chatarena/chatarena/blob/main/chatarena/backends/openai.py), [Anthropic](https://github.com/chatarena/chatarena/blob/main/chatarena/backends/anthropic.py) or [Cohere](https://github.com/chatarena/chatarena/blob/main/chatarena/backends/cohere.py)), [local LLM](https://github.com/chatarena/chatarena/blob/main/chatarena/backends/hf_transformers.py) or just [humans](https://github.com/chatarena/chatarena/blob/main/chatarena/backends/human.py) behind a user interface. In [backends](https://github.com/chatarena/chatarena/tree/main/chatarena/backends), we render the observations (list of messages) into the required formats for the downstream models. And the returned text will be the agent’s action [by default](https://github.com/chatarena/chatarena/blob/55c9e6ee4e09d72905eceb0a0e09e93a4179ca39/chatarena/agent.py#L28).
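In this design, a backend is essentially a function from an observation (plus the agent's role description) to a text action. A minimal stub (illustrative names only; a real backend would call an LLM API where noted) might look like:

```python
class DummyBackend:
    """A toy backend; a real one would call an LLM API or a local model."""
    def query(self, agent_name, role_desc, history_messages):
        # Render the observation (list of messages) into a flat prompt
        # in the format the downstream model expects.
        prompt_lines = [f"You are {agent_name}. {role_desc}"]
        for sender, content in history_messages:
            prompt_lines.append(f"[{sender}]: {content}")
        prompt = "\n".join(prompt_lines)
        # A real backend would send `prompt` to a model here; we return a stub.
        return f"({agent_name} responds to {len(history_messages)} message(s))"

backend = DummyBackend()
reply = backend.query("Reviewer 1", "You are a reviewer.",
                      [("Author", "Here is our paper.")])
print(reply)  # → (Reviewer 1 responds to 1 message(s))
```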
docs/devdoc/mainloop.md ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ### Step 1: Define Multiple Players with LLM Backend
2
+
3
+ ```python
4
+ from agentreview.agent import Player
5
+ from agentreview.backends import OpenAIChat
6
+
7
+ # Describe the environment (which is shared by all players)
8
+ environment_description = "It is in a university classroom ..."
9
+
10
+ # A "Professor" player
11
+ player1 = Player(name="Professor", backend=OpenAIChat(),
12
+ role_desc="You are a professor in ...",
13
+ global_prompt=environment_description)
14
+ # A "Student" player
15
+ player2 = Player(name="Student", backend=OpenAIChat(),
16
+ role_desc="You are a student who is interested in ...",
17
+ global_prompt=environment_description)
18
+ # A "Teaching Assistant" player
19
+ player3 = Player(name="Teaching assistant", backend=OpenAIChat(),
20
+ role_desc="You are a teaching assistant of the module ...",
21
+ global_prompt=environment_description)
22
+ ```
23
+
24
+ ### Step 2: Create a Language Game Environment
25
+
26
+ You can also create a language model-driven environment and add it to the ChatArena:
27
+
28
+ ```python
29
+ from agentreview.environments.conversation import Conversation
30
+
31
+ env = Conversation(player_names=[p.name for p in [player1, player2, player3]])
32
+ ```
33
+
34
+ ### Step 3: Run the Language Game using Arena
35
+
36
+ `Arena` is a utility class to help you run language games:
37
+
38
+ ```python
39
+ from agentreview.arena import Arena
40
+
41
+ arena = Arena(players=[player1, player2, player3],
42
+ environment=env, global_prompt=environment_description)
43
+ # Run the game for 10 steps
44
+ arena.run(num_steps=10)
45
+
46
+ # Alternatively, you can run your own main loop
47
+ for _ in range(10):
48
+ arena.step()
49
+ # Your code goes here ...
50
+ ```
51
+
52
+ You can easily save your gameplay history to file:
53
+
54
+ ```python
55
+ arena.save_history(path=...)
56
+ ```
57
+
58
+ and save your game config to file:
59
+
60
+ ```python
61
+ arena.save_config(path=...)
62
+ ```
docs/devdoc/moderated.md ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ### `ModeratedConversation`: an LLM-driven Environment
2
+
3
+ We support a more advanced environment called `ModeratedConversation` that allows you to **control the game dynamics
4
+ using an LLM**.
5
+ The moderator is a special player that controls the game state transition and determines when the game ends.
6
+ For example, you can define a moderator that tracks the board status of a board game and ends the game when a player
7
+ wins.
8
+ You can try out our Tic-tac-toe and Rock-paper-scissors games to get a sense of how it works:
9
+
10
+ ```python
11
+ # Tic-tac-toe example
12
+ Arena.from_config("examples/tic-tac-toe.json").launch_cli()
13
+
14
+ # Rock-paper-scissors example
15
+ Arena.from_config("examples/rock-paper-scissors.json").launch_cli()
16
+ ```
docs/tutorials/create_your_environment.md ADDED
@@ -0,0 +1,90 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # How to create your custom environments
2
+
3
+ As an example to demonstrate how to develop your own environment, we develop a language
4
+ game based on [The Chameleon](https://bigpotato.co.uk/blogs/blog/how-to-play-the-chameleon-instructions).
5
+ The example code is available [here](../../agentreview/environments/chameleon.py).
6
+
7
+ **Here are the detailed steps to develop a custom environment class**
8
+
9
+ 1. **Define the class**: Start by defining the class and inherit from a suitable base class (e.g., `Environment`). In
10
+ this case, the custom class `Chameleon` inherits from the `Environment` base class.
11
+
12
+ ```python
13
+ class Chameleon(Environment):
14
+ type_name = "chameleon"
15
+ ```
16
+
17
+ The `type_name` is required and it is used by the [`ENV_REGISTRY`](chatarena/environments/__init__.py#L13) to identify
18
+ the class when it is loaded
19
+ from a config file.
20
+
21
+ Make sure you add the class to [`ALL_ENVIRONMENTS`](chatarena/environments/__init__.py#L17)
22
+ in `environments/__init__.py` so that it can be detected.
23
+
24
+ 2. **Initialize the class**: Define the `__init__` method to initialize the class attributes, such as player names, game
25
+ state, and any other necessary variables.
26
+
27
+ ```python
28
+ def __init__(self, player_names: List[str], topic_codes: Dict[str, List[str]] = None, **kwargs):
29
+ super().__init__(player_names=player_names, ..., **kwargs)
30
+ ...
31
+
32
+ # The "state" of the environment is maintained by the message pool
33
+ self.message_pool = MessagePool()
34
+ ...
35
+ ```
36
+
37
+ 3. **Implement game mechanics**: Write methods that define the game mechanics, such as giving clues, voting, and
38
+ guessing the secret word. In the `Chameleon` class, these mechanics are implemented in the `step` method.
39
+
40
+ ```python
41
+ def step(self, player_name: str, action: str) -> TimeStep:
42
+ ...
43
+ ```
44
+
45
+ 4. **Handle game states and rewards**: Implement methods to manage game states, such as resetting the environment,
46
+ getting
47
+ observations, checking if the game has reached a terminal state, and giving rewards to players.
48
+
49
+ ```python
50
+ def reset(self):
51
+ ...
52
+
53
+
54
+ def get_observation(self, player_name=None) -> List[Message]:
55
+ ...
56
+
57
+
58
+ def is_terminal(self) -> bool:
59
+ ...
60
+
61
+
62
+ def get_rewards(self, ...) -> Dict[str, float]:
63
+ ...
64
+ ```
65
+
66
+ 5. **Develop your role description prompts for the players**: Now that you have defined the game mechanics, you can
67
+ develop the role description prompts for the players. These prompts are used to guide the LLM-powered players to play
68
+ the game
69
+ correctly. You can use the CLI for this purpose. For example, you can run the following code to launch the CLI:
70
+
71
+ ```python
72
+ alice = Player(name="Alice", backend=OpenAIChat(), role_desc="Write your prompt here")
73
+ bob = Player(name="Bob", backend=OpenAIChat(), role_desc="Write your prompt here")
74
+ env = Chameleon(player_names=["Alice", "Bob"], topic_codes=...)
75
+ arena = Arena(players=[alice, bob], environment=env).launch_cli()
76
+ ```
77
+
78
+ Once you are happy with your prompts, you can save them to a config file for future use or sharing.
79
+
80
+ ```python
81
+ arena.save_config(path=...)
82
+ ```
83
+
84
+ Another option is using the Web UI. You can run the following code to launch the Web UI:
85
+
86
+ ```bash
87
+ gradio app.py
88
+ ```
89
+
90
+ and select your custom environment from the dropdown menu.
notebooks/barplot_similarity_between_review_metareview.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/demo.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/histplots.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/lineplots.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ colorama
3
+ llama_index
4
+ matplotlib
5
+ numpy
6
+ openreview_py
7
+ pandas
8
+ prompt_toolkit
9
+ requests
10
+ rich
11
+ setuptools
12
+ tenacity
13
+ tiktoken
14
+ tqdm
15
+ transformers
16
17
+ openai
18
+ gradio
19
+
review_content_analysis/analysis.py ADDED
@@ -0,0 +1,477 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Microsoft Corporation.
2
+ # Licensed under the MIT License.
3
+
4
+ import os
5
+ import re
6
+ import time
7
+ import json
8
+ import matplotlib.pyplot as plt
9
+ from openai import OpenAI
10
+ import multiprocessing
11
+
12
+ FONT_SIZE = 20
13
+
14
+ COLORS = ['#26547c', '#06d6a0', '#ef476f', '#ffd166']
15
+
16
+ openai_api_key = os.getenv("OPENAI_KEY")
17
+ # print(openai.api_key)
18
+
19
+ base_dir = '/home/v-qinlinzhao/agent4reviews/simulated_review/reviews'
20
+ save_base_dir = '/home/v-qinlinzhao/agent4reviews/simulated_review/classified_reason/'
21
+
22
+ with open('iter_prompt.txt', 'r') as f:
23
+ iter_prompt = f.read()
24
+
25
+ with open('classification_prompt.txt', 'r') as f:
26
+ classification_prompt = f.read()
27
+
28
+ with open('reason_library.txt', 'r') as f:
29
+ reason_library = f.read()
30
+
31
+ def get_gpt_response(prompt):
32
+ client = OpenAI(api_key=openai_api_key)
33
+ messages = [{'role': 'user', 'content': prompt}]
34
+ completion = client.chat.completions.create(
35
+ model="gpt-4-1106-preview",
36
+ messages=messages,
37
+ temperature=0.7,
38
+ max_tokens=2000,
39
+ )
40
+
41
+ response = completion.choices[0].message.content
42
+ response = response.strip()
43
+
44
+ # time.sleep(5)
45
+ return response
46
+
47
+ def extract_review_from_real_data():
48
+ base_dir = '/home/v-qinlinzhao/agent4reviews/real_review/original_data'
49
+ result_dir = '/home/v-qinlinzhao/agent4reviews/real_review/extracted_real_review/'
50
+ # Directory layout: ICLR202X/notes/xxx.json
51
+ # Extract the reviews from every JSON file under it
52
+ for root, dirs, files in os.walk(base_dir):
53
+ for file in files:
54
+ if file.endswith('.json'):
55
+ with open(os.path.join(root, file), 'r') as f:
56
+ data = json.load(f)
57
+ reviews = []
58
+ data = data['details']['replies']
59
+ id = []
60
+ for d in data:
61
+ if d['id'] not in id:
62
+ id.append(d['id'])
63
+ # 2020-2021
64
+ if 'content' in d and 'review' in d['content']:
65
+ reviews.append(d['content']['review'])
66
+ # 2022
67
+ if 'content' in d and 'main_review' in d['content']:
68
+ reviews.append(d['content']['main_review'])
69
+ # 2023
70
+ if 'content' in d and 'strength_and_weaknesses' in d['content']:
71
+ reviews.append(d['content']['strength_and_weaknesses'])
72
+
73
+ # Save each review to its own JSON file, named {original_filename}_{index}.json,
74
+ # while preserving each file's relative path from the original directory
75
+ relative_dir = os.path.relpath(root, base_dir)
76
+ result_file_dir = os.path.join(result_dir, relative_dir)
77
+ os.makedirs(result_file_dir, exist_ok=True)
78
+
79
+ file_base_name = os.path.splitext(file)[0]
80
+
81
+ for i, review in enumerate(reviews):
82
+ result_file_name = f"{file_base_name}_{i}.json"
83
+ result_file_path = os.path.join(result_file_dir, result_file_name)
84
+
85
+ with open(result_file_path, 'w') as result_file:
86
+ json.dump({"review": review}, result_file, ensure_ascii=False, indent=4)
87
+
88
+ def extract_meta_review_from_simulated_data():
89
+ base_dir = '/home/v-qinlinzhao/agent4reviews/simulated_review/full_paper_discussion'
90
+ result_dir = '/home/v-qinlinzhao/agent4reviews/simulated_review/meta_review/'
91
+ # Directory layout: ICLR202X/notes/xxx.json
92
+ # Extract the reviews from every JSON file under it
93
+ for root, dirs, files in os.walk(base_dir):
94
+ for file in files:
95
+ if file.endswith('.json'):
96
+ with open(os.path.join(root, file), 'r') as f:
97
+ data = json.load(f)
98
+ # The review is in the 'content' field of the last element of data['messages']
99
+ review = data['messages'][-1]['content']
100
+ # Write the review to a file, preserving the relative path
101
+ relative_dir = os.path.relpath(root, base_dir)
102
+ result_file_dir = os.path.join(result_dir, relative_dir)
103
+ os.makedirs(result_file_dir, exist_ok=True)
104
+ result_file_path = os.path.join(result_file_dir, file)
105
+ with open(result_file_path, 'w') as result_file:
106
+ json.dump({"meta_review": review}, result_file, ensure_ascii=False, indent=4)
107
+
108
+ # Select 1% of the data randomly, let GPT-4 summarize the reasons, and add them to the reason library if there are reasons that do not exist
109
+ def construct_reason_library():
110
+
111
+ base_dir = '/home/v-qinlinzhao/agent4reviews/paper_review_and_rebuttal/selected_files/'
112
+
113
+ json_files = []
114
+ for root, dirs, files in os.walk(base_dir):
115
+ for file in files:
116
+ if file.endswith('.json'):
117
+ json_files.append(os.path.join(root, file))
118
+
119
+ for file in json_files:
120
+ with open(file, 'r') as f:
121
+ data = json.load(f)
122
+ review = data['review']
123
+ prompt = iter_prompt.format(review=review,
124
+ reason_library=reason_library)
125
+ ans = get_gpt_response(prompt)
126
+ print(ans)
127
+
128
+ def analyze_reason_in_batch(json_files):
129
+
130
+ for file in json_files:
131
+ with open(file, 'r') as f:
132
+ data = json.load(f)
133
+ review = data['review']
134
+ prompt = classification_prompt.format(review=review)
135
+ res = get_gpt_response(prompt)
136
+
137
+ # Parse the output in res, extract the accept and reject reasons, and store them as JSON
138
+ # Extract the Accept and Reject reasons from this string
139
+ reason_dict = {}
140
+ if 'Reject' in res:
141
+ accept_reason = re.search(r"Accept: (.+?);", res)
142
+ else:
143
+ accept_reason = re.search(r"Accept: (.+)", res)
144
+
145
+ reject_reason = re.search(r"Reject: (.+)", res)
146
+ # print(reject_reason)
147
+ if accept_reason:
148
+ accept_reason = accept_reason.group(1).split(',')
149
+ reason_dict['accept'] = []
150
+ for r in accept_reason:
151
+ r = r.strip()
152
+ if r in ['1', '2', '3', '4', '5']:
153
+ reason_dict['accept'].append(r)
154
+ if reject_reason:
155
+ reject_reason = reject_reason.group(1).split(',')
156
+ reason_dict['reject'] = []
157
+ for r in reject_reason:
158
+ r = r.strip()
159
+ if r in ['1', '2', '3', '4', '5', '6', '7']:
160
+ reason_dict['reject'].append(r)
161
+
162
+ # print(res)
163
+ relative_path = os.path.relpath(file, base_dir)
164
+ save_path = os.path.join(save_base_dir, relative_path)
165
+ save_dir = os.path.dirname(save_path)
166
+
167
+ # Mirror the original directory structure under save_dir and save the results there
168
+ if not os.path.exists(save_dir):
169
+ os.makedirs(save_dir)
170
+ with open(save_path, 'w') as f:
171
+ json.dump(reason_dict, f, indent=4)
172
+
173
+ def convert_txt_to_json():
174
+ base_dir = '/home/v-qinlinzhao/agent4reviews/simulated_review/classified_meta_review_reason'
175
+ reason_count = {}
176
+ reason_total_count = {'accept': {}, 'reject': {}}
177
+
178
+ def process_directory(path, reason_dict):
179
+ # Iterate over the contents of path
180
+ for item in os.listdir(path):
181
+ item_path = os.path.join(path, item)
182
+ if os.path.isdir(item_path):
183
+ # If it is a directory, process it recursively
184
+ reason_dict[item] = {}
185
+ process_directory(item_path, reason_dict[item])
186
+ elif item.endswith('.txt'):
187
+ # Strip the .txt extension
188
+ item_name = item.replace('.txt', '')
189
+ reason_dict[item_name] = {'accept': {}, 'reject': {}}
190
+ # 如果是txt文件,处理文件内容
191
+ with open(item_path, 'r') as f:
192
+ content = f.read()
193
+ # "Accept: 1,2,3; Reject: 3,4,7"
194
+ # 依据该字符串分别抽取Accept和Reject的原因
195
+ if 'Reject' in content:
196
+ accept_reason = re.search(r"Accept: (.+?);", content)
197
+ else:
198
+ accept_reason = re.search(r"Accept: (.+)", content)
199
+ reject_reason = re.search(r"Reject: (.+)", content)
200
+ # print(reject_reason)
201
+ if accept_reason:
202
+ accept_reason = accept_reason.group(1).split(',')
203
+ reason_dict[item_name]['accept'] = []
204
+ for r in accept_reason:
205
+ r = r.strip()
206
+ if r in ['1', '2', '3', '4', '5']:
207
+ if r not in reason_total_count['accept']:
208
+ reason_total_count['accept'][r] = 0
209
+ reason_total_count['accept'][r] += 1
210
+ reason_dict[item_name]['accept'].append(r)
211
+ if reject_reason:
212
+ reject_reason = reject_reason.group(1).split(',')
213
+ reason_dict[item_name]['reject'] = []
214
+ for r in reject_reason:
215
+ r = r.strip()
216
+ if r in ['1', '2', '3', '4', '5', '6', '7']:
217
+ if r not in reason_total_count['reject']:
218
+ reason_total_count['reject'][r] = 0
219
+ reason_total_count['reject'][r] += 1
220
+ reason_dict[item_name]['reject'].append(r)
221
+
222
+ process_directory(base_dir, reason_count)
223
+
224
+ # 将统计结果写入文件
225
+ with open('reason.json', 'w') as f:
226
+ json.dump(reason_count, f, indent=4)
227
+
228
+ # 计算accept 和 reject中每一类原因的占比
229
+ # reason_percentage = {'accept': {}, 'reject': {}}
230
+ # for key, value in reason_total_count.items():
231
+ # total = sum(value.values())
232
+ # for k, v in value.items():
233
+ # reason_percentage[key][k] = v / total
234
+
235
+ # with open('reason_count.json', 'w') as f:
236
+ # json.dump(reason_total_count, f, indent=4)
237
+
238
+ # with open('reason_percentage.json', 'w') as f:
239
+ # json.dump(reason_percentage, f, indent=4)
240
+
241
+ def count_reasons():
+     with open('../reason_result/reason.json', 'r') as f:
+         reason_count = json.load(f)
+
+     count = {}
+     for year, year_dict in reason_count.items():
+         count[year] = {}
+         for model, model_dict in year_dict.items():
+             count[year][model] = {}
+             for type, type_dict in model_dict.items():
+                 count[year][model][type] = {}
+                 count[year][model][type]['accept'] = {}
+                 count[year][model][type]['reject'] = {}
+                 # Aggregating at the type level is sufficient
+                 for paper_id, paper_id_dict in type_dict.items():
+                     for review_id, review_id_dict in paper_id_dict.items():
+                         print(year, model, type, paper_id, review_id, review_id_dict)
+                         # e.g. {'accept': {'1': 1, '2': 1, '5': 1}, 'reject': {'3': 1, '4': 1, '5': 1, '7': 1}}
+                         if 'accept' in review_id_dict:
+                             for accept_reason in review_id_dict['accept']:
+                                 if accept_reason in ['1', '2', '3', '4', '5']:
+                                     # Initialize before incrementing, so an unexpected
+                                     # reason ID cannot raise a KeyError
+                                     if accept_reason not in count[year][model][type]['accept']:
+                                         count[year][model][type]['accept'][accept_reason] = 0
+                                     count[year][model][type]['accept'][accept_reason] += 1
+                         if 'reject' in review_id_dict:
+                             for reject_reason in review_id_dict['reject']:
+                                 if reject_reason in ['1', '2', '3', '4', '5', '6', '7']:
+                                     if reject_reason not in count[year][model][type]['reject']:
+                                         count[year][model][type]['reject'][reject_reason] = 0
+                                     count[year][model][type]['reject'][reject_reason] += 1
+
+     with open('reason_count.json', 'w') as f:
+         json.dump(count, f, indent=4)
+
+ def calcu_reason_percentage_every_year():
+     with open('../reason_result/reason_count.json', 'r') as f:
+         reason_count = json.load(f)
+
+     distribution = {}
+     for year, year_dict in reason_count.items():
+         distribution[year] = {}
+         for model, model_dict in year_dict.items():
+             distribution[year][model] = {}
+             for type, type_dict in model_dict.items():
+                 distribution[year][model][type] = {}
+                 distribution[year][model][type]['accept'] = {}
+                 distribution[year][model][type]['reject'] = {}
+                 # Compute percentages: sum the counts under accept first, then divide each count by the total
+                 accept_sum = sum(type_dict['accept'].values())
+                 for reason, count in type_dict['accept'].items():
+                     distribution[year][model][type]['accept'][reason] = count / accept_sum
+                 reject_sum = sum(type_dict['reject'].values())
+                 for reason, count in type_dict['reject'].items():
+                     distribution[year][model][type]['reject'][reason] = count / reject_sum
+
+     with open('reason_percentage.json', 'w') as f:
+         json.dump(distribution, f, indent=4)
+
+ def calcu_reason_percentage():
+     # For each type, compute the percentage of accept and reject reasons within that type
+     with open('../reason_result/reason_count.json', 'r') as f:
+         reason_count = json.load(f)
+
+     count_dict = {}
+     for year, year_dict in reason_count.items():
+         for model, model_dict in year_dict.items():
+             for type, type_dict in model_dict.items():
+                 # Initialize only once, so counts accumulate across years and models
+                 if type not in count_dict:
+                     count_dict[type] = {'accept': {}, 'reject': {}}
+                 # Collect the accept and reject counts for all years and models
+                 accept_count = type_dict['accept']
+                 reject_count = type_dict['reject']
+                 # Accumulate each acceptance reason
+                 for reason, count in accept_count.items():
+                     if reason not in count_dict[type]['accept']:
+                         count_dict[type]['accept'][reason] = 0
+                     count_dict[type]['accept'][reason] += count
+                 for reason, count in reject_count.items():
+                     if reason not in count_dict[type]['reject']:
+                         count_dict[type]['reject'][reason] = 0
+                     count_dict[type]['reject'][reason] += count
+     # Compute the percentage of each accept and reject reason in count_dict
+     reason_percentage = {}
+     for type, type_dict in count_dict.items():
+         reason_percentage[type] = {'accept': {}, 'reject': {}}
+         accept_sum = sum(type_dict['accept'].values())
+         for reason, count in type_dict['accept'].items():
+             reason_percentage[type]['accept'][reason] = count / accept_sum
+         reject_sum = sum(type_dict['reject'].values())
+         for reason, count in type_dict['reject'].items():
+             reason_percentage[type]['reject'][reason] = count / reject_sum
+
+     with open('reason_percentage.json', 'w') as f:
+         json.dump(reason_percentage, f, indent=4)
+
+ def draw_bar_chart(accept_or_reject, ax, type, name1, name2):
+     # accept_or_reject = 'accept'
+
+     x = {
+         "accept": ['Novelty', 'Significance', 'Theoretical', 'Clarity', 'Future'],
+         "reject": ['Novelty', 'Theoretical', 'Validation', 'Practicality', 'Limitations', 'Presentation', 'Related Work']
+     }
+     x_range = range(1, len(x[accept_or_reject]) + 1)
+
+     # Plot the proportions of the two types, name1 and name2, side by side
+     with open('../reason_result/reason_percentage.json', 'r') as f:
+         reason_percentage = json.load(f)
+
+     # Pick out the two types
+     type1 = reason_percentage[name1][accept_or_reject]
+     type2 = reason_percentage[name2][accept_or_reject]
+
+     # The keys should cover every reason ID; add any missing key with value 0
+     for i in x_range:
+         if str(i) not in type1:
+             type1[str(i)] = 0
+         if str(i) not in type2:
+             type2[str(i)] = 0
+
+     # Sort by key after filling in missing IDs, so bar heights line up with the x-axis labels
+     type1 = dict(sorted(type1.items(), key=lambda kv: int(kv[0])))
+     type2 = dict(sorted(type2.items(), key=lambda kv: int(kv[0])))
+
+     width = 0.35  # bar width
+
+     # fig, ax = plt.subplots()
+     ax.bar([i - width / 2 for i in x_range], type1.values(), width, label=name1, color=COLORS[0], alpha=0.3)
+     ax.bar([i + width / 2 for i in x_range], type2.values(), width, label=name2, color=COLORS[1], alpha=0.3)
+
+     ax.legend()
+     ax.set_xlabel('Reason', fontsize=FONT_SIZE)
+     # ax.set_ylabel('Percentage', fontsize=FONT_SIZE)
+     ax.set_title(type, fontsize=FONT_SIZE)
+     ax.set_xticks(x_range)  # integer x-axis ticks
+     ax.set_xticklabels(x[accept_or_reject], rotation=30)
+
+     # plt.savefig(f'reason_distribution_{type}.png')
+     # plt.close()
+
+ def draw_bar_chart_baseline(ax, baseline_or_ground, accept_or_reject):
+     # if baseline_or_ground == 'Baseline':
+     #     with open('../simulated_review/reason_result/reason_percentage.json', 'r') as f:
+     #         reason_percentage = json.load(f)
+     #     type_data = reason_percentage['BASELINE'][accept_or_reject]
+     # elif baseline_or_ground == 'Ground Truth':
+     with open('reason_percentage.json', 'r') as f:
+         reason_percentage = json.load(f)
+     type_data = reason_percentage[baseline_or_ground][accept_or_reject]
+
+     x = {
+         "accept": ['Novelty', 'Significance', 'Theoretical', 'Clarity', 'Future'],
+         "reject": ['Novelty', 'Theoretical', 'Validation', 'Practicality', 'Limitations', 'Presentation', 'Related Work']
+     }
+     x_range = range(1, len(x[accept_or_reject]) + 1)
+
+     # The keys should cover every reason ID; add any missing key with value 0
+     for i in x_range:
+         if str(i) not in type_data:
+             type_data[str(i)] = 0
+
+     # Sort by key after filling in missing IDs, so bar heights line up with the x-axis labels
+     type_data = dict(sorted(type_data.items(), key=lambda kv: int(kv[0])))
+
+     # Plot the single type, choosing a color and setting the transparency
+     width = 0.35  # bar width
+
+     # fig, ax = plt.subplots()
+     ax.bar(x_range, type_data.values(), width, label=accept_or_reject, color=COLORS[0], alpha=0.7)
+
+     ax.legend()
+     ax.set_xlabel('Reason', fontsize=FONT_SIZE)
+     # ax.set_ylabel('Percentage', fontsize=FONT_SIZE)
+     ax.set_title(baseline_or_ground, fontsize=FONT_SIZE)
+     ax.set_xticks(x_range)  # integer x-axis ticks
+     ax.set_xticklabels(x[accept_or_reject], rotation=30)
+
+     # plt.savefig(f'{baseline_or_ground}_{accept_or_reject}_reason_distribution.pdf')
+     # plt.close()
+
+ def draw_reason_distribution(accept_or_reject):
+     type2name = {'accept': 'Acceptance', 'reject': 'Rejection'}
+
+     fig, axs = plt.subplots(1, 3, figsize=(15, 5))
+     fig.suptitle(f'Distribution of {type2name[accept_or_reject]} Reasons', fontsize=FONT_SIZE)
+
+     # authoritarian_ACx1, inclusive_ACx1, conformist_ACx1
+     draw_bar_chart_baseline(axs[0], 'authoritarian_ACx1', accept_or_reject)
+     draw_bar_chart_baseline(axs[1], 'inclusive_ACx1', accept_or_reject)
+     draw_bar_chart_baseline(axs[2], 'conformist_ACx1', accept_or_reject)
+
+     # draw_bar_chart_baseline(axs[0], 'Baseline', accept_or_reject)
+     # draw_bar_chart_baseline(axs[1], 'Ground Truth', accept_or_reject)
+
+     # for i, (key, value) in enumerate(types.items()):
+     #     if i == 3:
+     #         break
+     #     draw_bar_chart(accept_or_reject, axs[i], key, value[0], value[1])
+
+     axs[0].set_ylabel('Percentage', fontsize=FONT_SIZE)
+
+     plt.tight_layout()
+     plt.savefig(f'reason_distribution_AC_{accept_or_reject}.pdf')
+     plt.close()
+
+
+ if __name__ == "__main__":
+     # analysis_pipeline()
+     # convert_txt_to_json()
+     draw_reason_distribution('reject')
+
+
+ # if __name__ == "__main__":
+ #     # get current path
+ #     # print(os.getcwd())
+ #     print("Start analysis...")
+
+ #     json_files = []
+ #     for root, dirs, files in os.walk(base_dir):
+ #         for file in files:
+ #             if file.endswith('.json'):
+ #                 json_files.append(os.path.join(root, file))
+
+ #     # json_files = [f for f in json_files]
+ #     # print(json_files)
+
+ #     # Split the files evenly into 6 chunks, one per process
+ #     n = len(json_files)
+ #     n_per_process = n // 6
+ #     processes = []
+ #     for i in range(6):
+ #         start = i * n_per_process
+ #         end = (i + 1) * n_per_process
+ #         if i == 5:
+ #             end = n
+ #         p = multiprocessing.Process(target=analyze_reason_in_batch, args=(json_files[start:end], ))
+ #         processes.append(p)
+ #         p.start()
+
review_content_analysis/classification_prompt.txt ADDED
@@ -0,0 +1,58 @@
+ You are an outstanding data analyst. Now you need to analyze the reasons for acceptance and rejection. Below is a review for a paper:
+ {review}
+
+ Here are some common reasons; please determine which of the following reasons appear in the review.
+
+ Reasons for Acceptance
+ 1. Novelty and Innovation
+    - Introduces a new framework, method, or approach.
+    - Provides a unique perspective or solution to a problem.
+    - Advances the state-of-the-art in the field.
+ 2. Significance
+    - Addresses a relevant and important problem.
+    - Has potential practical applications or implications.
+    - Offers significant improvements over existing methods.
+ 3. Theoretical and Experimental Rigor
+    - Well-grounded in solid theoretical concepts.
+    - Provides thorough experimental validation.
+    - Includes comparisons with several baselines and ablations.
+ 4. Clarity and Motivation
+    - Clearly formulates the problem and solution.
+    - Motivates the approach with strong reasoning.
+    - Presents results that convincingly demonstrate effectiveness.
+ 5. Potential for Further Research
+    - Opens up new avenues for research.
+    - Can inspire future work in the field.
+
+ Reasons for Rejection
+ 1. Lack of Novelty
+    - Does not offer a new contribution.
+    - Similar to existing work without significant improvements.
+    - Fails to differentiate from established methods.
+ 2. Insufficient Theoretical Foundation
+    - Lacks theoretical analysis or grounding.
+    - No proofs or discussions on convergence and stability.
+    - Unclear theoretical implications of the method.
+ 3. Inadequate Experimental Validation
+    - Limited or unconvincing experimental results.
+    - Lacks comparisons with strong baselines or state-of-the-art methods.
+    - Uses environments that do not capture real-world complexities.
+ 4. Scalability and Practicality Issues
+    - Does not address computational complexity or scalability.
+    - Unclear how the method performs with large or high-dimensional action spaces.
+    - Potential practical limitations not discussed.
+ 5. Insufficient Discussion of Limitations
+    - Does not explore potential drawbacks or failure modes.
+    - Lacks discussion on when the method may not perform well.
+    - No investigation of the impact of key parameters.
+ 6. Clarity and Presentation Issues
+    - Poorly articulated problem and solution.
+    - Dense or hard-to-follow sections.
+    - Missing or unclear figures and tables.
+ 7. Lack of Related Work Comparison
+    - Does not adequately compare with related work.
+    - Fails to position contributions within the broader context.
+    - Lacks comprehensive discussion on how it advances the field.
+
+ Only output the final reason list, for example:
+ "Accept: 1,3,5; Reject: 2,4,7"
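
The single-line format requested at the end of this prompt is what the analysis script above parses with regular expressions. As a minimal, self-contained sketch of that parsing step (the function name `parse_classification` is illustrative and not part of the repo; the regexes and valid-ID filters mirror the ones in the script):

```python
import re

def parse_classification(res: str) -> dict:
    """Parse a reply like 'Accept: 1,3,5; Reject: 2,4,7'
    into {'accept': [...], 'reject': [...]}, keeping only valid reason IDs."""
    reason_dict = {}
    # The Accept part ends at ';' only when a Reject part follows it.
    if 'Reject' in res:
        accept_match = re.search(r"Accept: (.+?);", res)
    else:
        accept_match = re.search(r"Accept: (.+)", res)
    reject_match = re.search(r"Reject: (.+)", res)
    if accept_match:
        reason_dict['accept'] = [r.strip() for r in accept_match.group(1).split(',')
                                 if r.strip() in ['1', '2', '3', '4', '5']]
    if reject_match:
        reason_dict['reject'] = [r.strip() for r in reject_match.group(1).split(',')
                                 if r.strip() in ['1', '2', '3', '4', '5', '6', '7']]
    return reason_dict

print(parse_classification("Accept: 1,3,5; Reject: 2,4,7"))
# → {'accept': ['1', '3', '5'], 'reject': ['2', '4', '7']}
```

Note that a reply with only an Accept part (no semicolon) is handled by the non-greedy/greedy regex switch, which is why the script checks for the substring 'Reject' first.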