Heyter17 committed on
Commit 363e513 · verified · 1 Parent(s): ee30f09

Upload folder using huggingface_hub

.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
+ -----BEGIN CERTIFICATE-----
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
+ -----END CERTIFICATE-----
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 Magnetic2014
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,56 @@
- ---
- title: Demo
- emoji: 🐢
- colorFrom: gray
- colorTo: yellow
- sdk: gradio
- sdk_version: 5.5.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: demo
+ app_file: chatbot.py
+ sdk: gradio
+ sdk_version: 5.5.0
+ ---
+ # OpenCodeInterpreter Demo
+
+ Built on our OpenCodeInterpreter models, this project lets an LLM generate code, execute it, receive feedback, debug, and answer questions based on the whole process. It is designed to be intuitive and versatile, capable of dealing with multiple languages and frameworks.
+
+ ## Disclaimer
+
+ This demo uses large language models to generate code, which is then executed in a Jupyter environment. Before you begin using this project, please read and understand the following disclaimer:
+
+ - **Academic Nature and Security Risks:** This project is developed for academic purposes only and is not designed to be fully secure against all forms of code attacks. While we strive to maintain a safe environment, we cannot guarantee the security of your data during use. We urge all users to refrain from executing malicious code intentionally. By choosing to use this project, you acknowledge the potential risks to your data and agree to proceed with caution.
+
+ - **Model Compatibility Notice:** Please be advised that our demo is only guaranteed to be compatible with the `opencodeinterpreter` model. We cannot ensure that other models will achieve the expected output or performance. Users who substitute models other than the officially recommended ones do so at their own risk and may encounter performance mismatches or other issues. We encourage users to fully understand the potential impacts before making any such modifications.
+
+ - **User Responsibility:** Users are responsible for the code they generate and execute using this project. We strongly advise against running any code without a thorough understanding of its function and potential impact. Users should take precautions to protect their own data and the integrity of their systems.
+
+ - **Limitation of Liability:** The creators and maintainers of this project will not be liable for any damages, data loss, or security breaches that may occur from using this service. Users assume all responsibility and risk associated with their use of the project.
+
+ - **Changes to the Disclaimer:** This disclaimer is subject to change at any time. We will make efforts to communicate any changes through the project's official channels, but it is the responsibility of users to review this disclaimer periodically to ensure they are aware of any updates.
+
+ By using this demo, you acknowledge that you have read this disclaimer, understand its terms, and agree to be bound by them.
+
+ ## Features
+
+ - **Multi-user support**
+
+ - **Conversations saved to both Hugging Face datasets and local JSON files**
+
+ ## License
+
+ Distributed under the MIT License. See `LICENSE` for more information.
+
+ ## Acknowledgement
+
+ This project is based on [Llama2-Code-Interpreter](https://github.com/SeungyounShin/Llama2-Code-Interpreter).
+
+ ---
+
+ ## Citation
+
+ If you find this demo useful for your research, please kindly cite our paper:
+
+ ```
+ @article{zheng2024opencodeinterpreter,
+   title={OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement},
+   author={Zheng, Tianyu and Zhang, Ge and Shen, Tianhao and Liu, Xueling and Lin, Bill Yuchen and Fu, Jie and Chen, Wenhu and Yue, Xiang},
+   journal={arXiv preprint arXiv:2402.14658},
+   year={2024}
+ }
+ ```
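For reference, `chatbot.py` (added below) exposes `gradio_launch(model_path, MAX_TRY=3)` and a `--path` CLI flag. A minimal launch sketch, assuming the pinned dependencies are installed, a GPU with enough memory for the 6.7B model is available, and Hugging Face credentials are configured for the module-level `CommitScheduler`:

```python
# Equivalent to: python chatbot.py --path m-a-p/OpenCodeInterpreter-DS-6.7B
from chatbot import gradio_launch

gradio_launch(model_path="m-a-p/OpenCodeInterpreter-DS-6.7B")
```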
assets/assistant.pic.jpg ADDED
assets/user.pic.jpg ADDED
chatbot.py ADDED
@@ -0,0 +1,316 @@
+ import ast
+ import gradio as gr
+ import os
+ import re
+ import json
+ import logging
+
+ import torch
+ from datetime import datetime
+
+ from threading import Thread
+ from typing import Optional
+ from transformers import TextIteratorStreamer
+ from functools import partial
+ from huggingface_hub import CommitScheduler
+ from uuid import uuid4
+ from pathlib import Path
+
+ from code_interpreter.JupyterClient import JupyterNotebook
+
+ MAX_INPUT_TOKEN_LENGTH = int(os.getenv("MAX_INPUT_TOKEN_LENGTH", "4096"))
+
+ import warnings
+
+ warnings.filterwarnings("ignore", category=UserWarning, module="transformers")
+ os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
+
+
+ from code_interpreter.OpenCodeInterpreter import OpenCodeInterpreter
+
+ JSON_DATASET_DIR = Path("json_dataset")
+ JSON_DATASET_DIR.mkdir(parents=True, exist_ok=True)
+
+ # Periodically commits the local JSON logs to a private Hugging Face dataset.
+ scheduler = CommitScheduler(
+     repo_id="opencodeinterpreter_user_data",
+     repo_type="dataset",
+     folder_path=JSON_DATASET_DIR,
+     path_in_repo="data",
+     private=True
+ )
+
+ logging.basicConfig(level=logging.INFO)
+
+ class StreamingOpenCodeInterpreter(OpenCodeInterpreter):
+     streamer: Optional[TextIteratorStreamer] = None
+
+     # Override generate() to support token streaming.
+     @torch.inference_mode()
+     def generate(
+         self,
+         prompt: str = "",
+         max_new_tokens = 1024,
+         do_sample: bool = False,
+         top_p: float = 0.95,
+         top_k: int = 50,
+     ) -> str:
+         # Tokenize the prompt and launch generation in a background thread;
+         # callers consume tokens from self.streamer.
+         self.streamer = TextIteratorStreamer(
+             self.tokenizer, skip_prompt=True, timeout=5
+         )
+
+         inputs = self.tokenizer([prompt], return_tensors="pt", truncation=True, max_length=MAX_INPUT_TOKEN_LENGTH)
+         inputs = inputs.to(self.model.device)
+
+         kwargs = dict(
+             **inputs,
+             streamer=self.streamer,
+             max_new_tokens=max_new_tokens,
+             do_sample=do_sample,
+             top_p=top_p,
+             top_k=top_k,
+             eos_token_id=self.tokenizer.eos_token_id
+         )
+
+         thread = Thread(target=self.model.generate, kwargs=kwargs)
+         thread.start()
+
+         return ""
+
+ def save_json(dialog, mode, json_file_path, dialog_id) -> None:
+     # Append one JSON line per dialog snapshot; the scheduler lock avoids
+     # writing while a commit is in progress.
+     with scheduler.lock:
+         with json_file_path.open("a") as f:
+             json.dump({"id": dialog_id, "dialog": dialog, "mode": mode, "datetime": datetime.now().isoformat()}, f, ensure_ascii=False)
+             f.write("\n")
+
+ def convert_history(gradio_history: list[list], interpreter_history: list[dict]):
+     # Rebuild the interpreter dialog from Gradio's [user, assistant] pairs,
+     # preserving an existing system message if there is one.
+     interpreter_history = [interpreter_history[0]] if interpreter_history and interpreter_history[0]["role"] == "system" else []
+     if not gradio_history:
+         return interpreter_history
+     for item in gradio_history:
+         if item[0] is not None:
+             interpreter_history.append({"role": "user", "content": item[0]})
+         if item[1] is not None:
+             interpreter_history.append({"role": "assistant", "content": item[1]})
+     return interpreter_history
+
+ def update_uuid(dialog_info):
+     new_uuid = str(uuid4())
+     logging.info(f"allocating new uuid {new_uuid} for conversation...")
+     return [new_uuid, dialog_info[1]]
+
+ def is_valid_python_code(code):
+     try:
+         ast.parse(code)
+         return True
+     except SyntaxError:
+         return False
+
+
+ class InputFunctionVisitor(ast.NodeVisitor):
+     def __init__(self):
+         self.found_input = False
+
+     def visit_Call(self, node):
+         if isinstance(node.func, ast.Name) and node.func.id == 'input':
+             self.found_input = True
+         self.generic_visit(node)
+
+ def has_input_function_calls(code):
+     try:
+         tree = ast.parse(code)
+     except SyntaxError:
+         return False
+     visitor = InputFunctionVisitor()
+     visitor.visit(tree)
+     return visitor.found_input
+
+ def gradio_launch(model_path: str, MAX_TRY: int = 3):
+     with gr.Blocks() as demo:
+         chatbot = gr.Chatbot(height=600, label="OpenCodeInterpreter", avatar_images=["assets/user.pic.jpg", "assets/assistant.pic.jpg"], show_copy_button=True)
+         with gr.Group():
+             with gr.Row():
+                 msg = gr.Textbox(
+                     container=False,
+                     show_label=False,
+                     label="Message",
+                     placeholder="Type a message...",
+                     scale=7,
+                     autofocus=True
+                 )
+                 sub = gr.Button(
+                     "Submit",
+                     variant="primary",
+                     scale=1,
+                     min_width=150
+                 )
+                 # stop = gr.Button(
+                 #     "Stop",
+                 #     variant="stop",
+                 #     visible=False,
+                 #     scale=1,
+                 #     min_width=150
+                 # )
+
+         with gr.Row():
+             # retry = gr.Button("🔄 Retry", variant="secondary")
+             # undo = gr.Button("↩️ Undo", variant="secondary")
+             clear = gr.Button("🗑️ Clear", variant="secondary")
+
+         session_state = gr.State([])
+         jupyter_state = gr.State(JupyterNotebook())
+         dialog_info = gr.State(["", 0])
+         demo.load(update_uuid, dialog_info, dialog_info)
+
+         def bot(user_message, history, jupyter_state, dialog_info, interpreter):
+             logging.info(f"user message: {user_message}")
+             interpreter.dialog = convert_history(gradio_history=history, interpreter_history=interpreter.dialog)
+             history.append([user_message, None])
+
+             interpreter.dialog.append({"role": "user", "content": user_message})
+
+             # setup
+             HAS_CODE = False  # For now
+             prompt = interpreter.dialog_to_prompt(dialog=interpreter.dialog)
+
+             _ = interpreter.generate(prompt)
+             history[-1][1] = ""
+             generated_text = ""
+             for character in interpreter.streamer:
+                 history[-1][1] += character
+                 history[-1][1] = history[-1][1].replace("<|EOT|>", "")
+                 generated_text += character
+                 yield history, history, jupyter_state, dialog_info
+
+             # If the whole response parses as Python, wrap it in a code fence.
+             if is_valid_python_code(history[-1][1].strip()):
+                 history[-1][1] = f"```python\n{history[-1][1].strip()}\n```"
+                 generated_text = history[-1][1]
+
+             HAS_CODE, generated_code_block = interpreter.extract_code_blocks(
+                 generated_text
+             )
+
+             interpreter.dialog.append(
+                 {
+                     "role": "assistant",
+                     "content": generated_text.replace("<unk>_", "")
+                     .replace("<unk>", "")
+                     .replace("<|EOT|>", ""),
+                 }
+             )
+
+             logging.info(f"saving current dialog to file {dialog_info[0]}.json...")
+             logging.info(f"current dialog: {interpreter.dialog}")
+             save_json(interpreter.dialog, mode="openci_only", json_file_path=JSON_DATASET_DIR/f"{dialog_info[0]}.json", dialog_id=dialog_info[0])
+
+             attempt = 1
+             while HAS_CODE:
+                 if attempt > MAX_TRY:
+                     break
+                 # A code block was generated, so execute it and feed the result back.
+                 generated_text = ""  # clear generated text
+
+                 yield history, history, jupyter_state, dialog_info
+
+                 # Strip unknown-token placeholders from the code block.
+                 generated_code_block = generated_code_block.replace(
+                     "<unk>_", ""
+                 ).replace("<unk>", "")
+
+                 if has_input_function_calls(generated_code_block):
+                     code_block_output = "Please directly assign the value of inputs instead of using input() function in your code."
+                 else:
+                     (
+                         code_block_output,
+                         error_flag,
+                     ) = interpreter.execute_code_and_return_output(
+                         f"{generated_code_block}",
+                         jupyter_state
+                     )
+                     if error_flag == "Timeout":
+                         logging.info(f"{dialog_info[0]}: Restart jupyter kernel due to timeout")
+                         jupyter_state = JupyterNotebook()
+                 code_block_output = interpreter.clean_code_output(code_block_output)
+
+                 if code_block_output.strip():
+                     code_block_output = "Execution result: \n" + code_block_output
+                 else:
+                     code_block_output = "Code is executed, but result is empty. Please make sure that you include test case in your code."
+
+                 history.append([code_block_output, ""])
+
+                 interpreter.dialog.append({"role": "user", "content": code_block_output})
+
+                 yield history, history, jupyter_state, dialog_info
+
+                 prompt = interpreter.dialog_to_prompt(dialog=interpreter.dialog)
+
+                 logging.info(f"generating answer for dialog {dialog_info[0]}")
+                 _ = interpreter.generate(prompt)
+                 for character in interpreter.streamer:
+                     history[-1][1] += character
+                     history[-1][1] = history[-1][1].replace("<|EOT|>", "")
+                     generated_text += character
+                     yield history, history, jupyter_state, dialog_info
+                 logging.info(f"finish generating answer for dialog {dialog_info[0]}")
+
+                 HAS_CODE, generated_code_block = interpreter.extract_code_blocks(
+                     history[-1][1]
+                 )
+
+                 interpreter.dialog.append(
+                     {
+                         "role": "assistant",
+                         "content": generated_text.replace("<unk>_", "")
+                         .replace("<unk>", "")
+                         .replace("<|EOT|>", ""),
+                     }
+                 )
+
+                 attempt += 1
+
+                 logging.info(f"saving current dialog to file {dialog_info[0]}.json...")
+                 logging.info(f"current dialog: {interpreter.dialog}")
+                 save_json(interpreter.dialog, mode="openci_only", json_file_path=JSON_DATASET_DIR/f"{dialog_info[0]}.json", dialog_id=dialog_info[0])
+
+                 if generated_text.endswith("<|EOT|>"):
+                     continue
+
+             return history, history, jupyter_state, dialog_info
+
+
+         def reset_textbox():
+             return gr.update(value="")
+
+
+         def clear_history(history, jupyter_state, dialog_info, interpreter):
+             interpreter.dialog = []
+             jupyter_state.close()
+             return [], [], JupyterNotebook(), update_uuid(dialog_info)
+
+         interpreter = StreamingOpenCodeInterpreter(model_path=model_path)
+
+         sub.click(partial(bot, interpreter=interpreter), [msg, session_state, jupyter_state, dialog_info], [chatbot, session_state, jupyter_state, dialog_info])
+         sub.click(reset_textbox, [], [msg])
+
+         clear.click(partial(clear_history, interpreter=interpreter), [session_state, jupyter_state, dialog_info], [chatbot, session_state, jupyter_state, dialog_info], queue=False)
+
+     demo.queue(max_size=20)
+     demo.launch(share=True)
+
+
+ if __name__ == "__main__":
+     import argparse
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         "--path",
+         type=str,
+         required=False,
+         help="Path to the OpenCodeInterpreter Model.",
+         default="m-a-p/OpenCodeInterpreter-DS-6.7B",
+     )
+     args = parser.parse_args()
+
+     gradio_launch(model_path=args.path)
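As a quick check of the helper functions above, here is an illustrative usage sketch (assumes `chatbot.py` imports cleanly, which requires Hugging Face credentials for the module-level `CommitScheduler`):

```python
from chatbot import is_valid_python_code, has_input_function_calls

print(is_valid_python_code("x = 1 + 1"))                  # True
print(has_input_function_calls("x = input('number? ')"))  # True
print(has_input_function_calls("x = 41 + 1"))             # False
```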
code_interpreter/BaseCodeInterpreter.py ADDED
@@ -0,0 +1,29 @@
+ import os
+ import sys
+ import re
+
+ # Make the project root importable when this file is run directly.
+ prj_root_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ sys.path.append(prj_root_path)
+
+
+ from utils.const import *
+
+ class BaseCodeInterpreter:
+     def __init__(self):
+         self.dialog = [
+             {
+                 "role": "system",
+                 "content": CODE_INTERPRETER_SYSTEM_PROMPT,
+             },
+         ]
+
+     @staticmethod
+     def extract_code_blocks(text: str):
+         pattern = r"```(?:python\n)?(.*?)```"  # Match optional 'python\n' but don't capture it
+         code_blocks = re.findall(pattern, text, re.DOTALL)
+         return [block.strip() for block in code_blocks]
+
+     def execute_code_and_return_output(self, code_str: str, nb):
+         # Run the sandbox guard first, then the user code, in the same kernel.
+         _, _ = nb.add_and_run(GUARD_CODE)
+         outputs, error_flag = nb.add_and_run(code_str)
+         return outputs, error_flag
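A short usage sketch of the static extractor above, with an illustrative model response:

```python
from code_interpreter.BaseCodeInterpreter import BaseCodeInterpreter

text = "Here is the code:\n```python\nprint('hi')\n```"
print(BaseCodeInterpreter.extract_code_blocks(text))  # ["print('hi')"]
```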
code_interpreter/JupyterClient.py ADDED
@@ -0,0 +1,85 @@
+ from jupyter_client import KernelManager
+ import threading
+ import re
+ from utils.const import *
+
+
+ class JupyterNotebook:
+     def __init__(self):
+         self.km = KernelManager()
+         self.km.start_kernel()
+         self.kc = self.km.client()
+         _ = self.add_and_run(TOOLS_CODE)
+
+     def clean_output(self, outputs):
+         outputs_only_str = list()
+         for i in outputs:
+             if isinstance(i, dict):
+                 if "text/plain" in i:
+                     outputs_only_str.append(i["text/plain"])
+             elif isinstance(i, str):
+                 outputs_only_str.append(i)
+             elif isinstance(i, list):
+                 error_msg = "\n".join(i)
+                 error_msg = re.sub(r"\x1b\[.*?m", "", error_msg)
+                 outputs_only_str.append(error_msg)
+
+         return "\n".join(outputs_only_str).strip()
+
+     def add_and_run(self, code_string):
+         # This inner function will be executed in a separate thread
+         def run_code_in_thread():
+             nonlocal outputs, error_flag
+
+             # Send the code to the kernel for execution
+             msg_id = self.kc.execute(code_string)
+
+             while True:
+                 try:
+                     msg = self.kc.get_iopub_msg(timeout=20)
+
+                     msg_type = msg["header"]["msg_type"]
+                     content = msg["content"]
+
+                     if msg_type == "execute_result":
+                         outputs.append(content["data"])
+                     elif msg_type == "stream":
+                         outputs.append(content["text"])
+                     elif msg_type == "error":
+                         error_flag = True
+                         outputs.append(content["traceback"])
+
+                     # If the kernel's execution state is idle, the cell finished executing
+                     if msg_type == "status" and content["execution_state"] == "idle":
+                         break
+                 except Exception:
+                     # Treat a timeout waiting for messages as the end of execution
+                     break
+
+         outputs = []
+         error_flag = False
+
+         # Start the thread to run the code
+         thread = threading.Thread(target=run_code_in_thread)
+         thread.start()
+
+         # Wait up to 20 seconds for the thread to finish
+         thread.join(timeout=20)
+
+         # If the thread is still alive after 20 seconds, it's a timeout
+         if thread.is_alive():
+             outputs = ["Execution timed out."]
+             error_flag = "Timeout"
+
+         return self.clean_output(outputs), error_flag
+
+     def close(self):
+         """Shutdown the kernel."""
+         self.km.shutdown_kernel()
+
+     def __deepcopy__(self, memo):
+         # A running kernel cannot be deep-copied; return a fresh notebook instead.
+         if id(self) in memo:
+             return memo[id(self)]
+         new_copy = type(self)()
+         memo[id(self)] = new_copy
+         return new_copy
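A brief usage sketch (assumes `jupyter_client` and `ipykernel` are installed, along with the libraries imported by `TOOLS_CODE`, since `__init__` runs that snippet in the fresh kernel):

```python
from code_interpreter.JupyterClient import JupyterNotebook

nb = JupyterNotebook()  # starts a kernel and runs TOOLS_CODE
output, error_flag = nb.add_and_run("print(1 + 1)")
print(output)       # "2"
print(error_flag)   # False
nb.close()          # shut the kernel down
```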
code_interpreter/OpenCodeInterpreter.py ADDED
@@ -0,0 +1,80 @@
+ import sys
+ import os
+
+ prj_root_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ sys.path.append(prj_root_path)
+
+ from code_interpreter.BaseCodeInterpreter import BaseCodeInterpreter
+ from utils.const import *
+
+ from typing import List, Tuple, Dict
+ import re
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+
+ sys.path.append(os.path.dirname(__file__))
+ sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+ import warnings
+
+ warnings.filterwarnings("ignore", category=UserWarning, module="transformers")
+ os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
+
+
+ class OpenCodeInterpreter(BaseCodeInterpreter):
+     def __init__(
+         self,
+         model_path: str,
+         load_in_8bit: bool = False,
+         load_in_4bit: bool = False,
+     ):
+         # build tokenizer
+         self.tokenizer = AutoTokenizer.from_pretrained(
+             model_path,
+             padding_side="right",
+             trust_remote_code=True
+         )
+
+         self.model = AutoModelForCausalLM.from_pretrained(
+             model_path,
+             device_map="auto",
+             load_in_4bit=load_in_4bit,
+             load_in_8bit=load_in_8bit,
+             torch_dtype=torch.float16,
+             trust_remote_code=True
+         )
+
+         self.model.resize_token_embeddings(len(self.tokenizer))
+
+         self.model = self.model.eval()
+
+         self.dialog = []
+         self.MAX_CODE_OUTPUT_LENGTH = 1000
+
+
+     def dialog_to_prompt(self, dialog: List[Dict]) -> str:
+         full_str = self.tokenizer.apply_chat_template(dialog, tokenize=False)
+
+         return full_str
+
+     def extract_code_blocks(self, prompt: str) -> Tuple[bool, str]:
+         pattern = re.escape("```python") + r"(.*?)" + re.escape("```")
+         matches = re.findall(pattern, prompt, re.DOTALL)
+
+         if matches:
+             # Return the last matched code block
+             return True, matches[-1].strip()
+         else:
+             return False, ""
+
+     def clean_code_output(self, output: str) -> str:
+         if self.MAX_CODE_OUTPUT_LENGTH < len(output):
+             return (
+                 output[: self.MAX_CODE_OUTPUT_LENGTH // 5]
+                 + "\n...(truncated due to length)...\n"
+                 + output[-self.MAX_CODE_OUTPUT_LENGTH // 5 :]
+             )
+
+         return output
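To illustrate the extraction rule above (only blocks fenced as python count, and the last block wins), here is a standalone re-run of the same pattern on illustrative input:

```python
import re

# Same pattern as OpenCodeInterpreter.extract_code_blocks.
pattern = re.escape("```python") + r"(.*?)" + re.escape("```")
text = "```python\nx = 1\n```\nsome prose\n```python\ny = 2\n```"
matches = re.findall(pattern, text, re.DOTALL)
print(bool(matches), matches[-1].strip())  # True y = 2
```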
code_interpreter/__pycache__/BaseCodeInterpreter.cpython-312.pyc ADDED
Binary file (1.9 kB).
code_interpreter/__pycache__/JupyterClient.cpython-312.pyc ADDED
Binary file (3.84 kB).
code_interpreter/__pycache__/OpenCodeInterpreter.cpython-312.pyc ADDED
Binary file (3.91 kB).
requirements.txt ADDED
@@ -0,0 +1,32 @@
+ accelerate==0.21.0
+ bitsandbytes==0.41.1
+ colorama==0.4.6
+ coloredlogs==15.0.1
+ colorlog==6.7.0
+ datasets==2.12.0
+ deepspeed==0.10.1
+ diffusers==0.20.0
+ einops==0.6.1
+ gradio==3.48.0
+ ipykernel==6.25.1
+ ipython==8.12.2
+ jupyter_client==8.3.0
+ jupyter_core==5.3.0
+ Markdown==3.4.3
+ nbclient==0.8.0
+ nbconvert==7.7.1
+ nbformat==5.8.0
+ omegaconf==2.3.0
+ openai==0.27.7
+ rich==13.7.0
+ scikit-learn==1.4.0
+ scipy==1.12.0
+ seaborn==0.13.2
+ sentencepiece==0.1.99
+ termcolor==2.3.0
+ tqdm==4.66.1
+ transformers==4.37.1
+ triton==2.0.0
+ yfinance==0.2.28
+ retrying==1.3.4
+ pydantic<2.0.0
utils/__pycache__/const.cpython-312.pyc ADDED
Binary file (3.63 kB).
utils/cleaner.py ADDED
@@ -0,0 +1,31 @@
+ import re
+ import os
+
+ PYTHON_PREFIX = os.environ.get("CONDA_PREFIX", "/usr/local")
+
+ SITE_PKG_ERROR_PREFIX = f'File {PYTHON_PREFIX}/lib/python3.10/'
+
+ def get_error_header(traceback_str):
+     lines = traceback_str.split('\n')
+     for line in lines:
+         if 'Error:' in line:
+             return line
+     return ''  # Return an empty string if no error message is found
+
+ def clean_error_msg(error_str: str = ''):
+     # Keep only the part of the message after the notebook boilerplate.
+     filtered_error_msg = str(error_str).split('An error occurred while executing the following cell')[-1].split("\n------------------\n")[-1]
+     raw_error_msg = "".join(filtered_error_msg)
+
+     # Remove escape sequences for colored text
+     ansi_escape = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')
+     error_msg = ansi_escape.sub('', raw_error_msg)
+
+     error_str_out = ''
+     error_msg_only_cell = error_msg.split(SITE_PKG_ERROR_PREFIX)
+
+     error_str_out += f'{error_msg_only_cell[0]}\n'
+     error_header = get_error_header(error_msg_only_cell[-1])
+     if error_header not in error_str_out:
+         error_str_out += error_header
+
+     return error_str_out
utils/const.py ADDED
@@ -0,0 +1,88 @@
+ TOOLS_CODE = """
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import seaborn as sns
+ from scipy import stats
+ import os,sys
+ import re
+ from datetime import datetime
+ from sympy import symbols, Eq, solve
+ import torch
+ import requests
+ from bs4 import BeautifulSoup
+ import json
+ import math
+ import yfinance
+ import time
+ """
+
+ # Source snippets injected into the kernel; each patched call raises
+ # PermissionError via a throwing lambda.
+ write_denial_function = 'lambda *args, **kwargs: (_ for _ in ()).throw(PermissionError("Writing to disk operation is not permitted due to safety reasons. Please do not try again!"))'
+ read_denial_function = 'lambda *args, **kwargs: (_ for _ in ()).throw(PermissionError("Reading from disk operation is not permitted due to safety reasons. Please do not try again!"))'
+ class_denial = """Class Denial:
+     def __getattr__(self, name):
+         def method(*args, **kwargs):
+             return "Using this class is not permitted due to safety reasons. Please do not try again!"
+         return method
+ """
+
+ GUARD_CODE = f"""
+ import os
+
+ os.kill = {write_denial_function}
+ os.system = {write_denial_function}
+ os.putenv = {write_denial_function}
+ os.remove = {write_denial_function}
+ os.removedirs = {write_denial_function}
+ os.rmdir = {write_denial_function}
+ os.fchdir = {write_denial_function}
+ os.setuid = {write_denial_function}
+ os.fork = {write_denial_function}
+ os.forkpty = {write_denial_function}
+ os.killpg = {write_denial_function}
+ os.rename = {write_denial_function}
+ os.renames = {write_denial_function}
+ os.truncate = {write_denial_function}
+ os.replace = {write_denial_function}
+ os.unlink = {write_denial_function}
+ os.fchmod = {write_denial_function}
+ os.fchown = {write_denial_function}
+ os.chmod = {write_denial_function}
+ os.chown = {write_denial_function}
+ os.chroot = {write_denial_function}
+ os.fchdir = {write_denial_function}
+ os.lchflags = {write_denial_function}
+ os.lchmod = {write_denial_function}
+ os.lchown = {write_denial_function}
+ os.getcwd = {write_denial_function}
+ os.chdir = {write_denial_function}
+ os.popen = {write_denial_function}
+
+ import shutil
+
+ shutil.rmtree = {write_denial_function}
+ shutil.move = {write_denial_function}
+ shutil.chown = {write_denial_function}
+
+ import subprocess
+
+ subprocess.Popen = {write_denial_function}  # type: ignore
+
+ import sys
+
+ sys.modules["ipdb"] = {write_denial_function}
+ sys.modules["joblib"] = {write_denial_function}
+ sys.modules["resource"] = {write_denial_function}
+ sys.modules["psutil"] = {write_denial_function}
+ sys.modules["tkinter"] = {write_denial_function}
+ """
+
+ CODE_INTERPRETER_SYSTEM_PROMPT = """You are an AI code interpreter.
+ Your goal is to help users do a variety of jobs by executing Python code.
+
+ You should:
+ 1. Comprehend the user's requirements carefully & to the letter.
+ 2. Give a brief description of what you plan to do & call the provided function to run code.
+ 3. Provide a results analysis based on the execution output.
+ 4. If an error occurs, try to fix it.
+ 5. Respond in the same language as the user."""
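A sketch of how the guard is used: `BaseCodeInterpreter.execute_code_and_return_output` runs `GUARD_CODE` in the kernel before each user cell, so the patched calls fail inside the throwaway kernel rather than on the host. Illustrative usage (do not `exec` GUARD_CODE in your main process):

```python
from code_interpreter.JupyterClient import JupyterNotebook
from utils.const import GUARD_CODE

nb = JupyterNotebook()
nb.add_and_run(GUARD_CODE)  # patch os/shutil/subprocess inside the kernel
output, error_flag = nb.add_and_run("import os; os.system('ls')")
print(error_flag)  # True -- the denial lambda raised PermissionError
nb.close()
```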