devilyouwei committed on
Commit 8e53fcf · 1 Parent(s): 20847d2

update readme

Files changed (2)
  1. README.md +39 -17
  2. framework.png +0 -0
README.md CHANGED
@@ -1,25 +1,47 @@
- # SmartBERT CodeBERT 16000
 
- Trained on **16000** (1, 16000) smart contracts
 
- Evaluated on **4000** (20000, 4000) smart contracts
 
- Replace all the `\n`, `\t` in the _function_ code with one space.
 
- Base Model: **CodeBERT-mlm-base**
 
- Training Setup
 
  ```python
- training_args = TrainingArguments(
-     output_dir=OUTPUT_DIR,
-     overwrite_output_dir=True,
-     num_train_epochs=20,
-     per_device_train_batch_size=64,
-     save_steps=10000,
-     save_total_limit=2,
-     evaluation_strategy="steps",
-     eval_steps=10000,
-     resume_from_checkpoint=checkpoint
- )
  ```
+ # SmartBERT V2 CodeBERT
 
+ ![SmartBERT](./framework.png)
 
+ ## Overview
 
+ SmartBERT V2 CodeBERT is a pre-trained model, initialized with **[CodeBERT-base-mlm](https://huggingface.co/microsoft/codebert-base-mlm)**, designed to transfer **Smart Contract** function-level code into embeddings effectively.
 
+ - **Training Data:** Trained on **16,000** smart contracts.
+ - **Hardware:** Utilized 2 Nvidia A100 80G GPUs.
+ - **Training Duration:** More than 10 hours.
+ - **Evaluation Data:** Evaluated on **4,000** smart contracts.
 
+ ## Preprocessing
+
+ All newline (`\n`) and tab (`\t`) characters in the function code were replaced with a single space to ensure consistency in the input data format.
+
+ ## Base Model
+
+ - **Base Model**: [CodeBERT-base-mlm](https://huggingface.co/microsoft/codebert-base-mlm)
+
+ ## Training Setup
 
  ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir=OUTPUT_DIR,
+     overwrite_output_dir=True,
+     num_train_epochs=20,
+     per_device_train_batch_size=64,
+     save_steps=10000,
+     save_total_limit=2,
+     evaluation_strategy="steps",
+     eval_steps=10000,
+     resume_from_checkpoint=checkpoint
+ )
  ```
+
+ ## How to Use
+
+ To train and deploy the SmartBERT V2 model for Web API services, please refer to our GitHub repository: [web3se-lab/SmartBERT](https://github.com/web3se-lab/SmartBERT).
+
+ ## Contributors
+
+ - [Youwei Huang](https://www.devil.ren)
+ - [Sen Fang](https://github.com/TomasAndersonFang)
framework.png ADDED
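
The preprocessing step described in the README (replacing `\n` and `\t` in function code with a space) could be sketched roughly as follows. This is a minimal illustration, not the authors' actual code: the function name `flatten_function_code` is hypothetical, and it assumes each newline or tab character maps to exactly one space, since the README does not say whether runs of whitespace are collapsed.

```python
def flatten_function_code(code: str) -> str:
    """Replace every newline and tab character with one space.

    Assumption: each "\n" or "\t" becomes exactly one space, so a
    newline followed by a tab yields two spaces (runs are not collapsed).
    """
    return code.replace("\n", " ").replace("\t", " ")


# Hypothetical Solidity function used only as sample input.
example = (
    "function add(uint a, uint b) public pure returns (uint) {\n"
    "\treturn a + b;\n"
    "}"
)

if __name__ == "__main__":
    print(flatten_function_code(example))
```

If collapsing consecutive whitespace into a single space were the intended behavior instead, a regular expression such as `re.sub(r"[\n\t]+", " ", code)` would be one alternative; the source does not specify which variant was used.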