web3se
/

SmartBERT-v3

software-engineering

Inference Endpoints

Model card Files Files and versions Community

SmartBERT-v3 / README.md

devilyouwei's picture

add fill-mask to tags

c947ee2 verified about 2 months ago

|

2.54 kB

	---
	license: mit
	language:
	- en
	inference: true
	base_model:
	- microsoft/codebert-base-mlm
	- web3se/SmartBERT-v2
	pipeline_tag: fill-mask
	tags:
	- fill-mask
	- smart-contract
	- web3
	- software-engineering
	- embedding
	- codebert
	---

	# SmartBERT V3 CodeBERT

	![SmartBERT](https://huggingface.co/web3se/SmartBERT-v2/resolve/main/framework.png)

	## Overview

	SmartBERT V3 is a pre-trained programming language model, initialized with [CodeBERT-base-mlm](https://huggingface.co/microsoft/codebert-base-mlm). It has been further trained on [SmartBERT V2](https://huggingface.co/web3se/SmartBERT-v2) with an additional 64,000 smart contracts, to enhance its robustness in representing smart contract code at the _function_ level.

	- Training Data: Trained on a total of 80,000 smart contracts, including 16,000 from [SmartBERT V2](https://huggingface.co/web3se/SmartBERT-v2) and 64,000 (starts from 30001) new contracts.
	- Hardware: Utilized 2 Nvidia A100 80G GPUs.
	- Training Duration: Over 30 hours.
	- Evaluation Data: Evaluated on 1,500 (starts from 96425) smart contracts.

	## Usage

	```python
	from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

	model = RobertaForMaskedLM.from_pretrained('web3se/SmartBERT-v3')
	tokenizer = RobertaTokenizer.from_pretrained('web3se/SmartBERT-v3')

	code_example = "function totalSupply() external view <mask> (uint256);"
	fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

	outputs = fill_mask(code_example)
	print(outputs)
	```

	## Preprocessing

	All newline (`\n`) and tab (`\t`) characters in the _function_ code were replaced with a single space to ensure consistency in the input data format.

	## Base Model

	- Original Model: [CodeBERT-base-mlm](https://huggingface.co/microsoft/codebert-base-mlm)

	## Training Setup

	```python
	training_args = TrainingArguments(
	output_dir=OUTPUT_DIR,
	overwrite_output_dir=True,
	num_train_epochs=20,
	per_device_train_batch_size=64,
	save_steps=10000,
	save_total_limit=2,
	evaluation_strategy="steps",
	eval_steps=10000,
	resume_from_checkpoint=checkpoint
	)
	```

	## How to Use

	To train and deploy the SmartBERT V3 model for Web API services, please refer to our GitHub repository: [web3se-lab/SmartBERT](https://github.com/web3se-lab/SmartBERT).

	## Contributors

	- [Youwei Huang](https://www.devil.ren)
	- [Sen Fang](https://github.com/TomasAndersonFang)

	## Sponsors

	- [Institute of Intelligent Computing Technology, Suzhou, CAS](http://iict.ac.cn/)
	- CAS Mino (中科劢诺)