---
language:
- en
library_name: transformers
tags:
- code
- solidity
---
# Solidity Llama 3

## Model Details

### Model Description

Solidity Llama 3 is a large language model for Solidity code completion and infilling. It is based on the Llama 3 8B model and was further trained on the DISL dataset, a large and diverse collection of real-world Solidity smart contracts deployed to Ethereum mainnet. The model is intended for tasks such as code completion inside code editors. Users should keep in mind the limitations that follow from its training data and from the technology itself.

- **Model type:** Code Completion
- **License:** [More Information Needed]
- **Finetuned from model:** Llama 3 8B

## Uses

### Direct Use

Solidity Llama 3 can be used for code completion and infilling tasks within Solidity code editors. It was trained for this task using the fill-in-the-middle (FIM) objective, where you provide a prefix and a suffix as context for the completion. The following tokens are used to separate the different parts of the input:
- `<|reserved_special_token_11|>` precedes the prefix, that is, the code before the point to be completed.
- `<|reserved_special_token_10|>` precedes the suffix. Put this token exactly where the cursor would be in an editor, because this is the location the model fills in.
- `<|reserved_special_token_12|>` ends the prompt and signals the model to start generating.
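The token order described above can be sketched as a small string helper. This is only an illustration: `build_fim_prompt` is a hypothetical name, not part of the model or of `transformers`, and how whitespace around the tokens should be handled is an assumption here.

```python
# Token strings as listed above.
FIM_SUFFIX = "<|reserved_special_token_10|>"
FIM_PREFIX = "<|reserved_special_token_11|>"
FIM_MIDDLE = "<|reserved_special_token_12|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: lay out editor context in prefix, suffix, middle order.

    `prefix` is the code before the cursor and `suffix` is the code after it.
    No extra whitespace is added around the tokens (an assumption).
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: the cursor sits inside an empty function body.
prefix = "function add(uint a, uint b) public pure returns (uint) {\n    "
suffix = "\n}"
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting string can be passed to the tokenizer in the same way as a hand-written prompt.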


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

FIM_SUFFIX = "<|reserved_special_token_10|>"
FIM_PREFIX = "<|reserved_special_token_11|>"
FIM_MIDDLE = "<|reserved_special_token_12|>"
tokenizer = AutoTokenizer.from_pretrained("andrijdavid/Solidity-Llama3-8b")
model = AutoModelForCausalLM.from_pretrained("andrijdavid/Solidity-Llama3-8b")

prompt = f'''{FIM_PREFIX}contract SendEther {{
    function sendViaTransfer(address payable _to) public payable {{
        // This function is no longer recommended for sending Ether.
        _to.transfer(msg.value);
    }}

    function sendViaSend(address payable _to) public payable {{
        // Send returns a boolean value indicating success or failure.
        // This function is not recommended for sending Ether.
        {FIM_SUFFIX}
    }}

    function sendViaCall(address payable _to) public payable {{
        // Call returns a boolean value indicating success or failure.
        // This is the current recommended method to use.
        (bool sent, bytes memory data) = _to.call{{value: msg.value}}("");
        require(sent, "Failed to send Ether");
    }}{FIM_MIDDLE}
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][prompt_len:]))

```

To stop generation as soon as the model emits one of the FIM tokens or the end-of-sequence token, pass a list of terminators to the `generate` function:

```python

terminators = tokenizer.convert_tokens_to_ids([FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX])
terminators += [tokenizer.eos_token_id]

outputs = model.generate(
  **inputs,
  max_new_tokens=1024,
  eos_token_id=terminators,
)
print(tokenizer.decode(outputs[0][prompt_len:]))

```
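Because the decoded text may still end with the terminator that stopped generation, an editor integration would likely trim it before inserting the completion at the cursor. A minimal sketch, assuming plain-string post-processing; `splice_completion` and its parameters are hypothetical names, not part of the model's API.

```python
def splice_completion(prefix: str, completion: str, suffix: str, stop_strings: list[str]) -> str:
    """Hypothetical helper: cut the completion at the first stop string,
    then place it between the original prefix and suffix."""
    cut = len(completion)
    for stop in stop_strings:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep only the text before the earliest stop string
    return prefix + completion[:cut] + suffix


# The FIM token strings from this model card; an EOS string could be added as well.
stops = [
    "<|reserved_special_token_10|>",
    "<|reserved_special_token_11|>",
    "<|reserved_special_token_12|>",
]

# Made-up decoded output, used only to illustrate the trimming.
raw = "return a + b;<|reserved_special_token_11|>ignored"
merged = splice_completion("function f() {\n    ", raw, "\n}", stops)
print(merged)
```

Working on the decoded string keeps the sketch independent of the tokenizer. Filtering token IDs before decoding would be an alternative.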

### Out-of-Scope Use

The model is tuned for Solidity code completion and infilling and may not perform well on other tasks.

## Bias, Risks, and Limitations

The model's performance may be affected by biases in the training data, and users should be aware of these limitations. More information is needed for further recommendations.