YuWangX committed
Commit a8dec23 · verified · 1 parent: 628a32e

Update README.md

Files changed (1): README.md (+3, -1)
README.md CHANGED
@@ -16,8 +16,10 @@ Then simply use the following code to load the model:
 ```python
 from modeling_memoryllm import MemoryLLM
 from transformers import AutoTokenizer
-model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b-chat")
+# load chat model
+model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b-chat", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
 tokenizer = AutoTokenizer.from_pretrained("YuWangX/memoryllm-8b-chat")
+model = model.cuda()
 ```
 
 ### How to use the model
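Two caveats about the added lines: they reference `torch.float16` without an `import torch`, and `attn_implementation="flash_attention_2"` only works when the optional `flash-attn` package is installed (and a CUDA GPU is available). A minimal sketch of a more defensive loader; the `build_load_kwargs` helper and the `"sdpa"` fallback are my assumptions, not part of the commit:

```python
import importlib.util

# Hypothetical helper (not in the repo): build from_pretrained kwargs,
# falling back from flash_attention_2 when flash-attn is not installed.
def build_load_kwargs(has_flash=None):
    if has_flash is None:
        has_flash = importlib.util.find_spec("flash_attn") is not None
    return {
        # "sdpa" fallback is an assumption; flash_attention_2 matches the README
        "attn_implementation": "flash_attention_2" if has_flash else "sdpa",
        # string dtype avoids needing `import torch` here; torch.float16 also works
        "torch_dtype": "float16",
    }

# Usage (downloads weights; .cuda() requires a CUDA GPU):
# from modeling_memoryllm import MemoryLLM
# model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b-chat", **build_load_kwargs())
# model = model.cuda()
```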