YuWangX/memoryllm-8b-chat


YuWangX committed on Nov 17, 2024

Commit a8dec23 · verified · 1 Parent(s): 628a32e

Update README.md

Files changed (1)

README.md +3 -1


@@ -16,8 +16,10 @@ Then simply use the following code to load the model:
 ```python
 from modeling_memoryllm import MemoryLLM
 from transformers import AutoTokenizer
-model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b-chat")
+# load chat model
+model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b-chat", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
 tokenizer = AutoTokenizer.from_pretrained("YuWangX/memoryllm-8b-chat")
+model = model.cuda()
 ```
 ### How to use the model
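
Review note: the added `from_pretrained` line references `torch.float16`, but the hunk shown here contains no `import torch`, so the README snippet would likely raise a `NameError` unless `torch` is imported elsewhere in the file. The new arguments also assume a CUDA GPU and an installed `flash_attn` package. A minimal sketch of a more defensive loader follows. `pick_load_options` and `load_chat_model` are hypothetical helpers, not part of the repository, and the fallback choices (float32 on CPU, the library's default attention when `flash_attn` is missing) are assumptions rather than the author's recommendation.

```python
import importlib.util


def pick_load_options(cuda_available: bool, flash_attn_installed: bool) -> dict:
    """Hypothetical helper: choose load settings from the environment."""
    if not cuda_available:
        # Assumption: fall back to full precision on CPU.
        return {"dtype_name": "float32", "attn_implementation": None, "device": "cpu"}
    # Request FlashAttention-2 only when the flash_attn package is importable;
    # otherwise leave the attention implementation at the library default.
    attn = "flash_attention_2" if flash_attn_installed else None
    return {"dtype_name": "float16", "attn_implementation": attn, "device": "cuda"}


def load_chat_model(repo_id: str = "YuWangX/memoryllm-8b-chat"):
    """Hypothetical wrapper around the loading code from the README diff."""
    # Heavy imports are kept local so pick_load_options stays usable without them.
    import torch
    from transformers import AutoTokenizer
    from modeling_memoryllm import MemoryLLM  # module used by the README snippet

    opts = pick_load_options(
        torch.cuda.is_available(),
        importlib.util.find_spec("flash_attn") is not None,
    )
    kwargs = {"torch_dtype": getattr(torch, opts["dtype_name"])}
    if opts["attn_implementation"] is not None:
        kwargs["attn_implementation"] = opts["attn_implementation"]

    model = MemoryLLM.from_pretrained(repo_id, **kwargs).to(opts["device"])
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, tokenizer
```

With a GPU and `flash_attn` available, `load_chat_model()` should pass the same arguments as the committed line; in other environments it degrades instead of failing at load time.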