YuWangX/memoryllm-8b

YuWangX committed on Aug 23, 2024

Commit e2a1285 · verified · 1 Parent(s): 4084804

Update README.md

Files changed (1)

README.md +43 -3

@@ -1,3 +1,43 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+This model is continually pre-trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) with the structure proposed in [MemoryLLM](https://arxiv.org/abs/2402.04624).
+We equip Llama-3 with 12800 memory tokens in each layer, leading to a memory pool of 1.67B parameters.
+To use the model, first clone the MemoryLLM repository:
+```
+git clone git@github.com:wangyu-ustc/MemoryLLM.git
+cd MemoryLLM
+```
+Then, from inside the cloned directory, load the model and tokenizer:
+```python
+from modeling_memoryllm import MemoryLLM
+from configuration_memoryllm import MemoryLLMConfig
+from transformers import AutoTokenizer
+model = MemoryLLM.from_pretrained("YuWangX/memoryllm-8b")
+tokenizer = AutoTokenizer.from_pretrained("YuWangX/memoryllm-8b")
+```
+### How to use the model
+Inject a piece of context into the model using the following script:
+```python
+model = model.cuda()
+# Self-Update with the new context
+ctx = "Last week, John had a wonderful picnic with David. During their conversation, David mentioned multiple times that he likes eating apples. Though he didn't mention any other fruits, John says he can infer that David also like bananas."
+# Make sure the context injected into the memory is longer than 16 tokens. This was the hard minimum during training; injecting fewer than 16 tokens will disturb the memory.
+model.inject_memory(tokenizer(ctx, return_tensors='pt', add_special_tokens=False).input_ids.cuda(), update_memory=True)
+# Generation
+inputs = tokenizer("Question: What fruits does David like? Answer:", return_tensors='pt', add_special_tokens=False).input_ids.cuda()
+outputs = model.generate(input_ids=inputs, max_new_tokens=20)
+response = tokenizer.decode(outputs[0][inputs.shape[1]:])
+print(response)
+```
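
The 16-token requirement stated in the comment above can be checked before `model.inject_memory` is called. The following is a minimal sketch and not part of the MemoryLLM repository: `check_injection_length` and `MIN_INJECT_TOKENS` are hypothetical names, and rejecting a context of exactly 16 tokens is an assumption, made because the README asks for contexts "larger than 16 tokens".

```python
# Hypothetical guard; the threshold is the value quoted in the README comment.
MIN_INJECT_TOKENS = 16


def check_injection_length(token_ids, min_tokens=MIN_INJECT_TOKENS):
    """Return the number of tokens, or raise ValueError if the context is too short.

    `token_ids` is any 1-D sequence of token ids (a list, or one row of the
    `input_ids` tensor returned by the tokenizer).
    Assumption: the context must be strictly longer than `min_tokens`.
    """
    n_tokens = len(token_ids)
    if n_tokens <= min_tokens:
        raise ValueError(
            f"Context has {n_tokens} tokens; more than {min_tokens} are needed "
            "before injecting it into the memory."
        )
    return n_tokens


# Intended use with the objects from the README snippet (not executed here):
#   ids = tokenizer(ctx, return_tensors='pt', add_special_tokens=False).input_ids
#   check_injection_length(ids[0])   # ids has shape (1, sequence_length)
#   model.inject_memory(ids.cuda(), update_memory=True)
```

Raising an error, rather than padding the context silently, keeps the decision about how to lengthen a short context with the caller.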