Update README.md
README.md CHANGED
@@ -27,7 +27,17 @@ https://arxiv.org/abs/2402.00786
For best performance, it should be used with a temperature of above 0.4, and with the exact template described below:

```python
+chat = [
+    {"role": "user", "content": "Que puis-je faire à Marseille en hiver?"},
+]
+
+chat_input = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+```
+
+corresponding to:
+
+```python
+chat_input = """<|im_start|>user
{USER QUERY}<|im_end|>
<|im_start|>assistant\n"""
```

@@ -68,11 +78,13 @@ model_name = "croissantllm/CroissantLLMChat-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

+chat = [
+    {"role": "user", "content": "Que puis-je faire à Marseille en hiver?"},
+]
+
+chat_input = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

+inputs = tokenizer(chat_input, return_tensors="pt", add_special_tokens=True).to(model.device)
tokens = model.generate(**inputs, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=60, temperature=0.5)
print(tokenizer.decode(tokens[0]))
```
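In the generation example, `tokens[0]` holds the prompt tokens followed by the model's reply, so `tokenizer.decode(tokens[0])` prints both. A minimal sketch for decoding only the newly generated part, assuming the `inputs` and `tokens` variables from that example:

```python
# Slice off the prompt so only the assistant's reply is decoded.
# Assumes `inputs` and `tokens` come from the generation example above.
prompt_length = inputs["input_ids"].shape[1]
reply = tokenizer.decode(tokens[0][prompt_length:], skip_special_tokens=True)
print(reply)
```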