Update README.md
print(tokenizer.decode(response, skip_special_tokens=True))
```
### run retrieval to get top-n chunks as context
This applies when the document is too long to fit into the model's context window, making retrieval necessary. Here, we use our [Dragon-multiturn](https://huggingface.co/nvidia/dragon-multiturn-query-encoder) retriever, which can handle conversational queries. In addition, we provide a few [documents](https://huggingface.co/nvidia/ChatQA-1.5-8B/tree/main/docs) for users to play with.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
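# The top-n retrieval step described above can be sketched as follows. This is
# a minimal, hypothetical illustration (not the repository's full example):
# we assume CLS-pooled embeddings, i.e. encoder(**inputs).last_hidden_state[:, 0, :]
# from the Dragon-multiturn query/context encoders, and rank chunks by
# dot-product similarity.
import torch

def rank_chunks(query_emb: torch.Tensor, ctx_emb: torch.Tensor, n: int = 5):
    """Return indices of the top-n context chunks for a single query.

    query_emb: (1, d) query embedding; ctx_emb: (num_chunks, d) chunk embeddings.
    """
    scores = query_emb @ ctx_emb.T          # (1, num_chunks) similarity scores
    n = min(n, ctx_emb.shape[0])            # never ask for more chunks than exist
    return scores.squeeze(0).topk(n).indices.tolist()

# toy embeddings standing in for real encoder outputs
q = torch.tensor([[1.0, 0.0]])
ctx = torch.tensor([[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
print(rank_chunks(q, ctx, n=2))             # most similar chunk indices first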