Set max model length to a reasonable number (max_position_embeddings), e.g. 8192

#11
by michaelfeil - opened
Files changed (2)
  1. README.md +10 -0
  2. tokenizer_config.json +1 -1
README.md CHANGED
@@ -172,6 +172,16 @@ with torch.no_grad():
172
  print(scores)
173
  ```
174
 
175
  ## Load model in local
176
 
177
  1. make sure `gemma_config.py` and `gemma_model.py` from [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight/tree/main) in your local path.
 
172
  print(scores)
173
  ```
174
 
175
+ ## Infinity:
176
+
177
+ For an OpenAI API-compatible local deployment with [Infinity](https://github.com/michaelfeil/infinity):
178
+
179
+ ```
180
+ docker run -it --gpus all -v $volume:/app/.cache -p 7997:7997 \
181
+ michaelf34/infinity:0.0.70 \
182
+ v2 --model-id BAAI/bge-reranker-v2.5-gemma2-lightweight --device cuda --no-bettertransformer
183
+ ```
184
+
185
  ## Load model in local
186
 
187
  1. make sure `gemma_config.py` and `gemma_model.py` from [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight/tree/main) in your local path.
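Once the Infinity container from the README addition is running, a client could query it roughly as follows. This is a minimal sketch, not part of the PR: it assumes the server listens on `http://localhost:7997` (the port mapped in the `docker run` command) and exposes a `/rerank` route that accepts `model`, `query`, and `documents` fields. The helper names `build_rerank_request` and `rerank` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "BAAI/bge-reranker-v2.5-gemma2-lightweight"


def build_rerank_request(query, documents, base_url="http://localhost:7997"):
    """Build a POST request for the (assumed) Infinity /rerank route."""
    payload = {"model": MODEL_ID, "query": query, "documents": documents}
    return urllib.request.Request(
        url=f"{base_url}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query, documents, base_url="http://localhost:7997"):
    """Send the request and return the decoded JSON response."""
    req = build_rerank_request(query, documents, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage (requires the container to be up):
# rerank("what is panda?", ["hi", "The giant panda is a bear species endemic to China."])
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running server.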
tokenizer_config.json CHANGED
@@ -1746,7 +1746,7 @@
1746
  "bos_token": "<bos>",
1747
  "clean_up_tokenization_spaces": false,
1748
  "eos_token": "<eos>",
1749
- "model_max_length": 1000000000000000019884624838656,
1750
  "pad_token": "<pad>",
1751
  "sp_model_kwargs": {},
1752
  "spaces_between_special_tokens": false,
 
1746
  "bos_token": "<bos>",
1747
  "clean_up_tokenization_spaces": false,
1748
  "eos_token": "<eos>",
1749
+ "model_max_length": 8192,
1750
  "pad_token": "<pad>",
1751
  "sp_model_kwargs": {},
1752
  "spaces_between_special_tokens": false,
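For context on why the old `model_max_length` matters: the removed value is exactly `int(1e30)`, which `transformers` uses as a "no limit configured" sentinel, so truncation to the model's maximum length never takes effect. The sketch below illustrates the effect with a hypothetical `truncate` helper that stands in for tokenizer truncation. It is not the library's implementation, and the 10,000-token input is made up for illustration.

```python
# Values taken from the tokenizer_config.json diff above.
OLD_MODEL_MAX_LENGTH = 1000000000000000019884624838656
NEW_MODEL_MAX_LENGTH = 8192

# The old value is the float 1e30 converted to an int, i.e. an "unset" sentinel.
assert OLD_MODEL_MAX_LENGTH == int(1e30)


def truncate(token_ids, model_max_length):
    """Hypothetical stand-in for tokenizer truncation: keep the first N ids."""
    return token_ids[:model_max_length]


long_input = list(range(10_000))  # pretend this is a 10,000-token document

old_len = len(truncate(long_input, OLD_MODEL_MAX_LENGTH))
new_len = len(truncate(long_input, NEW_MODEL_MAX_LENGTH))

print(old_len)  # 10000: nothing is cut, so the model may receive over-long input
print(new_len)  # 8192: input is capped at the configured limit
```

With the new value, serving frameworks that read `model_max_length` to size or truncate inputs get a finite bound instead of an effectively infinite one.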