cicdatopea committed on
Commit 951efd7 · verified · 1 Parent(s): db0e3e3

Update README.md

Files changed (1): README.md (+7 −3)
README.md CHANGED
@@ -6,11 +6,15 @@ base_model:
 ---
 ## Model Details
 
-This model is an int4 model with group_size 128 and and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) generated by [intel/auto-round](https://github.com/intel/auto-round) algorithm. **Please note that loading the model in Transformers can be quite slow. Consider using an alternative serving framework for better performance.** Due to limited GPU resources, we have only tested a few prompts on a CPU backend using QBits . If you found this model not perform well, **you can explore a quantized model in AWQ format with different hyperparameters generated via AutoRound** which will be uploaded soon
+This model is an int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.
+
+**Please note that loading the model in Transformers can be quite slow. Consider using an alternative serving framework for better performance.**
+
+Due to limited GPU resources, we have only tested a few prompts on a CPU backend using QBits. If this model does not perform well, **you can explore a quantized model in AWQ format with different hyperparameters generated via AutoRound**, which will be uploaded soon.
 
 ## How To Use
 
-### INT4 Inference(CPU/CUDA)
+### INT4 Inference
 
 ````python
 from transformers import AutoModelForCausalLM, AutoTokenizer,GenerationConfig
@@ -107,7 +111,7 @@ prompt = "strawberry中有几个r?"
 8. r - 是r
 """
 
-prompt = "strawberry中有几个r?"
+prompt = "How many r in strawberry."
 ##INT4
 """The word "strawberry" contains **3 "r"s.
 """