cicdatopea committed
Update README.md
README.md CHANGED
@@ -10,14 +10,16 @@ This model is an int4 model with group_size 128 and symmetric quantization o

 **Please note that loading the model in Transformers can be quite slow. Consider using an alternative serving framework for better performance.**

-Due to limited GPU resources,
+Due to limited GPU resources, we have only tested a few prompts on a CPU backend using QBits. If you find this model does not perform well, **you can explore a quantized model in AWQ format with different hyperparameters generated via AutoRound**, which will be uploaded soon.
+
+Please follow the license of the original model.

 ## How To Use

 ### INT4 Inference

 ````python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 quantized_model_dir = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"

@@ -53,7 +55,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)


-## The following result is
+## The following result is inferred on CPU with the QBits backend
 prompt = "9.11和9.8哪个数字大"

 ##INT4