OPEA/DeepSeek-V3-int4-sym-gptq-inc

4-bit precision

cicdatopea committed 23 days ago

Commit df9474a · verified · 1 Parent(s): 951efd7

Update README.md

Files changed (1)

README.md +5 -3

@@ -10,14 +10,16 @@ This model is an int4 model with group_size 128 and and symmetric quantization o
 **Please note that loading the model in Transformers can be quite slow. Consider using an alternative serving framework for better performance.**
-Due to limited GPU resources,  we have only tested a few prompts on a CPU backend using QBits . If you found this model not perform well, **you can explore a quantized model in AWQ format with different hyperparameters generated via AutoRound** which will be uploaded soon
 ## How To Use
 ### INT4 Inference
 ````python
-from transformers import AutoModelForCausalLM, AutoTokenizer,GenerationConfig
 import torch
 quantized_model_dir = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"
@@ -53,7 +55,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
-## The following result is infercenced on CPU with qbits backend
 prompt = "9.11和9.8哪个数字大"
 ##INT4

 **Please note that loading the model in Transformers can be quite slow. Consider using an alternative serving framework for better performance.**
+Due to limited GPU resources, we have only tested a few prompts on a CPU backend using QBits. If you found this model not perform well, **you can explore a quantized model in AWQ format with different hyperparameters generated via AutoRound** which will be uploaded soon
+Please follow the license of the original model.
 ## How To Use
 ### INT4 Inference
 ````python
+from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 quantized_model_dir = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"
 print(response)
+## The following result is inferred on CPU with qbits backend
 prompt = "9.11和9.8哪个数字大"
 ##INT4