TheBloke committed on
Commit
8bb1ffa
·
1 Parent(s): e0c1569

Update README.md

Files changed (1)
  1. README.md +65 -21
README.md CHANGED
@@ -56,31 +56,75 @@ or
 
  ## How to easily download and use this model in text-generation-webui
 
- ### Downloading the model
 
  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/Nous-Hermes-13B-GPTQ`.
  3. Click **Download**.
- 4. Wait until it says it's finished downloading.
- 5. Untick "Autoload model"
- 6. Click the **Refresh** icon next to **Model** in the top left.
-
- ### To use with AutoGPTQ (if installed)
-
- 1. In the **Model drop-down**: choose the model you just downloaded, `Nous-Hermes-13B-GPTQ`.
- 2. Under **GPTQ**, tick **AutoGPTQ**.
- 3. Click **Save settings for this model** in the top right.
- 4. Click **Reload the Model** in the top right.
- 5. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
-
- ### To use with GPTQ-for-LLaMa
-
- 1. In the **Model drop-down**: choose the model you just downloaded, `Nous-Hermes-13B-GPTQ`.
- 2. If you see an error in the bottom right, ignore it - it's temporary.
- 3. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`
- 4. Click **Save settings for this model** in the top right.
- 5. Click **Reload the Model** in the top right.
- 6. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
 
  ## Provided files
 
 
 
  ## How to easily download and use this model in text-generation-webui
 
+ Please make sure you're using the latest version of text-generation-webui
 
  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/Nous-Hermes-13B-GPTQ`.
  3. Click **Download**.
+ 4. The model will start downloading. Once it's finished it will say "Done"
+ 5. In the top left, click the refresh icon next to **Model**.
+ 6. In the **Model** dropdown, choose the model you just downloaded: `Nous-Hermes-13B-GPTQ`
+ 7. The model will automatically load, and is now ready for use!
+ 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
+ * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
+ 9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
+
+ ## How to use this GPTQ model from Python code
+
+ First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
+
+ `pip install auto-gptq`
+
+ Then try the following example code:
+
+ ```python
+ from transformers import AutoTokenizer, pipeline, logging
+ from auto_gptq import AutoGPTQForCausalLM
+
+ model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
+ model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"
+
+ use_triton = False
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+
+ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
+         model_basename=model_basename,
+         use_safetensors=True,
+         trust_remote_code=True,
+         device="cuda:0",
+         use_triton=use_triton,
+         quantize_config=None)
+
+ # Define the prompt up front so both generation paths below can use it
+ prompt = "Tell me about AI"
+ prompt_template=f'''### Human: {prompt}
+ ### Assistant:'''
+
+ print("\n\n*** Generate:")
+
+ input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+ output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
+ print(tokenizer.decode(output[0]))
+
+ # Inference can also be done using transformers' pipeline
+
+ # Prevent printing spurious transformers error when using pipeline with AutoGPTQ
+ logging.set_verbosity(logging.CRITICAL)
+
+ print("*** Pipeline:")
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     max_new_tokens=512,
+     temperature=0.7,
+     top_p=0.95,
+     repetition_penalty=1.15
+ )
+
+ print(pipe(prompt_template)[0]['generated_text'])
+ ```
 
  ## Provided files
 
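
A note on step 8 of the updated instructions: the GPTQ parameters that the old steps entered by hand (`Bits = 4`, `Groupsize = 128`) are now read from `quantize_config.json`. The sketch below shows roughly what reading such a file could look like. It assumes the `bits`, `group_size`, and `desc_act` keys that AutoGPTQ writes to this file; the helper name `read_gptq_params` and the sample values are illustrative and are not copied from this repository's actual file.

```python
import json
import tempfile
from pathlib import Path


def read_gptq_params(config_path):
    """Return the quantization settings stored in a quantize_config.json file."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Keys assumed to follow AutoGPTQ's config layout; missing keys become None.
    return {
        "bits": config.get("bits"),
        "group_size": config.get("group_size"),
        "desc_act": config.get("desc_act"),
    }


# Illustrative values mirroring the settings the old instructions set manually
# (4-bit, group size 128, no act-order); a real file may contain more fields.
sample = {"bits": 4, "group_size": 128, "desc_act": False}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quantize_config.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    params = read_gptq_params(path)

print(params)
```

Pointing `read_gptq_params` at the `quantize_config.json` in a downloaded copy of the model is a quick way to check which settings will be applied before loading it.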