elinas committed
Commit b7b3f5d · 1 Parent(s): 05cd441

Update README.md

Files changed (1)
  1. README.md +3 -9
README.md CHANGED
@@ -10,18 +10,12 @@ This LoRA trained for 3 epochs and has been converted to int4 (4bit) via GPTQ me
 
 Use one of the two **safetensors** versions, the **pt** version is an old quantization that is no longer supported and will be removed in the future. Make sure you only have **ONE** checkpoint from the two in your model directory! See the repo below for more info.
 
-https://github.com/qwopqwop200/GPTQ-for-LLaMa
-
 LoRA credit to https://huggingface.co/baseten/alpaca-30b
 
-# Important - Update 2023-04-03
-Recent GPTQ commits have introduced breaking changes to model loading and you should use commit `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773` in the `cuda` branch.
-
-If you're not familiar with the Git process
-1. `git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773`
-2. `git switch -c cuda-stable`
+# Important - Update 2023-04-05
+Recent GPTQ commits have introduced breaking changes to model loading and you should use this fork for a stable experience: https://github.com/oobabooga/GPTQ-for-LLaMa
 
-This creates and switches to a `cuda-stable` branch to continue using the quantized models.
+Currently only cuda is supported.
 
 # Update 2023-03-29
 There is also a non-groupsize quantized model that is 1GB smaller in size, which should allow running at max context tokens with 24GB VRAM. The evaluations are better
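
Since the updated README replaces the pinned-commit instructions with a pointer to the fork, a minimal sketch of fetching that fork may help. The function name `clone_gptq_fork` and the default target directory `GPTQ-for-LLaMa` are assumptions for illustration and are not part of the README; place the clone wherever your loader expects it.

```shell
# Fork URL as given in the updated README.
REPO_URL="https://github.com/oobabooga/GPTQ-for-LLaMa"

# Hypothetical helper: clone the fork into a target directory
# (default name is an assumption) unless it is already there.
clone_gptq_fork() {
    target="${1:-GPTQ-for-LLaMa}"
    if [ ! -d "$target/.git" ]; then
        git clone "$REPO_URL" "$target" || return 1
    fi
    # Print the checked-out commit so you can confirm what you are running.
    git -C "$target" log -1 --oneline
}
```

Calling `clone_gptq_fork` with no arguments clones into `./GPTQ-for-LLaMa`; pass a path as the first argument to clone elsewhere. The sketch uses the fork's default branch and does not pin a commit.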