update on breaking changes
README.md CHANGED
@@ -9,6 +9,15 @@ https://github.com/qwopqwop200/GPTQ-for-LLaMa
 
 LoRA credit to https://huggingface.co/baseten/alpaca-30b
 
+# Update 2023-04-03
+
+Recent GPTQ commits have introduced breaking changes to model loading; use commit `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773` on the `cuda` branch.
+
+If you're not familiar with the Git process:
+
+1. `git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773`
+2. `git switch -c cuda-stable`
+
+This creates and switches to a `cuda-stable` branch so you can keep using the quantized models.
+
 # Update 2023-03-29
 There is also a non-groupsize quantized model that is 1 GB smaller, which should allow running at max context tokens with 24 GB of VRAM. The evaluations are better
 on the 128 groupsize version, but the tradeoff is not being able to run it at full context without offloading or a GPU with more VRAM.
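If you haven't cloned GPTQ-for-LLaMa yet, here is a minimal sketch of the full clone-and-pin flow; the `-b cuda` clone and the `cuda-stable` branch name just follow the instructions above, and the clone location is whatever fits your setup.

```sh
# Minimal sketch: clone GPTQ-for-LLaMa and pin it to the known-good commit
# on the cuda branch. Adjust the clone location to your own setup.
git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa

# Detach at the known-good commit...
git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773

# ...then work from a named branch so a later pull doesn't move you off it.
git switch -c cuda-stable

# Sanity check: HEAD should show the pinned commit.
git log -1 --oneline
```

If you already have a clone, skipping the `git clone` line and running the remaining commands from inside the existing checkout should have the same effect.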