Update app.py
app.py CHANGED
```diff
@@ -33,6 +33,8 @@ import requests
 
 from huggingface_hub import snapshot_download
 
+from tqdm import tqdm
+
 default_max_context = 16384
 default_max_output = 512
 
@@ -48,7 +50,7 @@ available_models = [
     "8.0bpw"
 ]
 dirs = {}
-for model in available_models:
+for model in tqdm(available_models):
     dirs.update({model: snapshot_download(repo_id="turboderp/pixtral-12b-exl2", revision=model)})
 
 @spaces.GPU(duration=45)
@@ -137,15 +139,14 @@ def run_inference(message, history, model_picked, context_size, max_output):
     return result
 
 description="""A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
-The model will be loaded once the GPU is available. This space specifically will load by default Pixtral at 4bpw from the following repository: [turboderp/pixtral-12b-exl2](https://huggingface.co/turboderp/pixtral-12b-exl2). Other quantization options are available.
-The current version of ExllamaV2 running is the dev branch, not the master branch: [ExllamaV2](https://github.com/turboderp/exllamav2/tree/dev).
-
+The model will be loaded once the GPU is available. This space specifically will load by default Pixtral at 4bpw from the following repository: [turboderp/pixtral-12b-exl2](https://huggingface.co/turboderp/pixtral-12b-exl2). Other quantization options are available.
+The current version of ExllamaV2 running is the dev branch, not the master branch: [ExllamaV2](https://github.com/turboderp/exllamav2/tree/dev).
 The model at **4bpw and 16k context size fits in less than 12GB of VRAM**, and at **2.5bpw and short context can potentially fit in 8GB of VRAM**!
 
 The current default settings are:
 - Model Quant: 4.0bpw
 - Context Size: 16k tokens
-- Max Output: 512 tokens
+- Max Output: 512 tokens
 You can select other quants and experiment!
 
 Thanks, turboderp!"""
```
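The functional change is the download loop in the second hunk. Below is a minimal standalone sketch of what that loop does after this commit; only `"8.0bpw"` is visible in the hunk, so the other revision name is an assumed placeholder taken from the Space description:

```python
from huggingface_hub import snapshot_download
from tqdm import tqdm

# Only "8.0bpw" appears in the hunk; "4.0bpw" is mentioned in the Space
# description, and the full list in app.py presumably has more entries.
available_models = ["4.0bpw", "8.0bpw"]

dirs = {}
# tqdm wraps the loop so startup shows one outer progress tick per quant
# revision; snapshot_download caches each revision locally (printing its
# own per-file progress bars) and returns the local directory path.
for model in tqdm(available_models):
    dirs.update({model: snapshot_download(repo_id="turboderp/pixtral-12b-exl2", revision=model)})
```

Every quant is fetched eagerly at startup, so on a cold start this loop is likely the slowest step; the outer bar mainly tells you which revision is currently being cached.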
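The third hunk only reflows the markdown `description` string, but its context lines show `run_inference` decorated with `@spaces.GPU(duration=45)`. The UI wiring itself is outside the shown hunks; a minimal sketch assuming a standard Gradio `ChatInterface` (the real handler also takes `model_picked`, `context_size`, and `max_output`, presumably supplied via `additional_inputs`):

```python
import gradio as gr
import spaces

description = "A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!"

# On ZeroGPU Spaces, the decorator requests a GPU for at most ~45 s per call.
@spaces.GPU(duration=45)
def run_inference(message, history):
    # Placeholder body: the real Space loads the quant chosen from `dirs`
    # and generates a reply with ExllamaV2.
    return f"echo: {message}"

demo = gr.ChatInterface(fn=run_inference, description=description)
demo.launch()
```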