ModelCloud
/

Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-mlx-v1

4-bit precision

Create README.md

#1

by zx-modelcloud - opened 5 days ago

base: refs/heads/main

←

from: refs/pr/1

README.md +26 -0

	@@ -0,0 +1,26 @@

+This model was exported using [GPTQModel](https://github.com/ModelCloud/GPTQModel). Below is example code for exporting a model from GPTQ format to MLX format.
+## Example:
+```python
+from gptqmodel import GPTQModel
+# Hugging Face ID of the GPTQ-quantized source model, and the local output path for the MLX export
+gptq_model_path = "ModelCloud/Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-v1"
+mlx_path = "./vortex/ModelCloud/Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-mlx-v1"
+# export to mlx model
+GPTQModel.export(gptq_model_path, mlx_path, "mlx")
+# load the exported MLX model and run a short generation to check that it works
+from mlx_lm import load, generate
+mlx_model, tokenizer = load(mlx_path)
+prompt = "The capital of France is"
+messages = [{"role": "user", "content": prompt}]
+prompt = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True
+)
+text = generate(mlx_model, tokenizer, prompt=prompt, verbose=True)
+```
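
Before loading a 32B export with `mlx_lm`, it can be worth checking that the export directory looks complete. The sketch below is not part of GPTQModel or `mlx_lm`: `missing_export_files` is a hypothetical helper, and the expected layout (a `config.json` plus at least one `*.safetensors` weight file) is an assumption about what the MLX export writes, so adjust it to match the actual output.

```python
from pathlib import Path


def missing_export_files(export_dir):
    """Return the expected file names/patterns that are absent from export_dir.

    Assumed layout (not confirmed by the README): a config.json and at least
    one *.safetensors weight file at the top level of the directory.
    """
    root = Path(export_dir)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config.json")
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

Calling `missing_export_files(mlx_path)` after the export step returns an empty list when both checks pass; a non-empty list names what was not found, which is cheaper to discover than a failed model load.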