Pinkstack / PARM-V2-QwQ-Qwen-2.5-o1-3B-GGUF

Pinkstack committed 3 days ago

Commit 97b2a29 · verified · 1 parent: 8ba7766

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -24,7 +24,7 @@ We are proud to announce, our new high quality flagship model series - ***PARM2*
  🧀 Which quant is right for you?
 - ***Q4:*** This model should be used on edge devices like high end phones or laptops due to its very compact size, quality is okay but fully usable.
-- ***Q8:*** This model should be used on most high end modern devices like rtx 3080, Responses are very high quality, but its slightly slower than Q4.
+- ***Q8:*** This model should be used on most high end modern devices like rtx 3080, Responses are very high quality, but its slightly slower than Q4. (Runs at 9.89 tokens per second on a Samsung z fold 5 smartphone.)
 *other formats were not included as Q4,Q8 have the best performance, quality.*
 This Parm v2 is based on Qwen 2.5 3B which has gotten many extra reasoning training parameters so it would have similar outputs to qwen QwQ / O.1 mini (only much, smaller.). We've trained it using the datasets [here](https://huggingface.co/collections/Pinkstackorg/pram-v2-67612d3c542b9121bf15891c)