
Yakov Saparov

Outrun32

AI & ML interests

ML Engineering

Recent Activity

liked a model about 1 month ago
timm/tf_efficientnetv2_b0.in1k
liked a model 2 months ago
ali-vilab/In-Context-LoRA

Organizations

ControlNet on SegAny, stamps labs

Outrun32's activity

New activity in mlabonne/BigLlama-3.1-1T-Instruct 5 months ago

Recommended hardware?
#1 opened 5 months ago by sdalemorrey
reacted to mlabonne's post with 🚀 9 months ago
⚡ AutoQuant

AutoQuant is the evolution of my previous AutoGGUF notebook (https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu). It allows you to quantize your models in five different formats:

- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
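
To make the AWQ bullet above concrete: a model quantized to AWQ can be served with vLLM roughly as in the sketch below. The model id is a placeholder for illustration, not something taken from the post.

```python
# Minimal sketch: serving an AWQ-quantized model with vLLM.
# The model id below is illustrative; substitute your own AWQ repo.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ", quantization="awq")
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["Explain weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```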

Once the model is converted, it is automatically uploaded to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.
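
The notebook handles that upload itself; for reference, a hand-rolled equivalent using huggingface_hub would look roughly like this (the repo id and folder path are placeholders, and it assumes you are already logged in via `huggingface-cli login`):

```python
# Minimal sketch of pushing quantized files to the Hugging Face Hub.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-username/my-model-GGUF", exist_ok=True)
api.upload_folder(
    folder_path="./quantized-model",        # local dir with the quantized files
    repo_id="your-username/my-model-GGUF",  # destination repo on the Hub
)
```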

Here's an example of a model I quantized using HQQ and AutoQuant: mlabonne/AlphaMonarch-7B-2bit-HQQ
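
To give a sense of what 2-bit HQQ quantization looks like in code, here is a hedged sketch using the HqqConfig integration in transformers; the settings and base model are illustrative, not the exact recipe behind that model.

```python
# Minimal sketch: 2-bit HQQ quantization via transformers' HqqConfig.
# Requires the hqq package to be installed; nbits/group_size are illustrative.
import torch
from transformers import AutoModelForCausalLM, HqqConfig

quant_config = HqqConfig(nbits=2, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/AlphaMonarch-7B",   # base model to quantize on the fly
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=quant_config,
)
```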

I hope you'll enjoy it and quantize lots of models! :)

💻 AutoQuant: https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4