Can you make a 2.4bpw quantization?
#1 by xldistance - opened
With the 2.65bpw quantization and max_position_embeddings set to 10000, the model occupies more than 25GB of video memory, which works very badly on a 4090 graphics card.
I can add a 2.4bpw quant. You may need to adjust max tokens if it doesn't fit.
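For a rough idea of how much a lower bpw saves, here is a back-of-the-envelope sketch. It only counts the quantized weights, and the parameter count is a placeholder I made up, since the thread doesn't state the model size. The KV cache and activations come on top of this and grow with context length, which is why lowering max tokens can help when it doesn't fit.

```python
def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    # bits per weight -> bytes -> GiB
    return n_params * bpw / 8 / 1024**3


# Hypothetical parameter count; replace with the real one for your model.
N_PARAMS = 70e9

for bpw in (2.65, 2.4):
    print(f"{bpw} bpw: ~{weight_gib(N_PARAMS, bpw):.1f} GiB of weights")
```

Whatever the estimate says, leave some headroom below the card's 24GB for the cache and runtime overhead.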