nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache

Image-Text-to-Text

compressed-tensors

nintwentydo committed on Dec 29, 2024

Commit 72f5202 (verified) · 1 Parent(s): 4e8cca7

Create README.md

Files changed (1)

README.md +31 -0

README.md ADDED

	@@ -0,0 +1,31 @@

+---
+tags:
+- fp8
+- vllm
+language:
+- en
+- de
+- fr
+- it
+- pt
+- hi
+- es
+- th
+pipeline_tag: image-text-to-text
+license: apache-2.0
+library_name: vllm
+base_model:
+- mistral-community/pixtral-12b
+- mistralai/Pixtral-12B-2409
+base_model_relation: quantized
+datasets:
+- HuggingFaceH4/ultrachat_200k
+---
+# Pixtral-12B-2409: FP8 Dynamic Quant + FP8 KV Cache
+Quantised version of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b), produced with [LLM Compressor](https://github.com/vllm-project/llm-compressor) for optimised inference on vLLM.
+The language model uses FP8 dynamic quantisation, and the KV cache is also quantised to FP8. The `multi_modal_projector` and `vision_tower` are left in FP16 since they are a small part of the model.
+Calibrated on 2048 samples from HuggingFaceH4/ultrachat_200k.
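
The README does not spell out what "FP8 dynamic" means numerically. The sketch below illustrates one common reading, assuming the FP8 E4M3 format (3 mantissa bits, largest finite value 448) and a scale recomputed per token at inference time. It is pure Python for illustration only and is not LLM Compressor's or vLLM's implementation; the helper names `round_to_e4m3`, `quantize_per_token`, and `dequantize` are hypothetical.

```python
import math

E4M3_MAX = 448.0          # largest finite value in FP8 E4M3 (assumed format)
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent; smaller values are subnormal


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest FP8 E4M3 value, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    mag = min(abs(v), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 steps per power of two
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_token(row):
    """Return (fp8_values, scale); the scale is derived from this row alone."""
    amax = max(abs(x) for x in row)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(x / scale) for x in row], scale


def dequantize(q_row, scale):
    """Map FP8 values back to the original range."""
    return [q * scale for q in q_row]


# Two made-up "token" activation rows with very different magnitudes.
tokens = [
    [0.02, -0.5, 0.31, 0.007],
    [12.0, -3.5, 40.0, 0.9],
]
for row in tokens:
    q, scale = quantize_per_token(row)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"scale={scale:.6f} max_abs_error={worst:.6f}")
```

Because each row gets its own scale, the small-magnitude row and the large-magnitude row both use the full FP8 range, which is the motivation for computing scales dynamically rather than fixing them ahead of time.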