This is an FP8-Dynamic quantization created with llm-compressor; it can run on GPUs with 16 GB of VRAM. Update vLLM and Transformers:
```
pip install "vllm>=0.7.2"
pip install git+https://github.com/huggingface/transformers
```
Then run with:
```
vllm serve leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic --trust-remote-code
```
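Once the server is up, you can query it through vLLM's OpenAI-compatible API. Below is a minimal sketch using the `openai` Python client, assuming the server runs on the default `localhost:8000` and that the image URL is a stand-in for your own:

```python
# Minimal sketch: query the vLLM OpenAI-compatible server.
# Assumes default host/port; the image URL below is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic",
    messages=[
        {
            "role": "user",
            "content": [
                # Hypothetical image URL for illustration only.
                {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```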
Base model: Qwen/Qwen2.5-VL-7B-Instruct