Is it fp16?
There are two versions of the model. One is fp32, so the other, ggml-distil-large-v3.bin, must be fp16, right?
Why don't you have int4 or qint5 versions? Are they not good enough?
Yes, that's correct: the ggml-distil-large-v3.bin weights are in fp16, and the ggml-distil-large-v3.fp32.bin weights are in fp32. These weights are intended to provide an easy starting point for using distil-whisper in whisper.cpp. We haven't done a full sweep of the possible quantisation methods and their effect on quality. You're free to explore other quantisation methods for the weights by converting them yourself as per the instructions here:
# quantize a distil-whisper model with Q5_0 method
make quantize
./quantize models/ggml-distil-large-v3.bin models/ggml-distil-large-v3-q5_0.bin q5_0
# run the examples as usual, specifying the quantized model file
./main -m models/ggml-distil-large-v3-q5_0.bin ./samples/gb0.wav
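To get a rough feel for what each format buys you in file size, here is a minimal sketch that estimates the weight size from a parameter count and a bits-per-weight figure. The function name estimate_mib is hypothetical, the 756M parameter count is an assumption taken from the distil-large-v3 model card, and the 5.5 and 4.5 bits-per-weight figures for q5_0 and q4_0 assume ggml's layout of 32-weight blocks with one fp16 scale each. Real files will differ somewhat, since some tensors are kept at higher precision and the file also carries metadata.

```shell
#!/bin/sh
# estimate_mib PARAMS BITS_TIMES_10
# Prints the approximate weight size in MiB (integer, rounded down).
# Bits per weight are passed multiplied by 10 so that fractional
# values such as 5.5 can be handled with integer arithmetic.
estimate_mib() {
  echo $(( $1 * $2 / 80 / 1048576 ))
}

# Assumed parameter count for distil-large-v3 (about 756M).
PARAMS=756000000

# format:bits-per-weight-times-10 pairs (q5_0 and q4_0 values are assumptions).
for entry in fp32:320 fp16:160 q5_0:55 q4_0:45; do
  name=${entry%%:*}
  bits10=${entry##*:}
  echo "$name: ~$(estimate_mib "$PARAMS" "$bits10") MiB"
done
```

The estimate only covers storage. It says nothing about transcription quality, which would still need to be checked on your own audio after quantising.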
I tried it for Urdu, and it was much worse than the quantized original large-v3. It was simply unusable. I have not tried it for English, though; it is probably good there, as stated in the model card.
@supercharge19 As stated in https://huggingface.co/distil-whisper/distil-large-v3#intended-use:
Distil-Whisper is intended to be a drop-in replacement for Whisper large-v3 on English speech recognition.
Meaning the model was not trained to transcribe Urdu.
Then, as @sanchit-gandhi points out here:
The checkpoints on the distil-whisper organisation on the Hub currently only support English. However, it's possible to distil Whisper models in languages of your choice. See the provided training code and this checkpoint as examples. You can quite easily extend this code to train the model on multiple languages to do language switching within an utterance.
Hopefully that helps!