---
license: cc-by-4.0
base_model: BAAI/bge-m3
language: ["vi"]
library_name: sentence-transformers
pipeline_tag: sentence-similarity
inference: false
---

# `BAAI/bge-m3` in GGUF format

original: https://huggingface.co/BAAI/bge-m3

quantization:
```bash
REL=b3827  # can change to a later release
wget https://github.com/ggerganov/llama.cpp/releases/download/$REL/llama-$REL-bin-ubuntu-x64.zip --content-disposition --continue &> /dev/null
wget https://github.com/ggerganov/llama.cpp/archive/refs/tags/$REL.zip --content-disposition --continue &> /dev/null
unzip -q llama-$REL-bin-ubuntu-x64.zip
unzip -q llama.cpp-$REL.zip
mv llama.cpp-$REL/* .
rm -r llama.cpp-$REL/ llama-$REL-bin-ubuntu-x64.zip llama.cpp-$REL.zip
pip install -q -r requirements.txt

rm -rf models/tmp/
git clone --depth=1 --single-branch https://huggingface.co/BAAI/bge-m3 models/tmp

python convert_hf_to_gguf.py models/tmp/ --outfile model-f32.gguf --outtype f32

build/bin/llama-quantize model-f32.gguf model-f16.gguf f16 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-bf16.gguf bf16 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-q8_0.gguf q8_0 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-q6_k.gguf q6_k 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-q5_k_m.gguf q5_k_m 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-q5_k_s.gguf q5_k_s 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-q4_k_m.gguf q4_k_m 2> /dev/null
build/bin/llama-quantize model-f32.gguf model-q4_k_s.gguf q4_k_s 2> /dev/null

rm -rf models/yolo/
mkdir -p models/yolo
mv model-*.gguf models/yolo/
touch models/yolo/README.md
huggingface-cli upload bge-m3-gguf models/yolo .
```

usage:
```bash
build/bin/llama-embedding -m model-q5_k_m.gguf -p "Cô ấy cười nói suốt cả ngày" --embd-output-format array 2> /dev/null
# OR
build/bin/llama-server --embedding -c 8192 -m model-q5_k_m.gguf
```
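
querying the server (sketch): once `llama-server --embedding` is running, embeddings can be requested over HTTP. The snippet below is a minimal sketch, not part of the original workflow. It assumes the server listens on its default `127.0.0.1:8080` and exposes the OpenAI-compatible `/v1/embeddings` route; `SERVER_URL`, `build_payload`, and `embed` are hypothetical names introduced here for illustration.
```bash
# Assumption: llama-server is running on its default host and port.
SERVER_URL="${SERVER_URL:-http://127.0.0.1:8080}"

# Build the JSON request body for the embeddings route.
# Only backslashes and double quotes are escaped, so pass plain single-line text.
build_payload() {
  local text=$1
  text=${text//\\/\\\\}
  text=${text//\"/\\\"}
  printf '{"input": "%s"}' "$text"
}

# Send one text to the server and print the raw JSON response.
embed() {
  curl -s "$SERVER_URL/v1/embeddings" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")"
}

# Usage, once the server is up:
# embed "Cô ấy cười nói suốt cả ngày"
```
The manual escaping keeps the sketch free of extra dependencies; for arbitrary input (newlines, control characters) a JSON-aware tool would be the safer choice.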