|
--- |
|
license: llama3.1 |
|
base_model: |
|
- meta-llama/Llama-3.1-405B-Instruct |
|
--- |
|
Llama 3.1 405B quants, with the llama.cpp build used for each quantization:
|
- IQ1_S: 86.8 GB - b3459 |
|
- IQ1_M: 95.1 GB - b3459 |
|
- IQ2_XXS: 109.0 GB - b3459 |
|
- IQ3_XXS: 157.7 GB - b3484 |
|
|
|
Quantized from the BF16 GGUF here:
|
https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/ |
|
|
|
which was converted from Llama 3.1 405B Instruct:
|
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct |
|
|
|
Importance matrix (imatrix) file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
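The low-bit IQ1/IQ2 quant types require an imatrix. A sketch of reproducing one of these quants with the llama.cpp `llama-quantize` tool and the imatrix file above — the local file names here are placeholders, and you need a llama.cpp checkout at the build listed for that quant (e.g. b3459 for IQ1_S):

```shell
# Quantize the BF16 GGUF down to IQ1_S using the provided imatrix.
# "405b-bf16.gguf" and the output name are hypothetical placeholders.
./llama-quantize --imatrix 405imatrix.dat \
    405b-bf16.gguf 405b-iq1_s.gguf IQ1_S
```

Without `--imatrix`, llama.cpp will refuse (or warn, depending on build) to produce the 1-bit quant types, since they rely on the importance weights to decide which values to preserve.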
|
|
|
Let me know if you need bigger quants.
|
|
|
Sponsored by: https://pickabrain.ai |