CodeLlama-70b-Instruct-hf-GGUF
Original Model
codellama/CodeLlama-70b-Instruct-hf
Run with LlamaEdge
LlamaEdge version: v0.2.11 and above
Prompt template
Prompt type:
codellama-super-instruct
Prompt string
<s>Source: system\n\n {system_prompt} <step> Source: user\n\n {user_message_1} <step> Source: assistant\n\n {ai_message_1} <step> Source: user\n\n {user_message_2} <step> Source: assistant\nDestination: user\n\n
Reverse prompt:
<step> Source: assistant\nEOT: true
Context size:
8192
Run as LlamaEdge service
wasmedge --dir .:. --nn-preload default:GGML:AUTO:CodeLlama-70b-Instruct-hf-Q2_K.gguf llama-api-server.wasm -p codellama-super-instruct -c 1024 --reverse-prompt 'Source: assistant\nEOT: true'
Note that the model only works in the non-streaming mode.
Quantized GGUF Models
Name | Quant method | Bits | Size | Use case |
---|---|---|---|---|
CodeLlama-70b-Instruct-hf-Q2_K.gguf | Q2_K | 2 | 25.5 GB | smallest, significant quality loss - not recommended for most purposes |
CodeLlama-70b-Instruct-hf-Q3_K_L.gguf | Q3_K_L | 3 | 36.1 GB | small, substantial quality loss |
CodeLlama-70b-Instruct-hf-Q3_K_M.gguf | Q3_K_M | 3 | 33.3 GB | very small, high quality loss |
CodeLlama-70b-Instruct-hf-Q3_K_S.gguf | Q3_K_S | 3 | 29.9 GB | very small, high quality loss |
CodeLlama-70b-Instruct-hf-Q4_0.gguf | Q4_0 | 4 | 38.9 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
CodeLlama-70b-Instruct-hf-Q4_K_M.gguf | Q4_K_M | 4 | 41.4 GB | medium, balanced quality - recommended |
CodeLlama-70b-Instruct-hf-Q4_K_S.gguf | Q4_K_S | 4 | 39.2 GB | small, greater quality loss |
CodeLlama-70b-Instruct-hf-Q5_0.gguf | Q5_0 | 5 | 47.5 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
CodeLlama-70b-Instruct-hf-Q5_K_M.gguf | Q5_K_M | 5 | 48.8 GB | large, very low quality loss - recommended |
CodeLlama-70b-Instruct-hf-Q5_K_S.gguf | Q5_K_S | 5 | 47.5 GB | large, low quality loss - recommended |
- Downloads last month
- 92
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for second-state/CodeLlama-70b-Instruct-hf-GGUF
Base model
codellama/CodeLlama-70b-Instruct-hf