Sakalti/ultiima-32B-Q3-mlx

The model Sakalti/ultiima-32B-Q3-mlx was converted to MLX format from Sakalti/ultiima-32B using mlx-lm version 0.20.5.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Sakalti/ultiima-32B-Q3-mlx")

prompt = "hello"

if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
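
The chat-template check above can be factored into a small helper. The following is a minimal sketch, not part of mlx-lm: `build_prompt` and the `FakeTokenizer` stand-in are hypothetical names introduced here so the logic can be exercised without downloading the model.

```python
def build_prompt(tokenizer, user_text):
    """Wrap user_text in the tokenizer's chat template when one is defined."""
    template = getattr(tokenizer, "chat_template", None)
    if hasattr(tokenizer, "apply_chat_template") and template is not None:
        messages = [{"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # No chat template available: pass the raw text through unchanged.
    return user_text


class FakeTokenizer:
    """Hypothetical stand-in for a real tokenizer, for illustration only."""

    chat_template = "<user>{content}</user><assistant>"

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True):
        text = "".join(f"<user>{m['content']}</user>" for m in messages)
        return text + ("<assistant>" if add_generation_prompt else "")


print(build_prompt(FakeTokenizer(), "hello"))
```

With the real tokenizer returned by `load`, calling `build_prompt(tokenizer, "hello")` would take the place of the `if` block before `generate`.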


Safetensors

Model size: 4.1B params
Tensor type: FP16, U32


Model tree for Sakalti/ultiima-32B-Q3-mlx

Base model: Sakalti/ultiima-32B
Quantized: this model (one of 5 quantized versions)

Evaluation results

All results are from the Open LLM Leaderboard.

IFEval (0-shot), strict accuracy: 68.540
BBH (3-shot), normalized accuracy: 58.110
MATH Lvl 5 (4-shot), exact match: 43.130
GPQA (0-shot), acc_norm: 17.450
MuSR (0-shot), acc_norm: 24.130
MMLU-PRO (5-shot, test set), accuracy: 54.560
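
For a single summary figure, the six scores listed above can be averaged. This is a small sketch that assumes an unweighted mean; the model card itself does not report an average, and the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the evaluation results listed above.
scores = {
    "IFEval (0-shot)": 68.54,
    "BBH (3-shot)": 58.11,
    "MATH Lvl 5 (4-shot)": 43.13,
    "GPQA (0-shot)": 17.45,
    "MuSR (0-shot)": 24.13,
    "MMLU-PRO (5-shot)": 54.56,
}

# Unweighted mean across benchmarks (an assumption, not a leaderboard figure).
average = mean(scores.values())
print(f"Average over {len(scores)} benchmarks: {average:.2f}")  # 44.32
```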
