---
library_name: transformers
tags:
- trl
- sft
base_model:
- meta-llama/Llama-3.2-1B-Instruct
datasets:
- ngxson/MiniThinky-dataset
---
# MiniThinky 1B
This is a newer checkpoint of [MiniThinky-1B-Llama-3.2 (version 1)](https://huggingface.co/ngxson/MiniThinky-1B-Llama-3.2), in which the training loss decreased from 0.7 to 0.5.
Link to GGUF version: [click here](https://huggingface.co/ngxson/MiniThinky-v2-1B-Llama-3.2-Q8_0-GGUF)
The chat template is the same as Llama 3's, but the response is formatted as follows:
```
<|thinking|>{thinking_process}
<|answer|>
{real_answer}
```
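If your application only needs the final answer, you can split the raw completion on the `<|answer|>` marker. A minimal sketch (the helper name is illustrative, not part of the model):

```python
def split_response(text: str) -> tuple[str, str]:
    """Split a raw MiniThinky completion into (thinking, answer)."""
    thinking, sep, answer = text.partition("<|answer|>")
    # Drop the leading <|thinking|> tag if present.
    thinking = thinking.replace("<|thinking|>", "", 1).strip()
    # If the model never emitted <|answer|>, treat everything as thinking.
    return thinking, (answer.strip() if sep else "")
```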
## IMPORTANT: System message
The model is **very sensitive** to the system message. Make sure you use this system message (system role) at the beginning of the conversation:
`You are MiniThinky, a helpful AI assistant. You always think before giving the answer. Use <|thinking|> before thinking and <|answer|> before giving the answer.`
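For example, with the `transformers` chat pipeline (a minimal sketch; the repo id is assumed from this card, and `device_map`/generation settings should be adjusted to your hardware):

```python
from transformers import pipeline

# Load the model (repo id assumed; swap in the checkpoint you actually use).
pipe = pipeline(
    "text-generation",
    model="ngxson/MiniThinky-v2-1B-Llama-3.2",
    device_map="auto",
)

SYSTEM_MESSAGE = (
    "You are MiniThinky, a helpful AI assistant. You always think before "
    "giving the answer. Use <|thinking|> before thinking and <|answer|> "
    "before giving the answer."
)

messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": "What is the capital of France?"},
]

out = pipe(messages, max_new_tokens=512)
# The assistant reply is appended as the last message.
print(out[0]["generated_text"][-1]["content"])
```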
## Q&A
**Hardware used to train it?**
I used an HF space with 4xL40S and trained for 5 hours. The eval loss is about 0.8.
**Benchmarks?**
I don't have time to run them alone. If you can help, please open a discussion!
**Can it count the number of "r"s in "raspberry"?**
Unfortunately, no.
**Other things I can tune?**
Maybe lower the temperature, or set top_k=1.
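A sketch of both options, reusing the `pipe` from the earlier example (the exact values are illustrative assumptions, not tuned settings):

```python
# Option 1: keep sampling but lower the temperature.
out = pipe(messages, max_new_tokens=512, do_sample=True, temperature=0.4)

# Option 2: top_k=1, which makes decoding effectively greedy.
out = pipe(messages, max_new_tokens=512, do_sample=True, top_k=1)
```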
---
TODO: include more info here + maybe do some benchmarks? (Please open a discussion if you're interested)