|
--- |
|
library_name: transformers |
|
tags: |
|
- trl |
|
- sft |
|
base_model: |
|
- meta-llama/Llama-3.2-1B-Instruct |
|
datasets: |
|
- ngxson/MiniThinky-dataset |
|
--- |
|
|
|
# MiniThinky 1B |
|
|
|
This is the newer checkpoint of [MiniThinky-1B-Llama-3.2 (version 1)](https://huggingface.co/ngxson/MiniThinky-1B-Llama-3.2), which the loss decreased from 0.7 to 0.5 |
|
|
|
Link to GGUF version: [click here](https://huggingface.co/ngxson/MiniThinky-v2-1B-Llama-3.2-Q8_0-GGUF) |
|
|
|
Chat template is the same with llama 3, but the response will be as follow: |
|
|
|
``` |
|
<|thinking|>{thinking_process} |
|
<|answer|> |
|
{real_answer} |
|
``` |
|
|
|
## IMPORTANT: System message |
|
|
|
The model is **very sensitive** to system message. Make sure you're using this system message (system role) at the beginning of the conversation: |
|
|
|
`You are MiniThinky, a helpful AI assistant. You always think before giving the answer. Use <|thinking|> before thinking and <|answer|> before giving the answer.` |
|
|
|
## Q&A |
|
|
|
**Hardware used to trained it?** |
|
I used a HF space with 4xL40S, trained for 5 hours. Eval loss is about 0.8 |
|
|
|
**Benchmark?** |
|
I don't have time to do it alone. If you can help, please open a discussion! |
|
|
|
**Can it count number of "r" in "raspberry"?** |
|
Unfortunately no |
|
|
|
**Other things that I can tune?** |
|
Maybe lower temperature, or set top_k=1 |
|
|
|
--- |
|
|
|
TODO: include more info here + maybe do some benchmarks? (Plz add a discussion if you're interested) |
|
|