---
license: llama3.2
language:
  - en
  - ja
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-3B
datasets:
  - ryota39/izumi-lab-dpo-45k
  - Aratako/Magpie-Tanuki-8B-97k
  - kunishou/databricks-dolly-15k-ja
  - kunishou/oasst1-89k-ja
tags:
  - llama3.2
---
![chibi-img](./chibi.png)
## Preface

The importance of small-parameter large language models (LLMs) lies in their ability to balance performance and efficiency. As LLMs grow increasingly sophisticated, the trade-off between model size and computational resource demands becomes critical. A smaller model offers significant advantages, such as reduced memory usage, faster inference times, and lower energy consumption, while retaining a high level of accuracy and contextual understanding. These models are particularly valuable in real-world applications where resources like processing power and storage are limited, such as on mobile devices, at the edge, or in low-latency environments.

## Llama 3.2 Chibi 3B

This experimental model is the result of continual pre-training of [Meta's Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on a small mixture of Japanese datasets.

## Architecture

[Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B)

## Training

The model was trained on the following mixture of datasets (a sketch of loading them with the Hugging Face `datasets` library follows the list):
  - [ryota39/izumi-lab-dpo-45k](https://huggingface.co/datasets/ryota39/izumi-lab-dpo-45k)
  - [Aratako/Magpie-Tanuki-8B-97k](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-8B-97k)
  - [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
  - [kunishou/oasst1-89k-ja](https://huggingface.co/datasets/kunishou/oasst1-89k-ja)
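
For reference, here is a minimal sketch of loading and inspecting these corpora with the Hugging Face `datasets` library. It is only an illustration, not the actual training pipeline: the mixing ratios, preprocessing, and the assumption that each repo exposes a `train` split are not specified in this card.

```python
from datasets import load_dataset

# The four corpora listed above. Each repo has its own schema, so the
# columns would need to be normalized to a common format before mixing.
dataset_ids = [
    "ryota39/izumi-lab-dpo-45k",
    "Aratako/Magpie-Tanuki-8B-97k",
    "kunishou/databricks-dolly-15k-ja",
    "kunishou/oasst1-89k-ja",
]

for dataset_id in dataset_ids:
    # Assumes each dataset exposes a "train" split.
    ds = load_dataset(dataset_id, split="train")
    print(dataset_id, len(ds), ds.column_names)
```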

## Contributors

- [Hammaam](https://huggingface.co/AELLM)

## How to use

Starting with `transformers` >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "AELLM/Llama-3.2-Chibi-3B"

# Build a text-generation pipeline in bfloat16 and let Accelerate place
# the model on the available device(s).
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Japanese prompt: "The key to life is"
pipe("人生の鍵は")
```
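
Alternatively, the Auto classes can be used with `generate()` directly. The following is a minimal sketch; the generation settings (`max_new_tokens` and the sampling parameters) are illustrative assumptions, not values recommended for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AELLM/Llama-3.2-Chibi-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Japanese prompt: "The key to life is"
inputs = tokenizer("人生の鍵は", return_tensors="pt").to(model.device)

# Generation settings below are illustrative defaults, not tuned values.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```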

## License

Refer to the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE).

## References

```bibtex
@inproceedings{zheng2024llamafactory,
  title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
  author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  address={Bangkok, Thailand},
  publisher={Association for Computational Linguistics},
  year={2024},
  url={http://arxiv.org/abs/2403.13372}
}
```