---
license: llama3.2
language:
- en
- ja
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-3B
datasets:
- ryota39/izumi-lab-dpo-45k
- Aratako/Magpie-Tanuki-8B-97k
- kunishou/databricks-dolly-15k-ja
- kunishou/oasst1-89k-ja
tags:
- llama3.2
---

![chibi-img](./chibi.png)

## Preface

Small-parameter large language models (LLMs) matter because they balance performance and efficiency. As LLMs grow more sophisticated, the trade-off between model size and computational resource demands becomes critical. A smaller model uses less memory, runs inference faster, and consumes less energy, while still retaining a high level of accuracy and contextual understanding. Such models are particularly valuable in real-world applications where processing power and storage are limited, such as mobile devices, edge computing, and low-latency environments.

## Llama 3.2 Chibi 3B

This experimental model is the result of continual pre-training of [Meta's Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on a small mixture of Japanese datasets.

## Architecture

[Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B)

## Training

The model was trained on the following mixture of datasets:

- [ryota39/izumi-lab-dpo-45k](https://huggingface.co/datasets/ryota39/izumi-lab-dpo-45k)
- [Aratako/Magpie-Tanuki-8B-97k](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-8B-97k)
- [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
- [kunishou/oasst1-89k-ja](https://huggingface.co/datasets/kunishou/oasst1-89k-ja)

## Contributors

- [Hammaam](https://huggingface.co/AELLM)

## How to use

With transformers >= 4.43.0, you can run inference using the Transformers `pipeline` abstraction or the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "AELLM/Llama-3.2-Chibi-3B"

# Load the model in bfloat16; device_map="auto" relies on the accelerate package
# to place the weights on the available device(s).
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Continue a Japanese prompt ("The key to life is ...").
pipe("人生の鍵は")
```
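
The Auto classes route mentioned above can be sketched as follows. This is a minimal illustration, not the author's reference code: the helper name `generate_text` and the default of `max_new_tokens=64` are assumptions made for this example, and the imports are deferred into the function only so that defining the helper does not load torch.

```python
def generate_text(prompt, model_id="AELLM/Llama-3.2-Chibi-3B", max_new_tokens=64):
    """Continue `prompt` using AutoTokenizer and AutoModelForCausalLM.

    `generate_text` is a hypothetical helper for illustration; the defaults
    are assumptions and not settings recommended by the model authors.
    """
    # Deferred imports: the heavy libraries load only when the helper is called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the accelerate package
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the full sequence (prompt plus continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate_text("人生の鍵は"))` downloads the weights on first use and prints the prompt followed by the model's continuation. Compared with the pipeline, this form gives direct access to the tokenizer and to `generate()` arguments.
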

## License

Refer to the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE).

## References

```bibtex
@inproceedings{zheng2024llamafactory,
  title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
  author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  address={Bangkok, Thailand},
  publisher={Association for Computational Linguistics},
  year={2024},
  url={http://arxiv.org/abs/2403.13372}
}
```