nicholasKluge committed: Update README.md

geographical_location: United States of America
hardware_used: NVIDIA A100-SXM4-40GB
---

# TeenyTinyLlama-460m-Chat-awq

**Note: This model is a quantized version of [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat). Quantization was performed using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), allowing this version to be 80% lighter with almost no performance loss. A GPU is required to run the AWQ-quantized models.**

TeenyTinyLlama is a pair of small foundational models trained in Brazilian Portuguese.

- **Batch size:** 4
- **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e3, learning_rate = 1e-5, epsilon = 1e-8)
- **GPU:** 1 NVIDIA A100-SXM4-40GB
- **Quantization Configuration:**
  - `bits`: 4
  - `group_size`: 128
  - `quant_method`: "awq"
  - `version`: "gemm"
  - `zero_point`: True
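
The fine-tuning hyperparameters above can be sketched in PyTorch as follows. This is a minimal illustration, not the training script: the tiny stand-in module and the linear-warmup scheduler shape are assumptions (the card only lists the AdamW settings and `warmup_steps = 1e3`).

```python
import torch
from torch import nn

# Tiny stand-in module; the real model is TeenyTinyLlama-460m-Chat.
model = nn.Linear(8, 8)

# AdamW with the learning rate and epsilon listed above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, eps=1e-8)

# Linear warmup over the first 1,000 steps (the warmup schedule's exact
# shape is an assumption; the card only states warmup_steps = 1e3).
warmup_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)
```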

This repository has the [source code](https://github.com/Nkluge-correa/TeenyTinyLlama) used to train this model.
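
For reference, a checkpoint like this one can be produced with AutoAWQ roughly as sketched below. The script itself is an assumption, but the configuration values mirror the quantization configuration listed above (AutoAWQ's `quant_config` expresses them via the keys `w_bit`, `q_group_size`, `zero_point`, and `version`).

```python
# AutoAWQ quant_config mirroring the configuration listed above.
QUANT_CONFIG = {
    "zero_point": True,   # zero_point: True
    "q_group_size": 128,  # group_size: 128
    "w_bit": 4,           # bits: 4
    "version": "GEMM",    # version: "gemm"
}

def quantize_model(model_path: str, out_path: str) -> None:
    """Quantize a Hugging Face checkpoint with AutoAWQ (needs a GPU)."""
    # Imported here so the configuration above can be read without autoawq.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model.quantize(tokenizer, quant_config=QUANT_CONFIG)
    model.save_quantized(out_path)
    tokenizer.save_pretrained(out_path)

# quantize_model("nicholasKluge/TeenyTinyLlama-460m-Chat",
#                "TeenyTinyLlama-460m-Chat-awq")  # uncomment on a GPU machine
```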

## Usage

**Note: Using quantized models requires the installation of `autoawq==0.1.7`. A GPU is required to run the AWQ-quantized models.**

The following special tokens are used to mark the user side of the interaction and the model's response:

`<instruction>`What is a language model?`</instruction>`A language model is a probability distribution over a vocabulary.`</s>`
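
A prompt for a new turn therefore just wraps the user's message in these tokens. A minimal sketch (the helper name is illustrative, not from this repository):

```python
def format_prompt(user_message: str) -> str:
    # Wrap the user's message in the instruction tokens shown above; the
    # model is expected to generate the answer and close the turn with </s>.
    return f"<instruction>{user_message}</instruction>"

prompt = format_prompt("What is a language model?")
```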

```python
!pip install autoawq==0.1.7 -q

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch