---
language:
- ja
license: mit
tags:
- ja
- japanese
- gpt_neox
- gpt
- text-generation
- lm
- nlp
- int8
- neural-compressor
- Intel® Neural Compressor
- PostTrainingStatic
datasets:
- oscar
model-index:
- name: gpt-neox-japanese-2.7b-int8
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: oscar
      type: oscar
      args: unshuffled_original_ast
    metrics:
    - name: Accuracy
      type: loss
      value: 4.9920
---
# INT8 gpt-neox-japanese-2.7b-int8
## Post-training static quantization
### PyTorch
This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
The original FP32 model comes from the fine-tuned model [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b).
The calibration dataloader is the train dataloader. Because the default calibration sampling size of 100 is not exactly divisible by the batch size of 8, the actual sampling size is rounded up to 104 (13 batches × 8 samples).
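
The quantization script itself is not included in this card. The snippet below is only a minimal sketch of how a comparable post-training static quantization could be run with Intel® Neural Compressor's `PostTrainingQuantConfig`/`quantization.fit` API; the dataset slice, tokenization settings, and output path are illustrative assumptions, not the exact recipe used to produce this checkpoint.
```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

# Original FP32 model to be quantized.
model_id = "abeja/gpt-neox-japanese-2.7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Calibration data: the OSCAR config listed in the metadata. The split slice,
# sequence length, and padding are assumptions for illustration only.
dataset = load_dataset("oscar", "unshuffled_original_ast", split="train[:200]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
dataset.set_format(type="torch")
calib_dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# Post-training static quantization: with batch size 8, a sampling size of 100
# rounds up to 13 batches, i.e. 104 calibration samples.
conf = PostTrainingQuantConfig(approach="static", calibration_sampling_size=[100])
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib_dataloader)
q_model.save("./gpt-neox-japanese-2.7b-int8")
```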
#### Test result
|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-loss)** |4.9920|3.5219|
| **Model size (MB)** |2570|5360|
#### Load with Intel® Neural Compressor (via Optimum Intel):
```python
from optimum.intel import INCModelForCausalLM

model_id = "Intel/gpt-neox-japanese-2.7b-int8"
int8_model = INCModelForCausalLM.from_pretrained(model_id)
```
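
Once loaded, the INT8 model can be used like any other causal LM. The sketch below assumes the tokenizer is loaded from the original [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b) repository; the prompt and sampling settings are illustrative.
```python
from transformers import AutoTokenizer
from optimum.intel import INCModelForCausalLM

model_id = "Intel/gpt-neox-japanese-2.7b-int8"
# Tokenizer source is an assumption: the original FP32 repository.
tokenizer = AutoTokenizer.from_pretrained("abeja/gpt-neox-japanese-2.7b")
int8_model = INCModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("人とAIが協調するためには、", return_tensors="pt")
outputs = int8_model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```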