---
license: cc-by-4.0
tags:
- nougat
- small
- ocr
---

# nougat-small onnx

[facebook/nougat-small](https://huggingface.co/facebook/nougat-small) exported to ONNX. This model is **not quantized**.
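
For reference, an export like this can typically be reproduced with `optimum`'s CLI (a rough sketch, not necessarily the exact command used for this repo; assumes a recent `optimum` with the ONNX exporter extras, and the output directory name is just an example):

```sh
pip install "optimum[exporters]" onnxruntime
optimum-cli export onnx --model facebook/nougat-small nougat-small-onnx/
```

Load the exported weights with `optimum.onnxruntime`: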

```python
from transformers import NougatProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model_name = 'pszemraj/nougat-small-onnx'
processor = NougatProcessor.from_pretrained(model_name)
model = ORTModelForVision2Seq.from_pretrained(
    model_name,
    provider="CPUExecutionProvider",  # 'CUDAExecutionProvider' for GPU
    use_merged=False,
    use_io_binding=True,
)
```

On a CPU-only Colab runtime (_at time of writing_) you may get `CuPy` errors; to fix this, uninstall it:

```sh
pip uninstall cupy-cuda11x -y
```

## how do da inference?

See [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/b46d3e89e631701ef205297435064ab780c4853a/Nougat/Inference_with_Nougat_to_read_scientific_PDFs.ipynb) or [this basic notebook](https://huggingface.co/pszemraj/nougat-small-onnx/blob/main/nougat-small-onnx-example.ipynb) I uploaded. It seems ONNX brings CPU inference times into 'feasible' territory: it took ~15 minutes to process _Attention is All You Meme_ on the free Colab CPU runtime.
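
If you just want a minimal end-to-end sketch, continuing from the loading snippet above (assumes `pillow` is installed and that `page.png` is a page rendered to an image; the generation settings are illustrative, not tuned):

```python
from PIL import Image

# rasterize one page of your PDF to an image first (e.g. with pdf2image)
image = Image.open("page.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# greedy generation; raise max_new_tokens for dense pages
outputs = model.generate(
    pixel_values,
    max_new_tokens=1024,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

# decode and apply Nougat's markdown post-processing
markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
markdown = processor.post_process_generation(markdown, fix_markdown=False)
print(markdown)
```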