---
license: cc-by-4.0
tags:
- nougat
- small
- ocr
---

# nougat-small onnx

[facebook/nougat-small](https://huggingface.co/facebook/nougat-small) exported to ONNX. This model is **not quantized**.
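
For reference, an export like this can typically be reproduced with `optimum`'s CLI (a rough sketch, not necessarily the exact command used for this repo; assumes a recent `optimum` with the ONNX exporter extras, and the output directory name is just an example):

```sh
pip install "optimum[exporters]" onnxruntime
optimum-cli export onnx --model facebook/nougat-small nougat-small-onnx/
```

Load the exported weights with `optimum.onnxruntime`: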

```python
from transformers import NougatProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model_name = 'pszemraj/nougat-small-onnx'
processor = NougatProcessor.from_pretrained(model_name)
model = ORTModelForVision2Seq.from_pretrained(
    model_name,
    provider="CPUExecutionProvider",  # 'CUDAExecutionProvider' for GPU
    use_merged=False,
    use_io_binding=True,
)
```

On a CPU-only Colab runtime (_at time of writing_) you may get `CuPy` errors; to fix this, uninstall it:

```sh
pip uninstall cupy-cuda11x -y
```

## how do da inference?

See [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/b46d3e89e631701ef205297435064ab780c4853a/Nougat/Inference_with_Nougat_to_read_scientific_PDFs.ipynb) or [this basic notebook](https://huggingface.co/pszemraj/nougat-small-onnx/blob/main/nougat-small-onnx-example.ipynb) I uploaded. It seems ONNX brings CPU inference times into 'feasible' territory: it took ~15 minutes to process _Attention is All You Meme_ on the free Colab CPU runtime.
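
If you just want a minimal end-to-end sketch, continuing from the loading snippet above (assumes `pillow` is installed and that `page.png` is a page rendered to an image; the generation settings are illustrative, not tuned):

```python
from PIL import Image

# rasterize one page of your PDF to an image first (e.g. with pdf2image)
image = Image.open("page.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# greedy generation; raise max_new_tokens for dense pages
outputs = model.generate(
    pixel_values,
    max_new_tokens=1024,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

# decode and apply Nougat's markdown post-processing
markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
markdown = processor.post_process_generation(markdown, fix_markdown=False)
print(markdown)
```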