nielsr HF staff committed on
Commit
2452050
·
1 Parent(s): ed9e35b

Create README.md

  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
+ ---
+ license: mit
+ tags:
+ - donut
+ - image-to-text
+ - vision
+ ---
+
+ # Donut (base-sized model, fine-tuned on CORD)
+
+ Donut model fine-tuned on CORD. It was introduced in the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim et al. and first released in [this repository](https://github.com/clovaai/donut).
+
+ Disclaimer: The team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+ ## Model description
+
+ Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape `(batch_size, seq_len, hidden_size)`, after which the decoder autoregressively generates text conditioned on the encoder's output.
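
The encode-then-decode flow described above can be sketched as a greedy generation loop. This is a toy illustration only: `toy_encoder`, `toy_decoder_step`, `greedy_generate`, and the token ids are hypothetical stand-ins, not the Swin encoder, the BART decoder, or any Donut API.

```python
from typing import Callable, List, Sequence

Embedding = List[float]


def toy_encoder(image: Sequence[Sequence[float]]) -> List[Embedding]:
    """Hypothetical stand-in for the vision encoder.

    Produces one single-value "embedding" per image row, i.e. a
    (seq_len, hidden_size=1) structure for one image.
    """
    return [[sum(row) / len(row)] for row in image]


def toy_decoder_step(
    encoder_states: List[Embedding], generated: List[int], eos_id: int = 1
) -> int:
    """Hypothetical stand-in for one decoder step.

    Chooses the next token id from the encoder states and the tokens
    generated so far; emits EOS once every encoder state is consumed.
    """
    position = len(generated) - 1  # skip the leading BOS token
    if position >= len(encoder_states):
        return eos_id
    return 2 + int(round(encoder_states[position][0]))


def greedy_generate(
    encoder_states: List[Embedding],
    step: Callable[[List[Embedding], List[int]], int],
    bos_id: int = 0,
    eos_id: int = 1,
    max_len: int = 10,
) -> List[int]:
    """Autoregressive loop: each new token is conditioned on the
    encoder output and on all previously generated tokens."""
    tokens = [bos_id]
    for _ in range(max_len):
        next_id = step(encoder_states, tokens)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


if __name__ == "__main__":
    states = toy_encoder([[1.0, 1.0], [3.0, 3.0]])
    print(greedy_generate(states, toy_decoder_step))
```

The point of the sketch is the control flow: the image is encoded once, and the decoder is then called repeatedly with the same encoder states and a growing token prefix.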
+
+ ![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg)
+
+ ## Intended uses & limitations
+
+ This model is fine-tuned on CORD, a document parsing dataset.
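
For document parsing, the decoder's text output is a tag-delimited sequence that is then converted into a nested structure. The snippet below is a minimal sketch of that conversion, assuming fields are wrapped as `<s_key>...</s_key>`; the helper `parse_fields` is hypothetical, the sample string is made up for illustration, and the sketch does not handle repeated keys or list separators.

```python
import re
from typing import Any, Dict

# Matches an opening field tag such as <s_menu> and captures the key.
_OPEN_TAG = re.compile(r"<s_(\w+)>")


def parse_fields(sequence: str) -> Dict[str, Any]:
    """Convert a tag-delimited string into a nested dict.

    Simplified, hypothetical helper: a field whose content contains
    further opening tags is parsed recursively, otherwise its text is
    kept as a stripped string. Unterminated fields stop the parse.
    """
    fields: Dict[str, Any] = {}
    pos = 0
    while True:
        match = _OPEN_TAG.search(sequence, pos)
        if match is None:
            break
        key = match.group(1)
        close_tag = f"</s_{key}>"
        end = sequence.find(close_tag, match.end())
        if end == -1:
            break  # no closing tag: ignore the incomplete field
        inner = sequence[match.end():end]
        fields[key] = parse_fields(inner) if "<s_" in inner else inner.strip()
        pos = end + len(close_tag)
    return fields


if __name__ == "__main__":
    sample = (
        "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
        "<s_total><s_total_price>4.50</s_total_price></s_total>"
    )
    print(parse_fields(sample))
```

In practice the library's own post-processing should be preferred over a hand-written parser like this one.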
+
+ See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/donut), which includes code examples.
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @article{DBLP:journals/corr/abs-2111-15664,
+   author     = {Geewook Kim and
+                 Teakgyu Hong and
+                 Moonbin Yim and
+                 Jinyoung Park and
+                 Jinyeong Yim and
+                 Wonseok Hwang and
+                 Sangdoo Yun and
+                 Dongyoon Han and
+                 Seunghyun Park},
+   title      = {Donut: Document Understanding Transformer without {OCR}},
+   journal    = {CoRR},
+   volume     = {abs/2111.15664},
+   year       = {2021},
+   url        = {https://arxiv.org/abs/2111.15664},
+   eprinttype = {arXiv},
+   eprint     = {2111.15664},
+   timestamp  = {Thu, 02 Dec 2021 10:50:44 +0100},
+   biburl     = {https://dblp.org/rec/journals/corr/abs-2111-15664.bib},
+   bibsource  = {dblp computer science bibliography, https://dblp.org}
+ }
+ ```