File size: 2,326 Bytes
57b0bdd
1e3ebac
57b0bdd
33d6607
bbf42c3
 
 
 
b28dd3e
 
 
 
 
57b0bdd
73a21c8
1e3ebac
73a21c8
0a587ab
73a21c8
1e3ebac
667af5a
1e3ebac
667af5a
1e3ebac
ff904df
49a967d
1e3ebac
ff904df
1e3ebac
 
49a967d
1e3ebac
 
 
 
 
49a967d
1e3ebac
 
 
 
 
49a967d
1e3ebac
 
 
 
 
 
49a967d
7743a4e
3a93017
 
7743a4e
3a93017
 
1e3ebac
667af5a
1e3ebac
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
---
language: en
license: mit
pipeline_tag: document-question-answering
tags:
 - layoutlm
 - document-question-answering
 - pdf
widget:
- text: "What is the invoice number?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
- text: "What is the purchase amount?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
---

# LayoutLM for Visual Question Answering

This is a fine-tuned version of the multi-modal [LayoutLM](https://aka.ms/layoutlm) model for the task of question answering on documents. It has been fine-tuned using both the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) and [DocVQA](https://www.docvqa.org/) datasets.

## Getting started with the model

To run these examples, you must have [PIL](https://pillow.readthedocs.io/en/stable/installation.html), [pytesseract](https://pypi.org/project/pytesseract/), and [PyTorch](https://pytorch.org/get-started/locally/) installed in addition to [transformers](https://huggingface.co/docs/transformers/index).

```python
from transformers import pipeline

nlp = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

nlp(
    "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
    "What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}

nlp(
    "https://www.accountingcoach.com/wp-content/uploads/2013/10/[email protected]",
    "What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}
```

**NOTE**: This model and pipeline was recently landed in transformers via [PR #18407](https://github.com/huggingface/transformers/pull/18407) and [PR #18414](https://github.com/huggingface/transformers/pull/18414), so you'll need to use a recent version of transformers, for example:

```bash
pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991
```

## About us

This model was created by the team at [Impira](https://www.impira.com/).