---
datasets:
- pierreguillou/DocLayNet-base
metrics:
- accuracy
base_model:
- facebook/deit-base-distilled-patch16-224
library_name: transformers
tags:
- vision
- document-layout-analysis
- document-classification
- deit
- doclaynet
---

# Data-efficient Image Transformer (DeiT) for Document Classification (DocLayNet)

This model is a Data-efficient Image Transformer (DeiT) fine-tuned for document image classification on the DocLayNet dataset.

It was trained on document images from the six DocLayNet document categories, listed here with their class indexes:

- `0`: `financial_reports`
- `1`: `government_tenders`
- `2`: `laws_and_regulations`
- `3`: `manuals`
- `4`: `patents`
- `5`: `scientific_articles`

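
When a predicted index has to be turned back into a category name (or the reverse), the index-to-category mapping can be kept as a pair of dictionaries. This is a minimal sketch assuming the checkpoint's `config.id2label` uses the same order as the list in this card; `to_category` is a hypothetical helper that also accepts the generic `LABEL_<n>` names that `transformers` assigns when a config defines no label names.

```python
# Index -> category mapping as listed in this card. ASSUMPTION: the
# checkpoint's config.id2label uses the same order.
ID2LABEL = {
    0: "financial_reports",
    1: "government_tenders",
    2: "laws_and_regulations",
    3: "manuals",
    4: "patents",
    5: "scientific_articles",
}
# Reverse mapping: category name -> index.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

GENERIC_PREFIX = "LABEL_"  # default label names in transformers configs


def to_category(label: str) -> str:
    """Hypothetical helper: normalize a predicted label to a category name."""
    if label in LABEL2ID:
        return label
    if label.startswith(GENERIC_PREFIX):
        return ID2LABEL[int(label[len(GENERIC_PREFIX):])]
    raise ValueError(f"unknown label: {label!r}")


print(to_category("LABEL_4"))  # patents
print(to_category("manuals"))  # manuals
print(LABEL2ID["scientific_articles"])  # 5
```
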
## Model description

DeiT (`facebook/deit-base-distilled-patch16-224`) fine-tuned for document image classification over the six DocLayNet categories listed above.
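
The base checkpoint's name encodes its input geometry: 16×16 patches cut from 224×224 images. As a small worked example, the arithmetic below gives the token sequence length the encoder sees; the two extra tokens are the class token and the distillation token that distinguish the distilled DeiT variant. This reflects the base model; the card does not say whether fine-tuning changed the input resolution, so 224 is an assumption here.

```python
# Input geometry read off the base checkpoint name:
# "patch16" -> 16x16 patches, "224" -> 224x224 input images (assumed unchanged).
IMAGE_SIZE = 224
PATCH_SIZE = 16

patches_per_side = IMAGE_SIZE // PATCH_SIZE  # 14
num_patches = patches_per_side ** 2          # 196
# Distilled DeiT prepends a class token and a distillation token.
seq_len = num_patches + 2                    # 198

print(patches_per_side, num_patches, seq_len)  # 14 196 198
```
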



## Training data

[DocLayNet-base](https://huggingface.co/datasets/pierreguillou/DocLayNet-base)

## Training procedure

Hyperparameters:

- `batch_size`: 128
- `num_epochs`: 20
- `learning_rate`: 1e-4
- `weight_decay`: 0.1
- `warmup_ratio`: 0.1
- `gradient_clip`: 0.1
- `dropout_rate`: 0.1
- `label_smoothing`: 0.1
- `optimizer`: AdamW
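
The card lists the hyperparameters but not the training script. The plain-Python sketch below shows what two of them imply: `warmup_ratio` presumably reserves the first 10% of optimizer steps for a linear warmup up to `learning_rate`, and `label_smoothing` softens the one-hot targets. The linear decay after warmup and the training-set size used in the example are assumptions, not details from the card.

```python
from __future__ import annotations

# Values copied from the hyperparameters above.
LEARNING_RATE = 1e-4
WARMUP_RATIO = 0.1
LABEL_SMOOTHING = 0.1
NUM_EPOCHS = 20
BATCH_SIZE = 128
NUM_CLASSES = 6  # the six DocLayNet categories


def total_steps(num_train_images: int) -> int:
    """Optimizer steps over all epochs (the last partial batch is kept)."""
    steps_per_epoch = -(-num_train_images // BATCH_SIZE)  # ceiling division
    return steps_per_epoch * NUM_EPOCHS


def lr_at_step(step: int, num_steps: int) -> float:
    """Linear warmup, then linear decay to 0 (decay shape is an ASSUMPTION)."""
    warmup_steps = int(WARMUP_RATIO * num_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(1, num_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, (num_steps - step) / remaining)


def smoothed_target(true_class: int) -> list[float]:
    """Label-smoothed target: (1 - eps) * one_hot + eps / num_classes."""
    off = LABEL_SMOOTHING / NUM_CLASSES
    target = [off] * NUM_CLASSES
    target[true_class] = 1.0 - LABEL_SMOOTHING + off
    return target


# Hypothetical training-set size, for illustration only.
steps = total_steps(num_train_images=1000)
print(steps)                 # 160
print(lr_at_step(0, steps))  # 0.0
print(lr_at_step(16, steps)) # 0.0001 (end of warmup, peak learning rate)
print(smoothed_target(2))
```

In PyTorch these settings would presumably be passed as `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)` and `torch.nn.CrossEntropyLoss(label_smoothing=0.1)`, whose smoothing formula `smoothed_target` mirrors. The card does not say whether `gradient_clip` is a norm threshold or a value threshold.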

## Evaluation results

Test loss: 0.8134, test accuracy: 81.56%
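
The card does not describe its evaluation script. As a hedged sketch, top-1 accuracy of the kind reported here can be computed from outputs shaped like those of the `transformers` image-classification pipeline (one list of `{"label", "score"}` dicts per image); `top1_accuracy` is a hypothetical helper and the example data is made up.

```python
from __future__ import annotations


def top1_accuracy(predictions: list[list[dict]], true_labels: list[str]) -> float:
    """Fraction of images whose highest-scoring label matches the ground truth."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("no predictions given")
    correct = 0
    for preds, truth in zip(predictions, true_labels):
        # Take the max score explicitly instead of relying on list order.
        best = max(preds, key=lambda p: p["score"])
        correct += int(best["label"] == truth)
    return correct / len(true_labels)


# Toy, made-up outputs in the pipeline's [{"label": ..., "score": ...}] format.
preds = [
    [{"label": "patents", "score": 0.7}, {"label": "manuals", "score": 0.2}],
    [{"label": "manuals", "score": 0.6}, {"label": "patents", "score": 0.3}],
]
print(top1_accuracy(preds, ["patents", "patents"]))  # 0.5
```
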


## Usage
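```python
from transformers import pipeline

# Load the fine-tuned model with the image-classification pipeline
pipe = pipeline("image-classification", model="kaixkhazaki/vit_doclaynet_base")

# Classify a document image (a local path, a URL, or a PIL.Image)
result = pipe("path_to_image.jpg")

# `result` is a list of {"label": ..., "score": ...} dicts, highest score first
print(result)
print("Predicted category:", result[0]["label"])
```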