---
datasets:
- pierreguillou/DocLayNet-base
metrics:
- accuracy
base_model:
- facebook/deit-base-distilled-patch16-224
library_name: transformers
tags:
- vision
- document-layout-analysis
- document-classification
- deit
- doclaynet
---
# Data-efficient Image Transformer (DeiT) for Document Classification (DocLayNet)
This model is a Data-efficient Image Transformer (DeiT) fine-tuned for document image classification on the DocLayNet dataset.
It was trained on images from the DocLayNet document categories, which are listed below with their label indexes:
{'financial_reports': 0,
'government_tenders': 1,
'laws_and_regulations': 2,
'manuals': 3,
'patents': 4,
'scientific_articles': 5}
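
The mapping above goes from category name to index, while a classifier's raw output is usually an index. A minimal sketch of inverting it is shown below; `decode_prediction` is a hypothetical helper, not part of this model or of `transformers`, and the sketch assumes the model's output indexes follow the mapping listed above.

```python
# Label mapping copied from the list above.
label2id = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}

# Inverse mapping, for turning a predicted class index back into a name.
id2label = {index: name for name, index in label2id.items()}


def decode_prediction(class_index: int) -> str:
    """Return the category name for a class index (hypothetical helper)."""
    return id2label[class_index]


print(decode_prediction(4))  # patents
```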
## Model description
DeiT (facebook/deit-base-distilled-patch16-224) fine-tuned for document image classification.
## Training data
DocLayNet-base
https://huggingface.co/datasets/pierreguillou/DocLayNet-base
## Training procedure
Hyperparameters:
{
'batch_size': 128,
'num_epochs': 20,
'learning_rate': 1e-4,
'weight_decay': 0.1,
'warmup_ratio': 0.1,
'gradient_clip': 0.1,
'dropout_rate': 0.1,
'label_smoothing': 0.1,
'optimizer': 'AdamW'
}
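
To make the `label_smoothing` value concrete, the sketch below works out the training target for one example with the six categories of this card. It assumes the common uniform-mixture formulation, in which the one-hot target is scaled by `1 - eps` and `eps / num_classes` is added to every class. The card does not state which implementation was used in training, so treat this as an illustration only; `smoothed_target` is a hypothetical helper.

```python
NUM_CLASSES = 6        # six document categories in this card
LABEL_SMOOTHING = 0.1  # 'label_smoothing' from the hyperparameters above


def smoothed_target(true_index, num_classes=NUM_CLASSES, eps=LABEL_SMOOTHING):
    """Build a smoothed target distribution (assumed uniform-mixture form)."""
    # Every class receives an equal share of the smoothing mass.
    target = [eps / num_classes] * num_classes
    # The true class additionally keeps the remaining (1 - eps) mass.
    target[true_index] += 1.0 - eps
    return target


# Target for an example whose true category is 'patents' (index 4).
target = smoothed_target(4)
print([round(p, 4) for p in target])
```

Under this assumption the true class keeps most of the probability mass and the remainder is spread evenly, so the target still sums to 1.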
## Evaluation results
Test Loss: 0.8134, Test Acc: 81.56%
## Usage
```python
from transformers import pipeline
# Load the model using the image-classification pipeline
pipe = pipeline("image-classification", model="kaixkhazaki/vit_doclaynet_base")
# Test it with an image
result = pipe("path_to_image.jpg")
print(result)
```