kaixkhazaki/vit_doclaynet_base

Image Classification

document-layout-analysis

document-classification

kaixkhazaki committed 6 days ago

Commit 20cb3df · verified · 1 Parent(s): aee8c0b

Update README.md

Files changed (1)

README.md +28 -1

@@ -17,6 +17,15 @@ tags:
 This model is a fine-tuned Vision Transformer (ViT) for document layout classification based on the DocLayNet dataset.
+Trained on images of the document categories from DocLayNet dataset where the categories namely(with their indexes) are :
+{'financial_reports': 0,
+ 'government_tenders': 1,
+ 'laws_and_regulations': 2,
+ 'manuals': 3,
+ 'patents': 4,
+ 'scientific_articles': 5}
 ## Model description
 This model is built upon the `google/vit-base-patch16-224-in21k` Vision Transformer architecture and fine-tuned specifically for document layout classification. The base ViT model uses a patch size of 16x16 pixels and was pre-trained on ImageNet-21k. The model has been optimized to recognize and classify different types of document layouts from the DocLayNet dataset.
@@ -44,9 +53,27 @@ The training was made with following hyperparameters:
     'optimizer': 'AdamW'
 }
+```
 ## Evaluation results
 The model achieved the following performance metrics on the test set:
 Test Loss: 0.8622
-Test Accuracy: 81.36%
+Test Accuracy: 81.36%
+## Usage
+```python
+from transformers import pipeline
+# Load the model using the image-classification pipeline
+pipe = pipeline("image-classification", model="kaixkhazaki/vit_doclaynet_base")
+# Test it with an image
+result = pipe("path_to_image.jpg")
+print(result)
+```
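
The usage snippet added in this commit prints the raw pipeline result, which for the `image-classification` pipeline is a list of dictionaries with `label` and `score` keys. A minimal sketch of turning that list into a single category follows. It assumes the labels may come back either as category names or as generic `LABEL_<n>` strings (the Transformers default when a model config defines no `id2label`); which form this model returns is not stated in the README. The helpers `resolve_label` and `top_category` and the sample scores are hypothetical, introduced only for illustration.

```python
import re

# Category-to-index mapping copied from the README text added in this commit.
LABEL2ID = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}
ID2LABEL = {index: name for name, index in LABEL2ID.items()}


def resolve_label(label):
    """Map a generic 'LABEL_<n>' string to a category name; pass other labels through."""
    match = re.fullmatch(r"LABEL_(\d+)", label)
    if match:
        # Unknown indexes fall back to the original label string.
        return ID2LABEL.get(int(match.group(1)), label)
    return label


def top_category(predictions):
    """Return (category, score) for the highest-scoring pipeline prediction."""
    best = max(predictions, key=lambda p: p["score"])
    return resolve_label(best["label"]), best["score"]


# Hypothetical pipeline output, made up for illustration (not real model scores).
example_result = [
    {"label": "LABEL_4", "score": 0.12},
    {"label": "LABEL_3", "score": 0.80},
    {"label": "LABEL_5", "score": 0.08},
]
print(top_category(example_result))
```

In practice `example_result` would be replaced by the `result` returned from `pipe("path_to_image.jpg")`. If the model config already carries the category names, `resolve_label` simply passes them through unchanged.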