kaixkhazaki committed: Update README.md

README.md CHANGED

This model is a fine-tuned Vision Transformer (ViT) for document layout classification.

Trained on images of the document categories from the DocLayNet dataset, where the categories (with their indexes) are:
```python
{'financial_reports': 0,
 'government_tenders': 1,
 'laws_and_regulations': 2,
 'manuals': 3,
 'patents': 4,
 'scientific_articles': 5}
```
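
As a reference, here is a minimal sketch of how such a label mapping is typically wired into a `transformers` classification head at fine-tuning time. The mapping itself comes from above; the rest is an illustrative assumption, not the exact training code used for this model:

```python
from transformers import ViTForImageClassification

# Label mapping from this README; id2label/label2id are the standard
# config fields transformers uses to name classification outputs.
label2id = {
    'financial_reports': 0,
    'government_tenders': 1,
    'laws_and_regulations': 2,
    'manuals': 3,
    'patents': 4,
    'scientific_articles': 5,
}
id2label = {v: k for k, v in label2id.items()}

# Start from the pre-trained backbone and attach a 6-way classification head.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=len(label2id),
    id2label=id2label,
    label2id=label2id,
)
```

After loading, the same mapping is exposed at inference time via `model.config.id2label`.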
## Model description
This model is built upon the `google/vit-base-patch16-224-in21k` Vision Transformer architecture and fine-tuned specifically for document layout classification. The base ViT model uses a patch size of 16x16 pixels and was pre-trained on ImageNet-21k. The model has been optimized to recognize and classify different types of document layouts from the DocLayNet dataset.
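
For a usage illustration, a minimal inference sketch with the `transformers` API follows. The checkpoint id below is a placeholder assumption; substitute this repository's actual model id:

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Placeholder checkpoint id; replace with this repository's model id.
checkpoint = "kaixkhazaki/vit-doclaynet"

processor = ViTImageProcessor.from_pretrained(checkpoint)
model = ViTForImageClassification.from_pretrained(checkpoint)

# Any document page rendered as an RGB image.
image = Image.open("page.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])  # e.g. 'scientific_articles'
```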