kaixkhazaki committed on
Commit 20cb3df · verified · 1 Parent(s): aee8c0b

Update README.md

Files changed (1)
  1. README.md +28 -1
README.md CHANGED
@@ -17,6 +17,15 @@ tags:

  This model is a fine-tuned Vision Transformer (ViT) for document layout classification based on the DocLayNet dataset.

+ Trained on images of the document categories from DocLayNet dataset where the categories namely(with their indexes) are :
+
+ {'financial_reports': 0,
+ 'government_tenders': 1,
+ 'laws_and_regulations': 2,
+ 'manuals': 3,
+ 'patents': 4,
+ 'scientific_articles': 5}
+
  ## Model description

  This model is built upon the `google/vit-base-patch16-224-in21k` Vision Transformer architecture and fine-tuned specifically for document layout classification. The base ViT model uses a patch size of 16x16 pixels and was pre-trained on ImageNet-21k. The model has been optimized to recognize and classify different types of document layouts from the DocLayNet dataset.
@@ -44,9 +53,27 @@ The training was made with following hyperparameters:
  'optimizer': 'AdamW'
  }

+ ```

  ## Evaluation results
  The model achieved the following performance metrics on the test set:

  Test Loss: 0.8622
- Test Accuracy: 81.36%
+ Test Accuracy: 81.36%
+
+
+
+ ## Usage
+
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model using the image-classification pipeline
+ pipe = pipeline("image-classification", model="kaixkhazaki/vit_doclaynet_base")
+
+ # Test it with an image
+ result = pipe("path_to_image.jpg")
+ print(result)
+
+ ```
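
The Usage snippet added in this commit prints the raw pipeline output. Below is a minimal sketch of turning that output into one of the six category names. It assumes the result is a list of `{"label": ..., "score": ...}` dicts, which is the usual shape returned by the `image-classification` pipeline. `LABEL2ID` is copied from the README text in this diff. `top_prediction`, `resolve_label`, and `example_result` are hypothetical names introduced for illustration, and `example_result` holds made-up stand-in scores, not real model output. The diff does not say whether the model config stores the category names or the generic `LABEL_<n>` placeholders, so the sketch handles both.

```python
# Class-to-index mapping as listed in the README text added by this commit.
LABEL2ID = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}
# Inverted mapping: class index -> category name.
ID2LABEL = {index: name for name, index in LABEL2ID.items()}


def top_prediction(results):
    """Return (label, score) of the highest-scoring entry.

    `results` is assumed to be a list of {"label": str, "score": float}
    dicts, as produced by an image-classification pipeline call.
    """
    best = max(results, key=lambda entry: entry["score"])
    return best["label"], best["score"]


def resolve_label(label):
    """Map a generic 'LABEL_<n>' placeholder to a category name.

    Labels that are already category names are returned unchanged.
    """
    if label.startswith("LABEL_"):
        return ID2LABEL[int(label.split("_", 1)[1])]
    return label


# Made-up stand-in for `pipe("path_to_image.jpg")`; not real model output.
example_result = [
    {"label": "LABEL_3", "score": 0.81},
    {"label": "LABEL_5", "score": 0.12},
    {"label": "LABEL_0", "score": 0.07},
]

label, score = top_prediction(example_result)
print(f"{resolve_label(label)}: {score:.2f}")
```

In real use, `example_result` would be replaced by the `result` variable from the Usage snippet above.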