---
datasets:
- pierreguillou/DocLayNet-base
metrics:
- accuracy
base_model:
- facebook/deit-base-distilled-patch16-224
library_name: transformers
tags:
- vision
- document-layout-analysis
- document-classification
- deit
- doclaynet
---

# Data-efficient Image Transformer (DeiT) for Document Classification (DocLayNet)

This model is a Data-efficient Image Transformer (DeiT) fine-tuned for document image classification on the DocLayNet dataset. It was trained on images from the DocLayNet document categories. The categories and their label indexes are:

- `financial_reports`: 0
- `government_tenders`: 1
- `laws_and_regulations`: 2
- `manuals`: 3
- `patents`: 4
- `scientific_articles`: 5

## Model description

DeiT (`facebook/deit-base-distilled-patch16-224`) fine-tuned for document classification.

## Training data

DocLayNet-base: https://huggingface.co/datasets/pierreguillou/DocLayNet-base

## Training procedure

Hyperparameters:

- `batch_size`: 128
- `num_epochs`: 20
- `learning_rate`: 1e-4
- `weight_decay`: 0.1
- `warmup_ratio`: 0.1
- `gradient_clip`: 0.1
- `dropout_rate`: 0.1
- `label_smoothing`: 0.1
- `optimizer`: AdamW

## Evaluation results

- Test loss: 0.8134
- Test accuracy: 81.56%

## Usage

```python
from transformers import pipeline

# Load the model using the image-classification pipeline
pipe = pipeline("image-classification", model="kaixkhazaki/vit_doclaynet_base")

# Test it with an image; the pipeline returns a list of {"label", "score"} dicts
result = pipe("path_to_image.jpg")
print(result)
```
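
If the pipeline reports generic labels (for example `LABEL_4`) or you work with raw per-class scores, the label indexes listed in this card can be inverted to recover the category name. The following is a minimal sketch, assuming the model emits one score per class in index order. `LABEL2ID` is copied from this card; `ID2LABEL` and `predict_category` are illustrative names and not part of the model's API.

```python
# Category-to-index mapping as listed in this model card.
LABEL2ID = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}

# Inverse mapping: class index -> category name.
ID2LABEL = {index: name for name, index in LABEL2ID.items()}


def predict_category(scores):
    """Return the category name with the highest score.

    Assumes `scores` holds exactly one value per class, ordered by class index.
    This helper is hypothetical and only illustrates the index lookup.
    """
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best_index]


# Made-up scores for illustration: index 4 has the highest value.
print(predict_category([0.05, 0.02, 0.03, 0.10, 0.70, 0.10]))  # patents
```

The length check guards against passing scores from a model with a different number of classes, which would otherwise produce a silently wrong category.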