---
language:
- en
- zh
tags:
- florence-2
- document-vqa
- image-text-retrieval
- fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft
---

# adamchanadam/Test_Florence-2-FT-DocVQA
This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.

## Model description
- Fine-tuned to answer questions about images, with a focus on logo recognition and company information.
- The model uses the `<DocVQA>` prompt to indicate the task type (see the usage sketch below).
- Number of unique images: 28
- Number of epochs: 7
- Learning rate: 1e-06
- Optimizer: AdamW
- Early stopping: patience of 2 epochs, delta of 0.0001

Dataset statistics (560 questions in total for fine-tuning):

- logo_recognition: 49 (8.75%)
- brand_identification: 48 (8.57%)
- visual_elements: 65 (11.61%)
- text_in_logo: 57 (10.18%)
- industry_classification: 49 (8.75%)
- product_service: 55 (9.82%)
- company_details: 89 (15.89%)
- negative_sample: 148 (26.43%)
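
The following is a minimal usage sketch, assuming the standard Florence-2 loading pattern from `transformers` with `trust_remote_code=True`; the image path, example question, and generation settings (`max_new_tokens`, `num_beams`) are illustrative and not taken from this card.

```python
# Minimal inference sketch (assumes the standard Florence-2 remote-code classes).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "adamchanadam/Test_Florence-2-FT-DocVQA"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example_logo.png").convert("RGB")   # hypothetical local image
question = "What company does this logo belong to?"     # hypothetical question
prompt = "<DocVQA>" + question                          # task token prepended to the question

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(answer)
```

Because the model was fine-tuned with the `<DocVQA>` task token, prepending it to the question (as in the sketch) is assumed to be required for best results.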
## Intended use & limitations

- Use for answering questions about logos and company information in images.
- Performance may be limited for questions or image content not represented in the training data.

## Training procedure
- Images were resized and normalized according to Florence-2's preprocessing standards.
- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.
- Questions and answers were provided for each image in the training set.
- Batch size: 4
- Evaluation metric: cross-entropy loss on a held-out validation set (see the sketch after this list)
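
Below is a minimal sketch of a fine-tuning loop consistent with the settings above. Only the hyperparameters (7 epochs, learning rate 1e-06, AdamW, batch size 4, cross-entropy loss, early stopping with patience 2 and delta 0.0001) come from this card; the dataset class, annotation format, file paths, and collate logic are hypothetical placeholders, and the actual training script may differ.

```python
# Hedged fine-tuning sketch; dataset contents and helper names are hypothetical.
# Hyperparameters mirror the settings listed in this card.
import torch
from PIL import Image
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
base_id = "microsoft/Florence-2-base-ft"
model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)

# Hypothetical annotations: replace with the actual fine-tuning data.
train_samples = [{"image_path": "logos/acme.png",
                  "question": "What company does this logo belong to?",
                  "answer": "ACME Corp"}]
val_samples = list(train_samples)


class DocVQADataset(Dataset):
    """Yields (prompt, answer, image) with the <DocVQA> task token prepended."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        image = Image.open(s["image_path"]).convert("RGB")
        return "<DocVQA>" + s["question"], s["answer"], image


def collate_fn(batch):
    prompts, answers, images = zip(*batch)
    inputs = processor(text=list(prompts), images=list(images),
                       return_tensors="pt", padding=True)
    labels = processor.tokenizer(list(answers), return_tensors="pt",
                                 padding=True).input_ids
    return inputs.to(device), labels.to(device)


def run_epoch(loader, train):
    model.train() if train else model.eval()
    total = 0.0
    for inputs, labels in loader:
        with torch.set_grad_enabled(train):
            out = model(input_ids=inputs["input_ids"],
                        pixel_values=inputs["pixel_values"],
                        labels=labels)           # returns cross-entropy loss
        if train:
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        total += out.loss.item()
    return total / max(len(loader), 1)


train_loader = DataLoader(DocVQADataset(train_samples), batch_size=4,
                          shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(DocVQADataset(val_samples), batch_size=4,
                        collate_fn=collate_fn)
optimizer = AdamW(model.parameters(), lr=1e-6)

best_val, stale = float("inf"), 0
for epoch in range(7):                           # 7 epochs
    run_epoch(train_loader, train=True)
    val_loss = run_epoch(val_loader, train=False)
    if best_val - val_loss > 1e-4:               # delta of 0.0001
        best_val, stale = val_loss, 0
    else:
        stale += 1
        if stale >= 2:                           # patience of 2 epochs
            break
```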
For more information, please contact the model creators.