---
language:
- en
- zh
tags:
- florence-2
- document-vqa
- image-text-retrieval
- fine-tuned
license: mit
base_model: microsoft/Florence-2-base-ft
---
# adamchanadam/Test_Florence-2-FT-DocVQA
This model is fine-tuned from [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) for Document Visual Question Answering (DocVQA) tasks.
## Model description
- Fine-tuned to answer questions about images, with a focus on logo recognition and company information.
- The model uses the `<DocVQA>` prompt to indicate the task type.
- Number of training images: 13
- Number of validation images: 2
- Number of epochs: 7
- Learning rate: 1e-06
- Optimizer: AdamW
- Early stopping: patience of 3 epochs, minimum improvement (delta) of 0.01
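The early-stopping rule listed above can be sketched as follows. This is a minimal illustration, not the training script used for this model: the class name `EarlyStopping`, its `step` method, and the choice to monitor validation loss are assumptions made for the example. Only the patience (3) and delta (0.01) values come from this card.

```python
class EarlyStopping:
    """Hypothetical helper: stop when validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 0.01):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Loss dropped by more than min_delta: reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            # No sufficient improvement this epoch.
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up loss values, for illustration only.
stopper = EarlyStopping(patience=3, min_delta=0.01)
for epoch, val_loss in enumerate([1.00, 0.995, 0.993, 0.992], start=1):
    if stopper.step(val_loss):
        print(f"Stopping after epoch {epoch}")
        break
```

With these made-up losses, none of epochs 2 to 4 improves on the best loss by more than 0.01, so the loop stops after epoch 4.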
## Intended use & limitations
- Use for answering questions about logos and company information in images.
- The model was fine-tuned on only 13 images, so performance may be limited for questions or image content not represented in the training data.
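A possible way to query the model is sketched below. It follows the usual Florence-2 loading pattern in `transformers` (with `trust_remote_code=True`) and prepends the `<DocVQA>` task prompt to the question. Several things here are assumptions rather than facts from this card: that the Hub repository id matches the title of this card, that the repository ships processor files, and the generation settings (`max_new_tokens`, `num_beams`). The helper names `build_prompt` and `answer_question` are hypothetical.

```python
# Assumption: the repo id is taken from this card's title.
MODEL_ID = "adamchanadam/Test_Florence-2-FT-DocVQA"
TASK_PROMPT = "<DocVQA>"


def build_prompt(question: str) -> str:
    """Prefix the question with the task prompt used during fine-tuning."""
    return TASK_PROMPT + question.strip()


def answer_question(image_path: str, question: str) -> str:
    """Run one question against one image and return the decoded answer."""
    # Heavy dependencies are imported here so the module can be imported
    # without them installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")

    # Generation settings are illustrative, not taken from this card.
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=64,
        num_beams=3,
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

For example, `answer_question("logo.png", "Which company does this logo belong to?")` would return the model's answer as a string, where `logo.png` is a placeholder path.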
## Training procedure
- Images were resized and normalized according to Florence-2's preprocessing standards.
- The `<DocVQA>` prompt was used during fine-tuning to indicate the task type.
- Questions and answers were provided for each image in the training set.
- Batch size: 6
- Evaluation metric: Cross-entropy loss on a held-out validation set
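To make the evaluation metric concrete, here is a small pure-Python sketch of token-level cross-entropy averaged over answer tokens. It is illustrative only: in practice the loss would come from the model's forward pass, and the function names and the `ignore_index=-100` padding convention (borrowed from PyTorch) are assumptions, not details confirmed by this card.

```python
import math


def token_cross_entropy(logits, target):
    """Cross-entropy for one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, batch_targets, ignore_index=-100):
    """Average the per-token loss, skipping positions marked with ignore_index."""
    losses = [
        token_cross_entropy(logits, target)
        for logits, target in zip(batch_logits, batch_targets)
        if target != ignore_index
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

A lower value means the model assigns higher probability to the reference answer tokens. A model that is uniformly unsure over a vocabulary of size V scores ln(V) per token.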
For more information, please contact the model creators.