cxfajar197/urdu-ocr

vision-encoder-decoder

image-text-to-text

cxfajar197 committed on Nov 26, 2024

Commit 250cd40 · verified · 1 Parent(s): 631f287

Update README.md

Files changed (1)

README.md +1 -27

README.md CHANGED

@@ -19,21 +19,9 @@ language:
 <!-- Provide a quick summary of what the model is/does. -->
-This is an Urdu OCR model designed for handwriting recognition tasks. It utilizes a VisionEncoderDecoderModel with a ViT-based encoder and a BERT-based decoder, fine-tuned on a custom dataset for robust and accurate text extraction from images.
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This model leverages the combination of a Vision Transformer (ViT) encoder (`facebook/deit-base-distilled-patch16-384`) and a multilingual BERT decoder (`bert-base-multilingual-cased`) to perform OCR tasks in Urdu. The model is fine-tuned on a dataset of 46,742 image-text pairs, using advanced data augmentation techniques to improve robustness.
-- **Developed by:** Fajar Pervaiz
-- **Funded by:** [More Information Needed]
-- **Shared by:** [More Information Needed]
-- **Model type:** VisionEncoderDecoderModel
-- **Language(s) (NLP):** Urdu (`ur`)
@@ -86,10 +74,6 @@ print("Generated Text:", generated_text)
 The model was tested on handwritten text images with varying font styles and complexities.
-#### Metrics
-- **Character Error Rate (CER):** [Value Needed]
-- **Word Error Rate (WER):** [Value Needed]
@@ -102,11 +86,8 @@ The model achieves competitive accuracy on Urdu handwritten text recognition tas
 - **Hardware Type:** NVIDIA GPU
-## Technical Specifications
-### Model Architecture and Objective
-The model uses a VisionEncoderDecoder architecture combining a ViT encoder and a BERT decoder.
 ### Compute Infrastructure
@@ -133,15 +114,8 @@ Python, PyTorch, Hugging Face Transformers
-## Glossary
-- **CER:** Character Error Rate
-- **WER:** Word Error Rate
-- **OCR:** Optical Character Recognition
-## More Information
-[More Information Needed]
 ## Model Card Authors

 <!-- Provide a quick summary of what the model is/does. -->
+This model, cxfajar197/urdu-ocr, is trained on Urdu data specifically designed for OCR tasks. It works best with single-line Urdu text images, primarily focusing on printed text. The model is optimized for extracting accurate Urdu text from such images and can be easily utilized using the Hugging Face pipeline API.
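
The added description says the model can be used through the Hugging Face pipeline API, but the commit does not show the call. The following is a minimal sketch of what that might look like, not the author's confirmed usage. It assumes the `image-to-text` pipeline task (a common choice for `VisionEncoderDecoderModel` checkpoints, and one that may be unavailable in some `transformers` releases), and the function names `extract_text` and `recognize_line` and the image path `urdu_line.png` are hypothetical.

```python
from typing import Any


def extract_text(outputs: list[dict[str, Any]]) -> str:
    """Join the `generated_text` fields returned by an image-to-text pipeline."""
    return " ".join(item["generated_text"].strip() for item in outputs)


def recognize_line(image_path: str, model_id: str = "cxfajar197/urdu-ocr") -> str:
    """Run OCR on one single-line Urdu text image and return the decoded string."""
    # Imported lazily so extract_text can be used without transformers installed.
    from transformers import pipeline

    # Assumption: the "image-to-text" task; the commit does not name a task.
    ocr = pipeline("image-to-text", model=model_id)
    return extract_text(ocr(image_path))


# Example call (downloads the model on first use; the path is hypothetical):
# print("Generated Text:", recognize_line("urdu_line.png"))
```

Because the description states that the model works best on single-line images, a multi-line page would probably need to be split into line crops before calling `recognize_line` on each one.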