cxfajar197/urdu-ocr

@@ -22,11 +22,8 @@ This is an Urdu OCR model designed for handwriting recognition tasks. It utilize
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 - **Developed by:** Fajar Pervaiz
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
 - **Model type:** VisionEncoderDecoderModel
 - **Language(s) (NLP):** Urdu (ur)
-- **License:** [More Information Needed]
 - **Finetuned from model [optional]:** facebook/deit-base-distilled-patch16-384, bert-base-multilingual-cased
 ### Model Sources [optional]
@@ -46,28 +43,25 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 This model can be directly used for Urdu handwriting recognition tasks, particularly for extracting text from scanned documents or handwritten notes.
-[More Information Needed]
 ### Downstream Use [optional]
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 This model can be fine-tuned further for specific handwriting datasets or integrated into larger OCR systems for Urdu or multilingual text recognition.
-[More Information Needed]
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 Without further fine-tuning, the model is not suitable for languages other than Urdu or for highly noisy or distorted images.
-[More Information Needed]
 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 The model may exhibit biases inherent in the training data. Misrecognition of complex or ambiguous handwriting is possible. Users should carefully evaluate its performance in their specific use case.
-[More Information Needed]
 ### Recommendations
@@ -83,7 +77,7 @@ processor = TrOCRProcessor.from_pretrained("path/to/processor")
 model = VisionEncoderDecoderModel.from_pretrained("path/to/model")
-[More Information Needed]
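The two `from_pretrained` calls above only load the processor and the model. A minimal sketch of running one image through them might look like the following. It assumes the `transformers` package with a PyTorch backend is installed and that the placeholder paths are replaced with real checkpoint locations; the helper names `load_ocr` and `recognize` are illustrative and do not come from the model card.

```python
def load_ocr(processor_path="path/to/processor", model_path="path/to/model"):
    """Load the processor and model. The default paths are the card's placeholders."""
    # Imported lazily so that recognize() below has no hard dependency on transformers.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(processor_path)
    model = VisionEncoderDecoderModel.from_pretrained(model_path)
    return processor, model


def recognize(image, processor, model):
    """Return the decoded text for a single image (assumed: an RGB PIL image)."""
    # The processor resizes and normalizes the image into a batch of pixel values.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # The decoder generates token ids autoregressively from the encoded image.
    generated_ids = model.generate(pixel_values)
    # batch_decode returns one string per image; this sketch passes a single image.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Typical use would be `processor, model = load_ocr(...)` followed by `recognize(Image.open("page.png").convert("RGB"), processor, model)`, where `page.png` is a hypothetical file name.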
 ## Training Details
@@ -93,7 +87,7 @@ model = VisionEncoderDecoderModel.from_pretrained("path/to/model")
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 The training data comprises 46,742 image-text pairs from a custom dataset of Urdu handwritten texts.
-[More Information Needed]
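As a rough sanity check on training length, the dataset size above can be combined with the batch size of 8 listed under Training Hyperparameters. The sketch below assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these are stated in the card.

```python
import math

train_pairs = 46_742  # image-text pairs, from the model card
batch_size = 8        # from the Training Hyperparameters list

# Optimizer steps needed to see every pair once (assumes no gradient accumulation).
steps_per_epoch = math.ceil(train_pairs / batch_size)

# Size of the final batch if the last partial batch is not dropped.
last_batch_size = train_pairs % batch_size or batch_size

print(steps_per_epoch, last_batch_size)  # 5843 6
```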
 ### Training Procedure
@@ -102,12 +96,12 @@ Images were resized to 384x384 pixels and normalized. Augmentations such as Elas
 #### Preprocessing [optional]
-[More Information Needed]
 #### Training Hyperparameters
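-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+- **Training regime:**  <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->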
 - Training regime: Mixed precision (fp16)
 - Learning rate: 4e-5
 - Batch size: 8
@@ -119,7 +113,6 @@ Images were resized to 384x384 pixels and normalized. Augmentations such as Elas
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
@@ -132,7 +125,7 @@ Images were resized to 384x384 pixels and normalized. Augmentations such as Elas
 <!-- This should link to a Dataset Card if possible. -->
 A subset of 4,675 image-text pairs was used for evaluation.
-[More Information Needed]
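The card does not say whether the 4,675 evaluation pairs are held out from the 46,742 pairs reported under Training Data or come in addition to them. The arithmetic below works through both readings; which one applies is an open assumption, not something the card confirms.

```python
import math

total_pairs = 46_742  # figure reported under Training Data
eval_pairs = 4_675    # figure reported under Testing Data

# Reading 1: the evaluation pairs are held out from the 46,742.
held_out_share = eval_pairs / total_pairs
train_remaining = total_pairs - eval_pairs
# 10% of 46,742 rounded up, for comparison with the reported evaluation size.
ten_percent_rounded_up = math.ceil(total_pairs / 10)

# Reading 2: the evaluation pairs are separate from the 46,742.
disjoint_share = eval_pairs / (total_pairs + eval_pairs)

print(f"{held_out_share:.4f} {train_remaining} {ten_percent_rounded_up} {disjoint_share:.4f}")
```

Under the first reading the evaluation set matches a 10% hold-out rounded up, which would leave 42,067 pairs for training; under the second it is roughly 9% of the combined data.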
 #### Factors
@@ -141,17 +134,17 @@ The model was tested on handwritten text images with varying font styles and com
-[More Information Needed]
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
 ### Results
-[More Information Needed]
 #### Summary
@@ -161,7 +154,7 @@ The model was tested on handwritten text images with varying font styles and com
 <!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
 ## Environmental Impact
@@ -184,7 +177,7 @@ The model uses a VisionEncoderDecoder architecture combining a ViT encoder and a
 ### Compute Infrastructure
-[More Information Needed]
 #### Hardware
@@ -193,7 +186,7 @@ NVIDIA GPU (e.g., A100)
 #### Software
-[More Information Needed]
 Python, PyTorch, Hugging Face Transformers
@@ -208,7 +201,7 @@ Python, PyTorch, Hugging Face Transformers
 **APA:**
-[More Information Needed]
 ## Glossary [optional]
@@ -217,7 +210,7 @@ CER: Character Error Rate
 WER: Word Error Rate
 OCR: Optical Character Recognition
-[More Information Needed]
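To make the CER and WER entries concrete, here is a minimal sketch of how both are commonly computed from the Levenshtein edit distance. It is not the evaluation code used for this model, which the card does not show. It also assumes characters are Unicode code points, so Urdu diacritics count as separate characters, and that words are split on whitespace.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `cer("اردو", "ارد")` is 0.25, because one of the four reference characters is missing from the prediction. Both rates can exceed 1.0 when the prediction is much longer than the reference.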
 ## More Information [optional]
@@ -225,10 +218,10 @@ OCR: Optical Character Recognition
 ## Model Card Authors [optional]
-[More Information Needed]
 Fajar Pervaiz
 ## Model Card Contact
-[More Information Needed]
 [email protected]
