metadata
language:
  - ar
license: apache-2.0
base_model: openai/whisper-medium
tags:
  - fine-tuned
  - Quran
  - automatic-speech-recognition
  - arabic
  - whisper
datasets:
  - fawzanaramam/the-amma-juz
model-index:
  - name: Whisper Medium Finetuned on Amma Juz of Quran
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: The Amma Juz Dataset
          type: fawzanaramam/the-amma-juz
        metrics:
          - type: eval_loss
            value: 0.0032
          - type: eval_wer
            value: 0.5102

Whisper Medium Finetuned on Amma Juz of Quran

This model is a fine-tuned version of openai/whisper-medium for transcribing Arabic audio, specifically Quranic recitation from the Amma Juz (the 30th and final juz of the Quran). Fine-tuning targets low word error rate in this narrow domain rather than in general Arabic speech.

Model Description

Whisper Medium is a transformer-based encoder-decoder automatic speech recognition (ASR) model developed by OpenAI. This fine-tuned version is trained on the Amma Juz dataset to improve recognition of Quranic recitation and is intended for Arabic transcription in religious and educational contexts. It keeps the base model's architecture, but its accuracy on general-purpose or non-Quranic Arabic speech has not been reported (see Limitations).
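A minimal inference sketch follows, assuming the `transformers` library is installed (plus ffmpeg, which the pipeline uses to decode audio files). `MODEL_ID` is a hypothetical placeholder because the card does not state this model's repository ID, and `transcribe` and `whisper_generate_kwargs` are illustrative helper names, not part of any published API. Passing `language="arabic"` and `task="transcribe"` keeps Whisper from auto-detecting the language or translating to English.

```python
from typing import Any, Dict

# Hypothetical placeholder: replace with this model's Hub repo id or a local path.
MODEL_ID = "path/to/whisper-medium-amma-juz"


def whisper_generate_kwargs() -> Dict[str, Any]:
    """Force Arabic transcription instead of language detection or translation."""
    return {"language": "arabic", "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # chunk_length_s=30 splits recordings longer than Whisper's 30 s input window.
    asr = pipeline("automatic-speech-recognition", model=model_id, chunk_length_s=30)
    result = asr(audio_path, generate_kwargs=whisper_generate_kwargs())
    return result["text"]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

The chunking setting is a choice made for this sketch so that full surah recordings can be passed in; it is not a setting documented for this model.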

Performance Metrics

On the evaluation set, the model achieved:

  • Evaluation Loss: 0.0032
  • Word Error Rate (WER): 0.5102%
  • Evaluation Runtime: 47.9061 seconds
  • Evaluation Samples per Second: 2.087
  • Evaluation Steps per Second: 0.271
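WER counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn the model's output into the reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. The pure-Python sketch below computes it with a word-level edit distance and returns a fraction, so multiply by 100 for a percentage. It is illustrative only: the card does not say which implementation or text normalization produced the reported score (the `evaluate` library's `wer` metric is a common choice).

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("قل هو الله أحد", "قل هو الله احد"))  # 0.25
```

A single wrong word in a four-word verse already gives 25% WER, which shows how small a 0.5102 WER is at the word level.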

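These results indicate very accurate transcription on this evaluation set. They come from a single evaluation split of the same dataset used for training, so they may overstate accuracy for other reciters, recording conditions, or Quranic text outside the Amma Juz. The runtime and throughput figures describe evaluation speed on the (unspecified) hardware used and are not a performance benchmark.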
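The runtime and throughput figures can be cross-checked with simple arithmetic. The sketch below multiplies the reported runtime by each rate; the results suggest an evaluation set of roughly 100 samples processed in 13 batches, which is consistent with the evaluation batch size of 8 (100 / 8 rounds up to 13). These counts are inferred from the reported numbers; the card does not state them.

```python
import math
from typing import Tuple

# Figures copied from the performance metrics and hyperparameters in this card.
EVAL_RUNTIME_S = 47.9061
SAMPLES_PER_S = 2.087
STEPS_PER_S = 0.271
EVAL_BATCH_SIZE = 8


def implied_eval_counts() -> Tuple[int, int, int]:
    """Return (approx. samples, approx. steps, steps implied by the batch size)."""
    samples = round(EVAL_RUNTIME_S * SAMPLES_PER_S)  # 47.9061 * 2.087 ≈ 99.98
    steps = round(EVAL_RUNTIME_S * STEPS_PER_S)      # 47.9061 * 0.271 ≈ 12.98
    steps_from_batches = math.ceil(samples / EVAL_BATCH_SIZE)
    return samples, steps, steps_from_batches


if __name__ == "__main__":
    print(implied_eval_counts())  # (100, 13, 13)
```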

Intended Uses & Limitations

Intended Uses

  • Speech-to-text transcription of Quranic recitation in Arabic, specifically from the Amma Juz.
  • Research and development of tools for Quranic education and learning.
  • Projects focused on Arabic ASR in religious and educational domains.

Limitations

  • The model is fine-tuned on Quranic recitations and may not generalize well to non-Quranic Arabic speech or casual conversations.
  • Variations in recitation style, audio quality, or heavy accents may impact transcription accuracy.
  • For optimal performance, use clean and high-quality audio inputs.
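Whisper's feature extractor expects 16 kHz mono audio. The Transformers ASR pipeline decodes and resamples audio files for you, but raw arrays passed to the model directly must already be at 16 kHz. The sketch below illustrates the two steps (downmixing and resampling) in pure Python. It uses plain linear interpolation without an anti-aliasing filter, so treat it as an illustration and prefer a dedicated library such as torchaudio or librosa in practice; the function names are hypothetical.

```python
from typing import List, Sequence

WHISPER_SAMPLE_RATE = 16_000  # Whisper's feature extractor expects 16 kHz input


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels sample by sample."""
    if not channels:
        raise ValueError("need at least one channel")
    n = len(channels[0])
    if any(len(c) != n for c in channels):
        raise ValueError("channels must have equal length")
    return [sum(c[i] for c in channels) / len(channels) for i in range(n)]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = WHISPER_SAMPLE_RATE) -> List[float]:
    """Crude linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = round(len(samples) * dst_rate / src_rate)
    out: List[float] = []
    for k in range(out_len):
        pos = k * src_rate / dst_rate  # fractional position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

For example, `resample_linear(to_mono([left, right]), 44_100)` would turn a 44.1 kHz stereo recording into 16 kHz mono samples.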

Training and Evaluation Data

The model was trained on the Amma Juz dataset (fawzanaramam/the-amma-juz), which pairs Quranic audio recordings with their transcripts. The dataset was curated to preserve the integrity and accuracy of the Quranic text.

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning Rate: 1e-05
  • Training Batch Size: 16
  • Evaluation Batch Size: 8
  • Seed: 42
  • Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • Learning Rate Scheduler: Linear
  • Warmup Steps: 10
  • Number of Epochs: 3.0
  • Mixed Precision Training: Native AMP
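The hyperparameters above can be expressed as keyword arguments for `Seq2SeqTrainingArguments` in Transformers, and the linear scheduler with 10 warmup steps follows the formula in `linear_warmup_lr` below. Assumptions: the batch sizes are treated as per-device values on a single device, `output_dir` is a hypothetical path, and `fp16=True` stands in for "Native AMP" (it requires a CUDA GPU). The total number of training steps depends on the training-set size, which the card does not give, so `total_steps` must be supplied by the caller. This is a reconstruction, not the author's training script.

```python
from typing import Any, Dict


def training_kwargs(output_dir: str = "./whisper-medium-amma-juz") -> Dict[str, Any]:
    """Card hyperparameters as Seq2SeqTrainingArguments keyword names."""
    return {
        "output_dir": output_dir,  # hypothetical path, not stated in the card
        "learning_rate": 1e-5,
        "per_device_train_batch_size": 16,  # assumes a single device
        "per_device_eval_batch_size": 8,
        "seed": 42,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "linear",
        "warmup_steps": 10,
        "num_train_epochs": 3.0,
        "fp16": True,  # "Native AMP" mixed precision; needs a CUDA device
    }


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 1e-5,
                     warmup_steps: int = 10) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

On a CUDA machine, `Seq2SeqTrainingArguments(**training_kwargs())` builds the arguments object. With a hypothetical `total_steps=100`, the learning rate rises to 1e-05 at step 10 and then falls linearly to 0 at step 100.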

Framework Versions

  • Transformers: 4.41.1
  • PyTorch: 2.2.1+cu121
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
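To reproduce the training environment, you can compare installed packages against the versions listed above. This stdlib-only sketch uses `importlib.metadata`. PyTorch's distribution name is `torch`, and the `+cu121` local build label is ignored by default because PyTorch wheels from PyPI may report the version without it. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed under Framework Versions (PyTorch is distributed as "torch").
CARD_VERSIONS: Dict[str, str] = {
    "transformers": "4.41.1",
    "torch": "2.2.1+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def _strip_local(version: str) -> str:
    """Drop a PEP 440 local version label such as '+cu121'."""
    return version.split("+", 1)[0]


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
    ignore_local: bool = True,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to that version (None = missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for pkg, want in expected.items():
        got = lookup(pkg)
        if got is None:
            mismatches[pkg] = None
            continue
        if ignore_local:
            same = _strip_local(got) == _strip_local(want)
        else:
            same = got == want
        if not same:
            mismatches[pkg] = got
    return mismatches


if __name__ == "__main__":
    for pkg, got in version_mismatches(CARD_VERSIONS).items():
        print(f"{pkg}: card lists {CARD_VERSIONS[pkg]}, found {got or 'not installed'}")
```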