---
language:
- ar
license: apache-2.0
base_model: openai/whisper-medium
tags:
- fine-tuned
- Quran
- automatic-speech-recognition
- arabic
- whisper
datasets:
- fawzanaramam/the-amma-juz
model-index:
- name: Whisper Medium Finetuned on Amma Juz of Quran
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: The Amma Juz Dataset
      type: fawzanaramam/the-amma-juz
    metrics:
      - type: eval_loss
        value: 0.0032
      - type: eval_wer
        value: 0.5102
---

# Whisper Medium Finetuned on Amma Juz of Quran

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) for transcribing Arabic audio, with a focus on Quranic recitation from the *Amma Juz* dataset. Fine-tuning targets a low word error rate (WER) on this recitation material.

## Model Description

Whisper Medium is a Transformer-based automatic speech recognition (ASR) model developed by OpenAI. This fine-tuned version uses the *Amma Juz* dataset to improve recognition of Quranic recitation. It keeps the Whisper Medium architecture, but this card reports results only on Quranic recitation, so performance on general Arabic speech is not established here.

## Performance Metrics

On the evaluation set, the model achieved:
- **Evaluation Loss**: 0.0032  
- **Word Error Rate (WER)**: 0.5102%  
- **Evaluation Runtime**: 47.9061 seconds  
- **Evaluation Samples per Second**: 2.087  
- **Evaluation Steps per Second**: 0.271  

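At 0.5102% WER, the model makes roughly one word error per 200 reference words on this evaluation set. The runtime and throughput figures imply an evaluation set of about 100 samples (47.9061 s × 2.087 samples/s ≈ 100), so the result describes in-domain Quranic recitation rather than general Arabic speech.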
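
For reference, WER is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a minimal pure-Python illustration of that definition. It is not the evaluation script used for this model, and the function name `word_error_rate` is chosen here for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words gives 25% WER.
print(f"{100 * word_error_rate('قل هو الله أحد', 'قل هو الله'):.2f}%")  # 25.00%
```

Real evaluations usually normalize text (for example, diacritics and punctuation) before scoring. Whether and how that was done for the figures above is not stated in this card.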

## Intended Uses & Limitations

### Intended Uses
- **Speech-to-text transcription** of Quranic recitation in Arabic, specifically from the *Amma Juz*.  
- Research and development of tools for Quranic education and learning.  
- Projects focused on Arabic ASR in religious and educational domains.  
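
A minimal transcription sketch using the `transformers` ASR pipeline is shown below. It makes several assumptions: `MODEL_ID` is a hypothetical placeholder (this card does not state the Hub repository ID, so substitute the real one), `recitation.wav` is a placeholder file name, and the helper names `build_transcriber` and `transcribe` are illustrative.

```python
from typing import Any, Callable, Dict

# Hypothetical placeholder: replace with the actual Hub repository ID of this model.
MODEL_ID = "your-username/whisper-medium-amma-juz"


def build_transcriber(model_id: str = MODEL_ID):
    """Create an ASR pipeline; the checkpoint is downloaded on first use."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second audio windows
    )


def transcribe(audio_path: str, transcriber: Callable[..., Dict[str, Any]]) -> str:
    """Transcribe one audio file, requesting Arabic transcription (not translation)."""
    result = transcriber(
        audio_path,
        generate_kwargs={"language": "arabic", "task": "transcribe"},
    )
    return result["text"].strip()


# Example usage (requires network access and the real model ID):
#   asr = build_transcriber()
#   print(transcribe("recitation.wav", asr))
```

Passing the language and task explicitly avoids relying on Whisper's automatic language detection, which may matter for short recitation clips.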

### Limitations
- The model is fine-tuned on Quranic recitations and may not generalize well to non-Quranic Arabic speech or casual conversations.  
- Variations in recitation style, audio quality, or heavy accents may impact transcription accuracy.  
- For optimal performance, use clean and high-quality audio inputs.  

## Training and Evaluation Data

The model was trained on the *Amma Juz* dataset (`fawzanaramam/the-amma-juz`), which pairs Quranic audio recordings with their corresponding transcripts. The dataset was curated to preserve the integrity and accuracy of the Quranic text.
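
The dataset can likely be inspected with the `datasets` library, as sketched below. The dataset ID comes from this card's metadata. The split names and column names are not documented here, so the sketch only lists whatever splits exist instead of assuming them. The helper names `load_amma_juz` and `split_sizes` are illustrative.

```python
from typing import Dict, Mapping, Sized

DATASET_ID = "fawzanaramam/the-amma-juz"  # dataset ID from this card's metadata


def load_amma_juz():
    """Download the dataset from the Hub (requires network access)."""
    # Imported lazily so split_sizes can be used without the datasets package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


def split_sizes(splits: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of examples in each split."""
    return {name: len(split) for name, split in splits.items()}


# Example usage:
#   ds = load_amma_juz()
#   print(split_sizes(ds))  # split names depend on how the dataset was published
#   print(ds)               # shows the available columns for each split
```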

## Training Procedure

### Training Hyperparameters
The following hyperparameters were used during training:
- **Learning Rate**: 1e-05  
- **Training Batch Size**: 16  
- **Evaluation Batch Size**: 8  
- **Seed**: 42  
- **Optimizer**: Adam (betas=(0.9, 0.999), epsilon=1e-08)  
- **Learning Rate Scheduler**: Linear  
- **Warmup Steps**: 10  
- **Number of Epochs**: 3.0  
- **Mixed Precision Training**: Native AMP  
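
The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as sketched below. This is a reconstruction under assumptions, not the original training script: it assumes a single device (so the training batch size of 16 is the per-device value), and `output_dir` is a caller-supplied placeholder. The `linear_warmup_lr` helper illustrates the shape of a linear schedule with warmup. Its `total_steps` argument is hypothetical, since the total number of training steps is not stated in this card.

```python
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumes training on a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 10,
    "num_train_epochs": 3.0,
    "fp16": True,  # native AMP; needs a CUDA-capable device
}


def build_training_args(output_dir: str):
    """Build training arguments from the values listed in this card."""
    # Imported lazily so the schedule helper works without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 1e-5, warmup_steps: int = 10) -> float:
    """Learning rate at a given step: linear ramp-up, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# With a hypothetical 100 total steps, the rate peaks at step 10 and then decays.
for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_lr(step, total_steps=100))
```

With only 10 warmup steps, the learning rate reaches its peak of 1e-05 almost immediately and spends nearly all of training in the decay phase.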

### Framework Versions
- **Transformers**: 4.41.1  
- **PyTorch**: 2.2.1+cu121  
- **Datasets**: 2.19.1  
- **Tokenizers**: 0.19.1
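
To compare a local environment against the versions listed above, a small check like the following may help. It is a convenience sketch: the helper name `check_versions` is illustrative, and `torch` is the distribution name assumed for PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.41.1",
    "torch": "2.2.1+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def check_versions(
    expected: Dict[str, str] = EXPECTED_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in check_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A version mismatch does not necessarily prevent the model from loading. The listed versions are simply the ones recorded at training time.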