library_name: transformers
license: gpl-3.0
datasets:
- MohamedRashad/arabic-english-code-switching
language:
- ar
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
Arabic-Whisper-CodeSwitching-Edition
This model is a fine-tuned version of OpenAI's Whisper Large v2, trained on an Arabic-English code-switching dataset.
Model Details
Model Description
Arabic-Whisper-CodeSwitching-Edition is designed to transcribe Arabic audio with embedded English words. It builds on the original Whisper Large v2 and improves its performance on Arabic-English code-switching speech.
- Developed by: Mohamed Rashad
- Model type: Speech Recognition
- Language(s) (NLP): Arabic, English (in the context of Arabic audio)
- License: GPL-3.0
Model Sources
- Repository for data collection: https://github.com/MohamedAliRashad/youtube-audio-collector
- Demo: https://huggingface.co/spaces/MohamedRashad/Arabic-Whisper-CodeSwitching-Edition
Uses
Direct Use
The model can be used directly for transcribing Arabic speech that includes English words. It is particularly useful in multilingual environments where code-switching is common.
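To make the idea of code-switched output concrete, here is a minimal sketch that tags each word of a transcript as Arabic or Latin script using only the standard library. The helper names `word_script` and `tag_transcript` and the sample sentence are hypothetical illustrations; they are not part of this model or of `transformers`.

```python
import unicodedata
from typing import List, Tuple


def word_script(word: str) -> str:
    """Classify a word as 'arabic', 'latin', or 'other' from its letters."""
    arabic = 0
    latin = 0
    for ch in word:
        if not ch.isalpha():
            continue  # ignore digits and punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            arabic += 1
        elif name.startswith("LATIN"):
            latin += 1
    if arabic == 0 and latin == 0:
        return "other"
    return "arabic" if arabic >= latin else "latin"


def tag_transcript(text: str) -> List[Tuple[str, str]]:
    """Split a transcript on whitespace and tag each word with its script."""
    return [(word, word_script(word)) for word in text.split()]


# Hypothetical transcript, made up for illustration (not real model output).
example = "أنا عندي meeting بكرة"
for word, script in tag_transcript(example):
    print(script, word)
```

Tagging by script is only a rough proxy for language: an English word transliterated into Arabic letters would be counted as Arabic.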
Out-of-Scope Use
The model may not perform well on monolingual speech in languages other than Arabic or English, or on code-switched speech involving any language pair other than Arabic and English.
Bias, Risks, and Limitations
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.
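One way to gauge these limitations on your own recordings is to measure word error rate (WER), the metric listed in this card's metadata. The sketch below is a plain-Python implementation of the standard word-level edit-distance definition. The function name `word_error_rate` is an illustrative choice, and the sketch applies no text normalization (diacritics, punctuation, casing), which real evaluations usually do.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy strings for illustration only; these are not evaluation results.
print(word_error_rate("a b c d", "a x c d"))
```

Because WER is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many insertions.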
How to Get Started with the Model
Use the code below to get started with the model.
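import librosa  # assumed audio loader; any library that yields a 16 kHz mono float array works
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("MohamedRashad/Arabic-Whisper-CodeSwitching-Edition")
model = WhisperForConditionalGeneration.from_pretrained("MohamedRashad/Arabic-Whisper-CodeSwitching-Edition")

# Example usage
# The processor expects a waveform array, not a file path, so load the audio first (resampled to 16 kHz).
audio, sampling_rate = librosa.load("path_to_audio_file.wav", sr=16000)
inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
generated_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(transcription)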
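Whisper models operate on 30-second windows of 16 kHz audio, so longer recordings typically need to be split before transcription. The following is a minimal sketch of computing chunk boundaries in samples. The helper `chunk_bounds` and its parameters are hypothetical, and the sketch does not handle merging the transcripts of overlapping chunks.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000   # sampling rate Whisper expects, in Hz
WINDOW_SECONDS = 30.0  # length of Whisper's input window


def chunk_bounds(
    num_samples: int,
    window_s: float = WINDOW_SECONDS,
    overlap_s: float = 0.0,
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices covering a waveform of num_samples."""
    window = int(window_s * sample_rate)
    step = window - int(overlap_s * sample_rate)
    if window <= 0 or step <= 0:
        raise ValueError("window must be positive and longer than the overlap")

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the final chunk reached the end of the audio
        start += step
    return bounds


# A 70-second recording split into windows with 5 seconds of overlap.
for start, end in chunk_bounds(70 * SAMPLE_RATE, overlap_s=5.0):
    print(start / SAMPLE_RATE, end / SAMPLE_RATE)
```

Each slice of a loaded waveform, for example `waveform[start:end]`, could then be passed to the processor and model separately.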
Citation
BibTeX:
@misc{rashad2024arabicwhisper,
title={Arabic-Whisper-CodeSwitching-Edition},
author={Mohamed Rashad},
year={2024},
url={https://huggingface.co/spaces/MohamedRashad/Arabic-Whisper-CodeSwitching-Edition},
}
APA:
Rashad, M. (2024). Arabic-Whisper-CodeSwitching-Edition. Retrieved from https://huggingface.co/spaces/MohamedRashad/Arabic-Whisper-CodeSwitching-Edition