---
license: apache-2.0
datasets:
- Murple/ksponspeech
language:
- ko
metrics:
- cer
- wer
pipeline_tag: automatic-speech-recognition
---

# Whisper-Medium-KsponSpeech

Whisper-medium fine-tuned on the [KsponSpeech](https://huggingface.co/datasets/Murple/ksponspeech) dataset for Korean automatic speech recognition.

### Model Description

- **Developed by:** [yw0nam](https://github.com/yw0nam)
- **Shared by:** [yw0nam](https://github.com/yw0nam)
- **Model type:** Automatic speech recognition (ASR)
- **License:** apache-2.0

## Uses

```python
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

wav_path = "sample.wav"  # placeholder: path to your audio file

processor = WhisperProcessor.from_pretrained("openai/whisper-medium", language="ko", task="transcribe")
# Note: this ID names the zeroth_korean checkpoint; use this model's repo ID if it differs.
model = WhisperForConditionalGeneration.from_pretrained('spow12/whisper-medium-zeroth_korean').cuda()

# Load the audio resampled to 16 kHz, the sampling rate passed to the processor.
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

### Metrics

| Metric | Result |
| --- | --- |
| WER | 3.96 |
| CER | 1.71 |

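
This card does not say how the WER and CER scores were computed. For readers unfamiliar with the metrics, the following is a minimal sketch of their standard definitions: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. The function names are illustrative and not part of any library. Counting spaces as characters in CER and applying no text normalization are assumptions, so the numbers this sketch produces may not match the evaluation behind the table.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction; spaces count as characters (assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example pair: one word (and one character) is substituted.
reference = "오늘 날씨 가 좋다"
hypothesis = "오늘 날씨 는 좋다"
print(wer(reference, hypothesis))  # 0.25
print(cer(reference, hypothesis))  # 0.1
```

Both functions return fractions; multiply by 100 to express them as percentages.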