---
library_name: transformers
datasets:
- FBK-MT/Speech-MASSIVE
language:
- pl
metrics:
- wer
- bleu
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# Model Card

## Model Details

### Model Description

This model is a fine-tuned version of OpenAI's Whisper-Tiny ASR model, optimized for transcribing Polish voice commands. Fine-tuning used the Speech-MASSIVE dataset to improve performance on Polish utterances. Whisper-Tiny is a transformer-based encoder-decoder architecture, pre-trained on 680,000 hours of labeled speech data.

- **Developed by:** gs224
- **Language(s) (NLP):** Polish
- **Finetuned from model:** openai/whisper-tiny

## Uses

The model can be used for automatic Polish speech-to-text transcription, including voice command recognition.

### Out-of-Scope Use

The model may not perform well on languages or domains it was not fine-tuned for, and it is not suitable for sensitive applications that require very high accuracy.

## Bias, Risks, and Limitations

Fine-tuning was performed on a relatively small subset of Polish voice data for a limited number of epochs, so the model may underperform on certain dialects or accents. Capital letters and punctuation in the ground-truth transcriptions may inflate the Word Error Rate (WER) score.

### Recommendations

Future improvements could include training on larger datasets with more diverse utterances, and normalizing case and punctuation in the ground-truth labels.

## Training Details

### Training Data

https://huggingface.co/datasets/FBK-MT/Speech-MASSIVE-test

## Evaluation

### Testing Data, Factors & Metrics

#### Metrics

Word Error Rate (WER), a standard metric for ASR.

### Results

WER of the fine-tuned model: 0.3216
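Since the references retain casing and punctuation (see Bias, Risks, and Limitations), a normalized WER would likely be lower. A minimal sketch with the `jiwer` library of how such a normalization could look; the normalization shown is illustrative and not necessarily the procedure used for the reported 0.3216 score:

```python
import string

import jiwer


def normalize(text: str) -> str:
    # Lowercase and strip punctuation so WER reflects word identity only
    # (illustrative recipe; not the exact one used for the reported score).
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


reference = "Ustaw budzik na siódmą rano."
hypothesis = "ustaw budzik na siódmą rano"

print("raw WER:       ", jiwer.wer(reference, hypothesis))                        # 0.4
print("normalized WER:", jiwer.wer(normalize(reference), normalize(hypothesis)))  # 0.0
```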
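## How to Get Started with the Model

A minimal transcription sketch using the 🤗 Transformers `pipeline`. The checkpoint id `gs224/whisper-tiny-pl` is a placeholder, since this card does not state the exact repository id; substitute the actual fine-tuned model path.

```python
from transformers import pipeline

# Placeholder checkpoint id; replace with the actual fine-tuned model path.
asr = pipeline(
    "automatic-speech-recognition",
    model="gs224/whisper-tiny-pl",
    # Optional: force Polish transcription; a fine-tuned checkpoint may
    # already default to this.
    generate_kwargs={"language": "polish", "task": "transcribe"},
)

# Whisper expects 16 kHz audio; the pipeline resamples common formats.
result = asr("polish_voice_command.wav")
print(result["text"])
```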