---
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- wav2vec2
- speech-recognition
- english-phoneme-recognition
---

# Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a Wav2Vec2-Large-Robust model fine-tuned for phoneme recognition. The model was trained and evaluated on an in-house dataset of English speech produced by Korean learners, created together with ETRI.

## Data Information
- **Dataset Name**: ETRI English Pronunciation of Korean Learners
- **Train Data**: 14,305 samples
- **Validation Data**: 1,590 samples
- **Test Data**: 3,974 samples
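
As a quick check on the split sizes listed above, the short sketch below adds them up and prints each split's share of the whole dataset. The dictionary keys are illustrative names only, not identifiers from the dataset itself.

```python
# Split sizes as listed in the Data Information section.
splits = {"train": 14_305, "valid": 1_590, "test": 3_974}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total samples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The counts sum to 19,869 samples, which works out to roughly a 72/8/20 train/validation/test split.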

## Training Procedure
The model was fine-tuned for phoneme recognition with the Hugging Face `transformers` library in three steps:
1. Preprocess the data to align each audio clip with its phoneme labels.
2. Fine-tune Wav2Vec2-Large-Robust with CTC loss.
3. Evaluate on the validation and test sets.
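
Because the model is trained with CTC loss, its raw output is one label per audio frame, including repeats and a blank symbol. The sketch below shows greedy CTC collapsing, a common way to turn such frame labels into a phoneme sequence. It is an illustration only: the blank symbol name `<pad>` and the frame labels are assumptions, and this model card does not state which decoding method was used.

```python
from itertools import groupby

BLANK = "<pad>"  # assumed blank symbol; the actual vocabulary may name it differently


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Made-up per-frame argmax labels, for illustration only.
frames = ["<pad>", "dh", "dh", "<pad>", "ah", "ah", "ah",
          "<pad>", "<pad>", "w", "ah", "<pad>", "n", "n"]
print(" ".join(ctc_greedy_collapse(frames)))  # dh ah w ah n
```

A blank between two identical labels keeps them separate, so a phoneme that is genuinely repeated survives the collapse.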

### Training Hyperparameters
- **Epochs**: 50
- **Learning Rate**: 0.0001
- **Warmup Ratio**: 0.1
- **Scheduler**: Linear
- **Batch Size**: 8
- **Loss Reduction**: Mean
- **Feature Extractor Freeze**: Enabled
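
To make the warmup ratio and linear scheduler concrete, the sketch below works out the number of optimizer steps and the learning rate at a given step. It assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these are stated in this model card, so the step counts are estimates. The function `lr_at` is a hypothetical helper written for this example.

```python
import math

# Values taken from the data and hyperparameter lists.
train_samples = 14_305
batch_size = 8
epochs = 50
peak_lr = 1e-4
warmup_ratio = 0.1

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = round(total_steps * warmup_ratio)


def lr_at(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 1789 89450 8945
```

Under these assumptions the learning rate climbs for the first 8,945 steps, peaks at 0.0001, and then decays linearly to zero at step 89,450.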

## Training Results
The following metrics were reported for the training run:
- **Final Training Loss**: 0.2527
- **Validation Loss**: 0.4532
- **Word Error Rate (WER) on Validation Set**: 0.1617

## Test Results
On the test set, the model achieved the following performance:
- **Word Error Rate (WER)**: 0.1223

## Phoneme Data Example
The sample below shows a provided sentence, its reference phoneme sequence, and the model's predicted phoneme sequence:

**Sample 1:**
- **Provided Sentence**: The one with the ribbon on its head
- **Reference Phonemes (Korean Learner English)**: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
- **Predicted Phonemes**: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
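
To show how an error rate can be derived from such a pair, the sketch below computes the token-level edit distance between the reference and predicted sequences of Sample 1 and divides it by the reference length. Treating each space-separated phoneme as one token is an assumption made for this illustration; this model card does not specify how the reported WER values were computed. The function `edit_distance` is a hypothetical helper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd".split()
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd".split()

errors = edit_distance(reference, predicted)
print(errors, len(reference), round(errors / len(reference), 4))  # 4 22 0.1818
```

For this sample the four edits are `dh` to `d`, `ax` to `ah`, and `t s` collapsing into `ts` (one substitution plus one deletion), giving 4 errors over 22 reference tokens.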

## Training Logs
TensorBoard logs are available for detailed training analysis:
- `events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0`
- `events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1`

Use the following command to visualize the logs:
```bash
tensorboard --logdir=./logs/