Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model
This repository contains a Wav2Vec2-Large-Robust model fine-tuned for phoneme recognition. The model was trained and evaluated on an in-house dataset of English pronunciations by Korean learners, created in collaboration with ETRI.
Data Information
- Dataset Name: ETRI English Pronunciation of Korean Learners
- Train Data: 14,305 samples
- Valid Data: 1,590 samples
- Test Data: 3,974 samples
Training Procedure
The model was fine-tuned for phoneme recognition using the Hugging Face transformers library. The training steps were as follows (a minimal setup sketch follows the list):
- Data preprocessing to align audio with phoneme labels.
- Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
- Evaluation on validation and test sets.
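The snippet below is a minimal setup sketch for this kind of fine-tuning, assuming a phoneme-level vocab.json for the tokenizer and the public facebook/wav2vec2-large-robust checkpoint as the starting point; the file names are illustrative and not taken from this repository.

```python
# Minimal setup sketch (illustrative; vocab.json path is a placeholder).
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Tokenizer over the phoneme vocabulary (vocab.json is assumed to map phonemes to ids).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the robust pretrained checkpoint and attach a CTC head sized to the phoneme set.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-robust",
    ctc_loss_reduction="mean",  # matches the "Loss Reduction: Mean" setting below
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)

# Keep the convolutional feature encoder frozen, as in the training configuration.
model.freeze_feature_encoder()
```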
Training Hyperparameters
- Epochs: 50
- Learning Rate: 0.0001
- Warmup Ratio: 0.1
- Scheduler: Linear
- Batch Size: 8
- Loss Reduction: Mean
- Feature Extractor Freeze: Enabled (see the sketch after this list)
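As an illustration, these hyperparameters map roughly onto Hugging Face TrainingArguments as sketched below; the output directory, logging directory, and evaluation/save cadence are placeholders rather than values confirmed by this repository.

```python
from transformers import TrainingArguments

# Hyperparameters from the list above; directories and cadence are placeholders.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-robust-ko-en-phoneme",  # placeholder
    num_train_epochs=50,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",  # assumed cadence
    save_strategy="epoch",        # assumed cadence
    logging_dir="./logs",
)
```

These arguments would then be passed to a Trainer together with a CTC data collator that pads the audio inputs and the phoneme labels separately.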
Training Results
The following metrics were achieved during training:
- Final Training Loss: 0.2527
- Validation Loss: 0.4532
- Word Error Rate (WER) on Validation Set: 0.1617
Test Results
The model was evaluated on the test dataset with the following performance (a sample WER computation is sketched after the result):
- Word Error Rate (WER): 0.1223
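For reference, WER over space-separated phoneme strings can be computed with the evaluate library as in the sketch below; because each phoneme is treated as a "word", this effectively measures a phoneme error rate. The example strings are taken from the sample in the next section.

```python
# Requires: pip install evaluate jiwer
import evaluate

wer_metric = evaluate.load("wer")

# References and predictions are space-separated phoneme sequences.
references  = ["dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd"]
predictions = ["d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd"]

print(wer_metric.compute(predictions=predictions, references=references))
```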
Phoneme Data Example
Below is an example of how the dataset is structured for phoneme recognition tasks; an inference sketch follows the sample:
Sample 1:
- Provided Sentence: The one with the ribbon on its head
- Reference Phonemes (Korean-English pronunciation): dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
- Predicted Phonemes: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
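Below is a hedged inference sketch for producing predicted phoneme sequences like the one above from a 16 kHz recording; the model identifier and audio file name are placeholders, not actual paths from this repository.

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "path/to/this-checkpoint"  # placeholder for this fine-tuned model
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load a 16 kHz waveform and run greedy CTC decoding to phonemes.
speech, _ = librosa.load("sample.wav", sr=16_000)  # placeholder audio file
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))
```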
Training Logs
TensorBoard logs are available for detailed training analysis:
- events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0
- events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1
Use the following command to visualize logs:
```
tensorboard --logdir=./logs/
```