---
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- wav2vec2
- speech-recognition
- english-phoneme-recognition
---

# Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a Wav2Vec2-Large-Robust model fine-tuned for phoneme recognition. The model was trained and evaluated on our in-house dataset of English pronunciations by Korean learners, created with ETRI.

## Data Information

- **Dataset Name**: ETRI English Pronunciation of Korean Learners
- **Train Data**: 14,305 samples
- **Valid Data**: 1,590 samples
- **Test Data**: 3,974 samples

## Training Procedure

The model was fine-tuned for phoneme recognition using the Hugging Face `transformers` library. The training steps were:

1. Data preprocessing to align audio with phoneme labels.
2. Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
3. Evaluation on validation and test sets.

### Training Hyperparameters

- **Epochs**: 50
- **Learning Rate**: 0.0001
- **Warmup Ratio**: 0.1
- **Scheduler**: Linear
- **Batch Size**: 8
- **Loss Reduction**: Mean
- **Feature Extractor Freeze**: Enabled

## Training Results

The following metrics were recorded during training:

- **Final Training Loss**: 0.2527
- **Validation Loss**: 0.4532
- **Word Error Rate (WER) on Validation Set**: 0.1617

## Test Results

The model achieved the following performance on the test set:

- **Word Error Rate (WER)**: 0.1223

## Phoneme Data Example

Below is an example of how the dataset is structured for phoneme recognition:

**Sample 1:**

- **Provided Sentence**: The one with the ribbon on its head
- **Correct Korean English Phonemes**: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
- **Predicted Phonemes**: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd

## Training Logs

TensorBoard logs are available for detailed training analysis:

- `events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0`
- `events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1`

Use the following command to visualize the logs:

```bash
tensorboard --logdir=./logs/
```
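
## Error Rate Example

To make the reported error-rate metric more concrete, here is a minimal sketch that scores the Sample 1 phoneme sequences from the Phoneme Data Example section. It assumes the error rate is the token-level edit distance over space-separated phoneme symbols divided by the reference length, which is the usual WER definition; this card does not state exactly how its WER figures were computed. The function names `edit_distance` and `token_error_rate` are illustrative and not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def token_error_rate(reference, hypothesis):
    """Edit distance over whitespace-separated tokens, normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Sample 1 from this model card.
reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd"
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd"

edits = edit_distance(reference.split(), predicted.split())
rate = token_error_rate(reference, predicted)
print(f"edits: {edits}, rate: {rate:.4f}")
```

For this sample the sketch counts 4 edits against 22 reference tokens (`dh` to `d`, `ax` to `ah`, and `t s` merged into `ts`), a rate of about 0.18. This is a single-sample figure and is not directly comparable to the corpus-level numbers reported above.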