Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a fine-tuned Wav2Vec2-Large-Robust model for phoneme recognition. The model was trained and evaluated on an in-house dataset of English pronunciations by Korean learners, created in collaboration with ETRI.

Data Information

  • Dataset Name: ETRI English Pronunciation of Korean Learners
  • Train Data: 14,305 samples
  • Valid Data: 1,590 samples
  • Test Data: 3,974 samples

Training Procedure

The model was fine-tuned for phoneme recognition using the Hugging Face transformers library. The training steps were as follows; illustrative sketches of steps 1 and 2 appear below:

  1. Data preprocessing to align audio with phoneme labels.
  2. Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
  3. Evaluation on validation and test sets.
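A minimal sketch of step 1 is shown below. It assumes a datasets-style corpus with "audio" and "phonemes" columns and a hypothetical vocab.json listing the phoneme label set; none of these names are published in this card.

from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2Processor

# Hypothetical vocab.json: maps each phoneme symbol (e.g. "ah", "dh", "ih")
# plus the [UNK]/[PAD] special tokens to an integer id.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
    word_delimiter_token="|",
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16_000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

def prepare_example(batch):
    # Turn raw 16 kHz audio into model inputs and the space-separated
    # phoneme string into CTC label ids.
    audio = batch["audio"]
    batch["input_values"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_values[0]
    batch["labels"] = processor.tokenizer(batch["phonemes"]).input_ids
    return batch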

Training Hyperparameters

  • Epochs: 50
  • Learning Rate: 0.0001
  • Warmup Ratio: 0.1
  • Scheduler: Linear
  • Batch Size: 8
  • Loss Reduction: Mean
  • Feature Extractor Freeze: Enabled
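The snippet below continues the preprocessing sketch (reusing processor) and shows one way these hyperparameters map onto the transformers library; the base checkpoint name is the public one, while the output directory and logging settings are assumptions not stated in this card.

from transformers import Wav2Vec2ForCTC, TrainingArguments

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-robust",
    ctc_loss_reduction="mean",                      # Loss Reduction: Mean
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),            # size of the phoneme label set
)
model.freeze_feature_encoder()                      # Feature Extractor Freeze: Enabled

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-robust-etri-phoneme",  # hypothetical path
    num_train_epochs=50,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    logging_dir="./logs",
    report_to="tensorboard",
)
# training_args is then passed to transformers.Trainer together with the
# preprocessed train/validation splits and a CTC padding data collator.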

Training Results

The following metrics were achieved during training:

  • Final Training Loss: 0.2527
  • Validation Loss: 0.4532
  • Word Error Rate (WER) on Validation Set: 0.1617

Test Results

The model was evaluated on the test dataset with the following performance:

  • Word Error Rate (WER): 0.1223
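
A score of this kind can be reproduced with the evaluate library (its WER metric is backed by jiwer); on space-separated phoneme strings, WER effectively counts per-phoneme errors. The strings below are taken from the sample in the Phoneme Data Example section.

import evaluate

wer_metric = evaluate.load("wer")  # requires the jiwer backend

references  = ["dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd"]
predictions = ["d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd"]

print(wer_metric.compute(predictions=predictions, references=references))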

Phoneme Data Example

Below is an example of how the dataset is structured for phoneme recognition tasks:

Sample 1:

  • Provided Sentence: The one with the ribbon on its head
  • Correct Korean English Phonemes: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
  • Predicted Phonemes: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
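
Predictions like the one above come from a standard CTC greedy decode. Below is a minimal inference sketch; the repository id and audio filename are placeholders, since the card does not state the published checkpoint name.

import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "path/to/this-repository"  # placeholder for the published checkpoint
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load a 16 kHz mono recording (placeholder filename).
speech, sr = librosa.load("sample.wav", sr=16_000)
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])  # e.g. "d ah w ah n w ih dh ah ..."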

Training Logs

TensorBoard logs are available for detailed training analysis:

  • events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0
  • events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1

Use the following command to visualize logs:

tensorboard --logdir=./logs/

Model Details

  • Format: Safetensors
  • Model Size: 316M parameters
  • Tensor Type: F32