metadata
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
  - wav2vec2
  - speech-recognition
  - english-phoneme-recognition

Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a Wav2Vec2-Large-Robust model fine-tuned for phoneme recognition. The model was trained and evaluated on an in-house dataset of English speech produced by Korean learners, built together with ETRI.

Data Information

  • Dataset Name: ETRI English Pronunciation of Korean Learners
  • Train Data: 14,305 samples
  • Valid Data: 1,590 samples
  • Test Data: 3,974 samples

Training Procedure

The model was fine-tuned for phoneme recognition using the Hugging Face transformers library. Below are the training steps:

  1. Data preprocessing to align audio with phoneme labels.
  2. Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
  3. Evaluation on validation and test sets.
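
To illustrate how a CTC-trained model's frame-level outputs become a phoneme sequence, here is a minimal sketch of greedy CTC decoding. The toy vocabulary, the blank index, and the frame IDs are hypothetical and chosen only for illustration; the released model uses its own tokenizer vocabulary, and the decoding used for the reported results is not specified in this card.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real model ships its own tokenizer vocabulary.
VOCAB = ["<pad>", "dh", "ah", "w", "n"]
BLANK_ID = 0  # assumption: the padding token doubles as the CTC blank


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Collapse consecutive repeated IDs, then drop blank tokens."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return [vocab[i] for i in collapsed if i != blank_id]


# Made-up per-frame argmax IDs (one ID per audio frame).
frames = [1, 1, 0, 2, 2, 0, 3, 3, 2, 0, 4, 4]
decoded = ctc_greedy_decode(frames)
print(" ".join(decoded))
```

The blank token is what lets the same phoneme appear twice in a row: `[2, 0, 2]` decodes to two `ah` tokens, whereas `[2, 2]` collapses to one.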

Training Hyperparameters

  • Epochs: 50
  • Learning Rate: 0.0001
  • Warmup Ratio: 0.1
  • Scheduler: Linear
  • Batch Size: 8
  • Loss Reduction: Mean
  • Feature Extractor Freeze: Enabled
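
As a rough sketch of what these settings imply, the learning-rate curve of a linear schedule with warmup can be computed directly. The step counts below assume a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these are stated in this card, so treat the numbers as illustrative.

```python
import math

# Values taken from the hyperparameter list above.
TRAIN_SAMPLES = 14_305
BATCH_SIZE = 8
EPOCHS = 50
PEAK_LR = 1e-4
WARMUP_RATIO = 0.1

# Assumptions: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0, total_steps - step) / (total_steps - warmup_steps)


for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, learning_rate(step))
```

Under these assumptions training runs for 89,450 optimizer steps, of which the first 8,945 are warmup.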

Training Results

The following metrics were achieved during training:

  • Final Training Loss: 0.2527
  • Validation Loss: 0.4532
  • Word Error Rate (WER) on Validation Set: 0.1617

Test Results

The model was evaluated on the test dataset with the following performance:

  • Word Error Rate (WER): 0.1223

Phoneme Data Example

Below is an example of a dataset entry for phoneme recognition, shown alongside the model's prediction:

Sample 1:

  • Provided Sentence: The one with the ribbon on its head
  • Reference Phonemes (Korean Learner English): dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
  • Predicted Phonemes: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
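
For a sense of how an error rate can be derived from such a pair, the sketch below computes a token-level edit distance on the sample above. It assumes the reported WER treats each space-separated phoneme as one "word"; the card does not state which evaluation script produced the reported figures, so this is an illustration rather than the exact metric implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd".split()
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd".split()

errors = edit_distance(reference, predicted)
error_rate = errors / len(reference)
print(f"{errors} errors over {len(reference)} reference phonemes: {error_rate:.4f}")
```

On this sample the distance is 4 (`dh` to `d`, `ax` to `ah`, and `t s` merged into `ts`, which counts as one substitution plus one deletion) over 22 reference phonemes, or about 0.18 for this single utterance.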

Training Logs

TensorBoard logs are available for detailed training analysis:

  • events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0
  • events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1

Use the following command to visualize logs:

tensorboard --logdir=./logs/