Model Details

Model Description

A Wav2Vec 2.0 model fine-tuned with an early-exit (EE) training pipeline for automatic speech recognition (ASR).

  • Developed by: SpeechTek unit, Fondazione Bruno Kessler
  • Model type: Wav2Vec 2.0
  • Language(s) (NLP): English
  • Finetuned from model: facebook/wav2vec2-base-960h
  • Repository: https://github.com/augustgw/wav2vec2-ee
  • Paper: Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch

Downstream Use

The model is intended for computationally efficient ASR: decoding can stop at an intermediate exit, trading some accuracy for less computation.
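Early-exit models save computation by stopping at an intermediate exit once its output already looks reliable. The following is a minimal sketch of one possible exit-selection rule, assuming the average frame-level entropy of each exit's posteriors is used as the confidence signal. The function names, the toy posteriors, and the threshold are hypothetical, and this is not necessarily the rule implemented in the repository.

```python
import math

def mean_frame_entropy(posteriors):
    """Average Shannon entropy (in nats) over frames.

    `posteriors` is a list of frames; each frame is a probability
    distribution over the output vocabulary.
    """
    total = 0.0
    for frame in posteriors:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)

def select_exit(exit_posteriors, threshold):
    """Return the 1-based index of the first exit whose mean entropy is
    below `threshold`; fall back to the last exit if none qualifies."""
    for index, posteriors in enumerate(exit_posteriors, start=1):
        if mean_frame_entropy(posteriors) < threshold:
            return index
    return len(exit_posteriors)

# Toy posteriors (hypothetical): 3 exits, 2 frames each, 3-symbol vocabulary.
toy_exits = [
    [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25]],  # exit 1: uncertain
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],  # exit 2: more confident
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # exit 3: most confident
]

chosen = select_exit(toy_exits, threshold=0.6)
print(f"Decoding stops at exit {chosen}")
```

A lower threshold pushes more utterances to later exits (better accuracy, more computation). A higher threshold does the opposite.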

Training Details

Training Data

The model is trained using the LibriSpeech-960h dataset.

Training Procedure

Basic training

  • Fine-tuning with only EE loss: finetune_ee.py
  • Fine-tuning a model without early exits: finetune_non-ee.py
  • To set the number of encoder layers, change model_config = Wav2Vec2Config(num_hidden_layers=X). For example, for a 4-layer encoder: model_config = Wav2Vec2Config(num_hidden_layers=4)
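As an illustration of what an EE loss can look like, the sketch below combines one loss value per exit into a single training objective. It assumes the objective is a (optionally weighted) sum of per-exit losses, which is a common formulation for early-exit training. The function name, the weights, and the toy loss values are hypothetical, and finetune_ee.py may compute its loss differently.

```python
def early_exit_loss(exit_losses, weights=None):
    """Combine per-exit losses into one scalar objective.

    With `weights=None` every exit contributes equally (unweighted sum).
    """
    if weights is None:
        weights = [1.0] * len(exit_losses)
    if len(weights) != len(exit_losses):
        raise ValueError("need exactly one weight per exit")
    return sum(w * loss for w, loss in zip(weights, exit_losses))

# Made-up per-exit loss values for a 6-exit model (illustrative only).
toy_losses = [2.0, 1.5, 1.0, 0.8, 0.7, 0.6]

total = early_exit_loss(toy_losses)
final_only = early_exit_loss(toy_losses, weights=[0, 0, 0, 0, 0, 1])
print(f"joint EE loss: {total:.2f}, final-exit-only loss: {final_only:.2f}")
```

Setting all weights except the last to zero reduces the objective to the loss of a model without early exits.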

Training Hyperparameters

training_args = TrainingArguments(
    output_dir="./wav2vec2-ee/checkpoints/",
    evaluation_strategy="no",
    #eval_steps=1000,
    save_strategy='epoch',
    #eval_accumulation_steps=10,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    num_train_epochs=100,
    weight_decay=0.01,
    push_to_hub=False,
    report_to='wandb',
    logging_strategy='steps',
    logging_steps=1000,
    dataloader_num_workers=1,
    ignore_data_skip=True,
)

Evaluation

The evaluation scripts create files in the indicated output directory. wer_results.txt contains the layerwise WERs on the test sets indicated in the evaluation script. The remaining files contain the layerwise transcriptions of each item in each test set.
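For readers unfamiliar with the metric, the sketch below shows how a corpus-level word error rate (WER) is typically computed: the word-level edit distance between reference and hypothesis, summed over all utterances and divided by the total number of reference words. The function names and example sentences are hypothetical, and the evaluation scripts may rely on a library and text normalization not shown here.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def corpus_wer(pairs):
    """WER in percent over (reference, hypothesis) string pairs."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words

# Illustrative transcriptions, e.g. the output of a single exit.
pairs = [
    ("the cat sat on the mat", "the cat sat on mat"),  # one deletion
    ("hello world", "hello word"),                     # one substitution
]
print(f"WER: {corpus_wer(pairs):.2f}%")
```

Running the same computation on each exit's transcriptions yields one WER per layer, which is the kind of layerwise figure reported in wer_results.txt.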

Basic evaluation

  • Normal evaluation: eval.py path/to/model/checkpoint path/to/output/directory
    • For safetensors checkpoints saved by newer versions of Hugging Face Transformers, see the note in eval.py
  • Evaluation for models without early exits (evaluates only output of final layer): eval_non-ee.py path/to/model/checkpoint path/to/output/directory

Results

Layerwise WER (%) on the LibriSpeech test-clean and dev-clean sets:

Exit      Test-Clean   Dev-Clean
Exit(1)   19.14        19.06
Exit(2)    8.26         8.01
Exit(3)    5.93         5.57
Exit(4)    4.74         4.48
Exit(5)    3.98         3.79
Exit(6)    3.95         3.69
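To make the accuracy cost of exiting early easier to read, the short sketch below computes each exit's relative WER increase over the final exit, using the Test-Clean column of the table. The helper name is hypothetical; the numbers are copied from the table.

```python
# Test-Clean WER per exit, copied from the results table.
wer_test_clean = {1: 19.14, 2: 8.26, 3: 5.93, 4: 4.74, 5: 3.98, 6: 3.95}

def relative_increase(wer_by_exit, final_exit):
    """Relative WER increase (in percent) of every exit over `final_exit`."""
    base = wer_by_exit[final_exit]
    return {exit_id: (wer - base) / base * 100.0
            for exit_id, wer in wer_by_exit.items()}

rel = relative_increase(wer_test_clean, final_exit=6)
for exit_id in sorted(rel):
    print(f"Exit({exit_id}): +{rel[exit_id]:.1f}% relative to Exit(6)")
```

For instance, Exit(4) is about 20% worse than Exit(6) in relative terms (4.74 vs. 3.95), while Exit(5) is within 1%.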

Citation

@inproceedings{wright2024training,
  title={Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch},
  author={Wright, George August and Cappellazzo, Umberto and Zaiem, Salah and Raj, Desh and Yang, Lucas Ondel and Falavigna, Daniele and Ali, Mohamed Nabih and Brutti, Alessio},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
  pages={685--689},
  year={2024},
  organization={IEEE}
}