	---
	license: apache-2.0
	language:
	- ru
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	tags:
	- asr
	- Pytorch
	- pruned
	- audio
	- automatic-speech-recognition
	metrics:
	- cer
	- wer
	---

	# Whisper-small-ru-pruned

	## Model info
	This is a pruned version of the [openai/whisper-small](https://huggingface.co/openai/whisper-small) model in which only Russian tokens are kept.
	Pruning was done without any fine-tuning, using the method from [this post](https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fhow-to-adapt-a-multilingual-t5-model-for-a-single-language-b9f94f3d9c90).

	## Size
	Only about 10% of the tokens were kept: the special Whisper tokens, the added Whisper tokens, the 100 most popular tokens from the tokenizer, and the 3000 most popular Russian tokens, computed by tokenizing a Russian text corpus.
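The selection step described above can be sketched in plain Python. This is only a toy illustration under stated assumptions: the names `select_kept_ids` and `prune_rows`, the 10-token vocabulary, and the sample ids are all hypothetical, and nested lists stand in for the real embedding tensor. It is not the script that produced this model.

```python
from collections import Counter


def select_kept_ids(special_ids, corpus_token_ids, top_k):
    """Return sorted token ids to keep: every special id plus the
    top_k most frequent ids observed in the tokenized corpus."""
    counts = Counter(corpus_token_ids)
    frequent = [tok for tok, _ in counts.most_common(top_k)]
    return sorted(set(special_ids) | set(frequent))


def prune_rows(matrix, kept_ids):
    """Keep only the rows listed in kept_ids and build the
    old-id -> new-id mapping needed to rewrite the tokenizer."""
    pruned = [matrix[i] for i in kept_ids]
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    return pruned, old_to_new


# Toy data (hypothetical): 10 tokens with 2-dimensional embeddings.
embeddings = [[float(i), float(i) * 0.5] for i in range(10)]
special_ids = [0, 9]                      # stand-ins for special tokens
corpus_token_ids = [3, 3, 3, 5, 5, 7, 2]  # stand-in for a tokenized corpus

kept_ids = select_kept_ids(special_ids, corpus_token_ids, top_k=2)
pruned_embeddings, old_to_new = prune_rows(embeddings, kept_ids)

print(kept_ids)    # ids that survive pruning
print(old_to_new)  # remapping from old ids to new contiguous ids
```

In a real model the same row selection would be applied to the decoder token embedding matrix and to the output projection, and the tokenizer vocabulary would be rewritten with the new ids.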

	The model is about 15% smaller than the original whisper-small:

	| | openai/whisper-small | waveletdeboshir/whisper-small-ru-pruned |
	| :------ | :------ | :------ |
	| Number of parameters | 242 M | 205 M |
	| Number of parameters (with proj_out layer) | 281 M | 209 M |
	| Model file size | 967 MB | 837 MB |
	| vocab_size | 51865 | 4705 |
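The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below assumes a hidden size of 768 for whisper-small (an assumption not stated in this card) to estimate how many embedding parameters the removed tokens account for; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(original, pruned):
    """Fraction by which `pruned` is smaller than `original`."""
    return (original - pruned) / original


# Values taken from the table above (parameters in millions, size in MB).
params_reduction = relative_reduction(242, 205)
params_with_proj_reduction = relative_reduction(281, 209)
file_size_reduction = relative_reduction(967, 837)
vocab_kept_fraction = 4705 / 51865

# Assumption: whisper-small uses a hidden size of 768, so each removed
# token drops one 768-dimensional row from the embedding matrix.
HIDDEN_SIZE = 768
removed_embedding_params = (51865 - 4705) * HIDDEN_SIZE

print(f"parameters:               {params_reduction:.1%} fewer")
print(f"parameters with proj_out: {params_with_proj_reduction:.1%} fewer")
print(f"file size:                {file_size_reduction:.1%} smaller")
print(f"vocabulary kept:          {vocab_kept_fraction:.1%}")
print(f"embedding rows removed:   {removed_embedding_params / 1e6:.1f} M parameters")
```

Under that assumption, the removed embedding rows amount to roughly 36 M parameters, which is close to the 37 M difference between the two parameter counts in the table.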

	## Other pruned whisper models
	* [waveletdeboshir/whisper-tiny-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-tiny-ru-pruned)
	* [waveletdeboshir/whisper-base-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-base-ru-pruned)

	## Metrics
	| | openai/whisper-small | waveletdeboshir/whisper-small-ru-pruned |
	| :------ | :------ | :------ |
	| WER* golos-test-crowd | 0.3358 | 0.3471 |
	| CER* golos-test-crowd | 0.1561 | 0.1444 |

	*Metrics were measured after text normalization.
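For readers who want to reproduce this kind of measurement, here is a minimal sketch of WER and CER computed after normalization. The normalization rules (lowercasing, replacing `ё` with `е`, stripping punctuation) are an assumption, since this card does not specify which normalization was applied, and the sample sentences are made up.

```python
import re


def normalize(text):
    """Assumed normalization: lowercase, map 'ё' to 'е', drop punctuation,
    collapse whitespace. The card does not state the actual rules."""
    text = text.lower().replace("ё", "е")
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; the reference must contain at least one word."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; the reference must be non-empty."""
    ref_chars = normalize(reference)
    hyp_chars = normalize(hypothesis)
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Made-up example pair: one substituted word, one substituted character.
reference = "Привет, как дела?"
hypothesis = "привет как дело"
print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

This scores a single utterance. Dataset-level figures are usually obtained by summing the edit distances over all utterances and dividing by the total reference length, rather than averaging per-utterance rates.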

	You can fine-tune this model on your own data to achieve better performance.

	## Colab for pruning
	TODO