Versions:

  • CUDA: 12.1
  • cuDNN Version: 8.9.2.26_1.0-1_amd64
  • tensorflow Version: 2.12.0
  • torch Version: 2.1.0.dev20230606+cu12135
  • transformers Version: 4.30.2
  • accelerate Version: 0.20.3

Model Benchmarks:

  • RAM: 3 GB (Original_Model: 6 GB)

  • VRAM: 3.7 GB (Original_Model: 11 GB)

  • test.wav: 23 s (Multilingual Speech i.e. English+Hindi)

    • Processing time in seconds per device
    Device Name          float32 (Original)   float16   CUDA Cores   Tensor Cores
    3060                 2.2                  1.3       3,584        112
    1660 Super           OOM                  6         1,408        N/A
    Colab (Tesla T4)     -                    -         2,560        320
    Colab (CPU)          -                    N/A       N/A          N/A
    M1 (CPU)             -                    -         N/A          N/A
    M1 (GPU -> 'mps')    -                    -         N/A          N/A
    • NOTE: Tensor Cores are efficient at mixed-precision calculations
    • CPU: torch.float16 is not supported on CPU (tested on AMD Ryzen 5 3600 and Colab CPU)
  • Punctuation: sometimes incorrect (the exact cause is not known)
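
The note about float16 on CPU suggests picking the precision from the target device. A minimal sketch of that choice follows; the helper name `choose_dtype_name` is hypothetical and not part of this repo, and treating every non-CUDA device (including 'mps', which has no timings in the table) as float32 is a conservative assumption.

```python
def choose_dtype_name(device: str) -> str:
    """Return the name of a torch dtype suited to the given device string."""
    # Assumption: only CUDA devices use half precision. Everything else,
    # including 'cpu' (where float16 is reported as unsupported), gets float32.
    if device.split(":")[0] == "cuda":
        return "float16"
    return "float32"


for dev in ("cuda", "cuda:0", "cpu"):
    print(dev, "->", choose_dtype_name(dev))
```

With PyTorch installed, `getattr(torch, choose_dtype_name(device))` turns the returned name into the dtype object.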

Model Error Benchmarks:

  • WER: Word Error Rate
  • MER: Match Error Rate
  • WIL: Word Information Lost
  • WIP: Word Information Preserved
  • CER: Character Error Rate
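
To make the five metrics concrete, here is a pure-Python sketch of their standard definitions in terms of hits (H), substitutions (S), deletions (D) and insertions (I) from a minimum-edit alignment. It is an illustration only, not the code used for the results on this card, and the function names `align_counts` and `error_metrics` are made up for this example. It assumes non-empty reference and hypothesis strings and applies no text normalisation.

```python
def align_counts(ref, hyp):
    """Return (hits, subs, dels, ins) for one minimum-edit alignment of two sequences."""
    # Each cell holds (cost, hits, subs, dels, ins) for the prefixes seen so far.
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]  # empty ref: j insertions
    for i, r in enumerate(ref, start=1):
        cur = [(i, 0, 0, i, 0)]  # empty hyp: i deletions
        for j, h in enumerate(hyp, start=1):
            c, H, S, D, I = prev[j - 1]
            diag = (c, H + 1, S, D, I) if r == h else (c + 1, H, S + 1, D, I)
            c, H, S, D, I = prev[j]
            dele = (c + 1, H, S, D + 1, I)
            c, H, S, D, I = cur[j - 1]
            ins = (c + 1, H, S, D, I + 1)
            # Keep the cheapest option; ties prefer match/substitution.
            cur.append(min(diag, dele, ins, key=lambda cell: cell[0]))
        prev = cur
    return prev[-1][1:]


def error_metrics(reference: str, hypothesis: str) -> dict:
    """Compute WER, MER, WIL, WIP (word level) and CER (character level) as fractions."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    H, S, D, I = align_counts(ref_words, hyp_words)
    wip = (H / len(ref_words)) * (H / len(hyp_words))
    cH, cS, cD, cI = align_counts(list(reference), list(hypothesis))
    return {
        "wer": (S + D + I) / (H + S + D),      # errors / reference words
        "mer": (S + D + I) / (H + S + D + I),  # errors / aligned positions
        "wip": wip,
        "wil": 1 - wip,
        "cer": (cS + cD + cI) / (cH + cS + cD),  # errors / reference characters
    }


scores = error_metrics("the cat sat on the mat", "the cat sit on mat")
for name, value in scores.items():
    print(f"{name.upper()}: {100 * value:.2f}")
```

Because WIL is defined as 1 - WIP, the two columns in each results table should sum to roughly 100.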

Hindi to Hindi (test.tsv) Common Voice 14.0

Test done on RTX 3060 on 1000 Samples

Model                      WER     MER     WIL     WIP     CER
Original_Model (30 min)    43.99   41.65   59.47   40.52   16.23
This_Model (20 min)        44.64   41.69   59.53   40.46   16.80
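
The trade-off in the table above can be worked out with a little arithmetic. The sketch below assumes that the "30 min" and "20 min" labels are the total wall-clock time for all 1000 samples; the card does not state this explicitly.

```python
SAMPLES = 1000

# Figures copied from the Hindi to Hindi table; "minutes" is assumed to be
# the total evaluation time over all samples.
runs = {
    "Original_Model": {"minutes": 30, "wer": 43.99, "cer": 16.23},
    "This_Model": {"minutes": 20, "wer": 44.64, "cer": 16.80},
}


def seconds_per_sample(minutes: float, samples: int = SAMPLES) -> float:
    """Average processing time per sample in seconds."""
    return minutes * 60 / samples


orig, fp16 = runs["Original_Model"], runs["This_Model"]
speedup = orig["minutes"] / fp16["minutes"]
wer_delta = fp16["wer"] - orig["wer"]
cer_delta = fp16["cer"] - orig["cer"]

print(f"Original_Model: {seconds_per_sample(orig['minutes']):.1f} s/sample")
print(f"This_Model:     {seconds_per_sample(fp16['minutes']):.1f} s/sample")
print(f"Speedup: {speedup:.2f}x, WER change: {wer_delta:+.2f}, CER change: {cer_delta:+.2f}")
```

Under that assumption, the fp16 model is about 1.5x faster per sample, with WER rising by 0.65 and CER by 0.57.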

Hindi to English (test.csv) Custom Dataset

Test done on RTX 3060 on 1000 Samples

Model                      WER     MER     WIL     WIP     CER
Original_Model (30 min)    -       -       -       -       -
This_Model (20 min)        -       -       -       -       -

English (LibriSpeech -> test-clean)

Test done on RTX 3060 on ___ Samples

Model             WER     MER     WIL     WIP     CER
Original_Model    -       -       -       -       -
This_Model        -       -       -       -       -

English (LibriSpeech -> test-other)

Test done on RTX 3060 on ___ Samples

Model             WER     MER     WIL     WIP     CER
Original_Model    -       -       -       -       -
This_Model        -       -       -       -       -
  • The 'jiwer' library is used to calculate these metrics

Code for conversion:

Usage

This repo contains an __init__.py file with all the code needed to use this model.

First, clone this repo and keep all the files together in one folder.

Make sure you have git-lfs installed (https://git-lfs.com)

git lfs install
git clone https://huggingface.co/devasheeshG/whisper_large_v2_fp16_transformers

Try the following in a Jupyter notebook:

# Import the Model
from whisper_large_v2_fp16_transformers import Model, load_audio, pad_or_trim
# Initialise the model
model = Model(
    model_name_or_path='whisper_large_v2_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
# Load Audio
audio = load_audio('whisper_large_v2_fp16_transformers/test.wav')
audio = pad_or_trim(audio)
# Transcribe (First transcription takes time)
model.transcribe(audio)
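
Since the first transcription is slower than later ones, a benchmark should discard at least one warm-up call before timing. The helper below is a generic sketch using only the standard library; `time_calls` and the stand-in `fake_transcribe` are hypothetical names, and in practice the function passed in would be `model.transcribe`.

```python
import time


def time_calls(fn, *args, warmup=1, runs=3):
    """Call fn(*args) `warmup` times untimed, then return mean seconds over `runs` timed calls."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not measured
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs


# Stand-in workload so the sketch runs without a GPU or the model files.
def fake_transcribe(audio):
    return sum(audio)


mean_seconds = time_calls(fake_transcribe, list(range(10_000)))
print(f"{mean_seconds:.6f} s per call")
```

With the real model loaded, the equivalent call would be `time_calls(model.transcribe, audio)`.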

Credits

This is an fp16 version of openai/whisper-large-v2.
