imedennikov committed on
Commit 633d071 · verified · 1 Parent(s): 3a2788c

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -19,7 +19,7 @@ tags:
  - NeMo
  license: cc-by-4.0
  model-index:
- - name: ja-parakeet-tdt_ctc-0.6b
+ - name: parakeet-tdt_ctc-0.6b-ja
  results:
  - task:
  name: Automatic Speech Recognition
@@ -108,7 +108,7 @@ img {
  | [![Language](https://img.shields.io/badge/Language-ja-lightgrey#model-badge)](#datasets)
 
 
- `ja-parakeet-tdt_ctc-0.6b` is an ASR model that transcribes Japanese speech with Punctuations. This model is developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) team.
+ `parakeet-tdt_ctc-0.6b-ja` is an ASR model that transcribes Japanese speech with Punctuations. This model is developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) team.
  It is an XL version of Hybrid FastConformer [1] TDT-CTC [2] (around 0.6B parameters) model.
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
 
@@ -116,7 +116,7 @@ See the [model architecture](#model-architecture) section and [NeMo documentatio
 
  To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest PyTorch version.
  ```
- pip install nemo_toolkit['all']
+ pip install nemo_toolkit['asr']
  ```
 
  ## How to Use this Model
@@ -127,7 +127,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
  ```python
  import nemo.collections.asr as nemo_asr
- asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/ja-parakeet-tdt_ctc-0.6b")
+ asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt_ctc-0.6b-ja")
  ```
 
  ### Transcribing using Python
@@ -142,7 +142,7 @@ By default model uses TDT to transcribe the audio files, to switch decoder to us
 
  ```shell
  python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
- pretrained_name="nvidia/ja-parakeet-tdt_ctc-0.6b"
+ pretrained_name="nvidia/parakeet-tdt_ctc-0.6b-ja"
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
  ```
 
@@ -160,7 +160,7 @@ This model uses a Hybrid FastConformer-TDT-CTC architecture.
 
  FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 
- TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers by decoupling token and duration predictions. Unlike conventional Transducers which produces a lot of blanks during inference, a TDT model can skip majority of blank predictions by using the duration output (up to 4 frames for this ja-parakeet-tdt_ctc-0.6b model), thus brings significant inference speed-up. The detail of TDT can be found here: [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795).
+ TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers by decoupling token and duration predictions. Unlike conventional Transducers which produces a lot of blanks during inference, a TDT model can skip majority of blank predictions by using the duration output (up to 4 frames for this `parakeet-tdt_ctc-0.6b-ja` model), thus brings significant inference speed-up. The detail of TDT can be found here: [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795).
 
  ## Training
 
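
Reviewer note on the TDT paragraph touched in the last hunk: the duration-based frame skipping it describes can be illustrated with a toy greedy decoder. This is a minimal sketch and not NeMo's implementation. The `BLANK` id, the `predict` callback (a stand-in for the model's joint network), the `max_symbols_per_frame` guard, and the toy prediction table are all assumptions made for illustration; only the maximum skip of 4 frames comes from the README text.

```python
from typing import Callable, List, Tuple

BLANK = -1  # hypothetical blank id, chosen for this sketch only


def tdt_greedy_decode(
    num_frames: int,
    predict: Callable[[int, List[int]], Tuple[int, int]],
    max_symbols_per_frame: int = 10,
) -> Tuple[List[int], int]:
    """Greedy decoding where each step yields a (token, duration) pair.

    `predict(t, history)` is a hypothetical stand-in for the joint network:
    it returns the most likely token and how many frames to advance.
    Returns the emitted tokens and the number of prediction steps taken.
    """
    tokens: List[int] = []
    steps = 0
    emitted_here = 0  # tokens emitted without leaving the current frame
    t = 0
    while t < num_frames:
        token, duration = predict(t, tokens)
        steps += 1
        if token != BLANK:
            tokens.append(token)
            emitted_here += 1
        # Force progress when a zero duration would otherwise stall decoding.
        if duration == 0 and (token == BLANK or emitted_here >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            emitted_here = 0
        t += duration  # jump ahead by the predicted duration
    return tokens, steps


def toy_predict(t: int, history: List[int]) -> Tuple[int, int]:
    # Hand-written predictions for a 12-frame utterance (illustrative only).
    table = {0: (7, 4), 4: (BLANK, 4), 8: (9, 4)}
    return table.get(t, (BLANK, 1))


tokens, steps = tdt_greedy_decode(12, toy_predict)
print(tokens, steps)
```

In this toy run the decoder covers 12 frames and emits 2 tokens in 3 prediction steps. A conventional greedy transducer advances one frame per blank, so the same input would take 14 steps (12 blanks plus 2 token emissions).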