---
language:
- en
license: cc-by-nc-nd-4.0
library_name: nemo
datasets:
- commonvoice
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- CTC
- named-entity-recognition
- emotion-classification
- Transformer
- NeMo
- pytorch
model-index:
- name: 1step_ctc_ner_emotion_commonvoice500hrs
results: []
---
# This speech tagger performs transcription, annotates entities, and predicts speaker emotion
The model is suitable for voice AI applications, both real-time and offline.
## Model Details
- **Model type**: NeMo ASR
- **Architecture**: Conformer CTC
- **Language**: English
- **Training data**: CommonVoice, Gigaspeech
- **Performance metrics**: [Metrics]
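The Conformer CTC architecture listed above decodes by emitting a token (or blank) per audio frame; the standard CTC rule then collapses consecutive repeats and drops blanks. A minimal sketch of greedy CTC decoding, independent of NeMo (token IDs and the blank ID here are illustrative):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Apply the CTC collapse rule to per-frame argmax token IDs:
    merge consecutive repeats, then remove blank tokens."""
    decoded = []
    prev = None
    for token in frame_ids:
        # Only emit a token when it differs from the previous frame
        # and is not the CTC blank symbol.
        if token != prev and token != blank_id:
            decoded.append(token)
        prev = token
    return decoded

# A blank between two identical tokens keeps them as separate emissions.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

In practice `EncDecCTCModel.transcribe` performs this decoding (plus tokenizer detokenization) internally; the sketch is only meant to show why repeated frames do not produce repeated characters.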
## Usage
To use this model, install the NeMo toolkit with its ASR dependencies:
```bash
pip install nemo_toolkit['asr']
```
```
### How to run
```python
import nemo.collections.asr as nemo_asr
# Step 1: Load the ASR model from Hugging Face
model_name = 'WhissleAI/speech-tagger_en_ner_emotion'
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name)
# Step 2: Provide the path to your audio file
audio_file_path = '/path/to/your/audio_file.wav'
# Step 3: Transcribe the audio
transcription = asr_model.transcribe(paths2audio_files=[audio_file_path])
print(f'Transcription: {transcription[0]}')
```
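Since the model annotates entities and emotion inline in the transcription string, downstream code typically needs to separate the tags from the plain text. The exact tag scheme this checkpoint emits is not documented here, so the following is a hypothetical sketch assuming spans of the form `ENTITY_<TYPE> ... END` and a trailing `EMOTION_<LABEL>` token; adapt the patterns to the actual output format:

```python
import re

def parse_tagged_transcript(text):
    """Split a tagged transcript into plain text, entity spans, and an
    emotion label. Assumes the (hypothetical) inline tag format
    'ENTITY_<TYPE> words END' and a standalone 'EMOTION_<LABEL>' token."""
    # Collect (entity_type, entity_text) pairs.
    entities = re.findall(r'ENTITY_(\w+)\s+(.*?)\s+END', text)
    # Find the emotion label, if any.
    emotion_match = re.search(r'EMOTION_(\w+)', text)
    emotion = emotion_match.group(1) if emotion_match else None
    # Strip all tag tokens to recover the plain transcription.
    plain = re.sub(r'ENTITY_\w+\s+|\s+END\b|\s*EMOTION_\w+', '', text).strip()
    return plain, entities, emotion

example = "i met ENTITY_PERSON john smith END yesterday EMOTION_HAPPY"
print(parse_tagged_transcript(example))
# ('i met john smith yesterday', [('PERSON', 'john smith')], 'HAPPY')
```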