---
language:
- en
license: cc-by-nc-nd-4.0
library_name: nemo
datasets:
- commonvoice
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- CTC
- named-entity-recognition
- emotion-classification
- Transformer
- NeMo
- pytorch
model-index:
- name: 1step_ctc_ner_emotion_commonvoice500hrs
  results: []

---
# This speech tagger performs transcription, annotates entities, and predicts speaker emotion

The model is suitable for voice AI applications, both real-time and offline.

## Model Details

- **Model type**: NeMo ASR
- **Architecture**: Conformer CTC
- **Language**: English
- **Training data**: CommonVoice, Gigaspeech
- **Performance metrics**: [Metrics]

## Usage

To use this model, install the NeMo toolkit; the `asr` extra pulls in the speech-recognition dependencies:

```bash
pip install nemo_toolkit['asr']
```

### How to run

```python
import nemo.collections.asr as nemo_asr

# Step 1: Load the ASR model from Hugging Face
model_name = 'WhissleAI/speech-tagger_en_ner_emotion'
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name)

# Step 2: Provide the path to your audio file
audio_file_path = '/path/to/your/audio_file.wav'

# Step 3: Transcribe the audio
# (recent NeMo releases deprecate paths2audio_files in favor of passing the list directly)
transcription = asr_model.transcribe(paths2audio_files=[audio_file_path])
print(f'Transcription: {transcription[0]}')
```
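Taggers like this one typically emit entity and emotion labels inline in the transcription text. The exact tag format is not documented here; purely as an illustration, assuming hypothetical inline tags of the form `ENTITY_<TYPE> ... END` plus a trailing `EMOTION_<LABEL>` token, a small post-processing parser might look like:

```python
import re

def parse_tagged_transcript(text):
    """Split a tagged transcription into plain text, entities, and emotion.

    Assumes (hypothetically) inline tags 'ENTITY_<TYPE> words END' and a
    trailing 'EMOTION_<LABEL>' token; adjust the patterns to the model's
    actual output format.
    """
    # Extract the trailing emotion token, if present
    emotion = None
    m = re.search(r'\bEMOTION_([A-Z]+)\b', text)
    if m:
        emotion = m.group(1)
        text = text.replace(m.group(0), '').strip()

    # Collect (type, span) pairs for each tagged entity region
    entities = [(e.group(1), e.group(2).strip())
                for e in re.finditer(r'ENTITY_([A-Z_]+)\s+(.*?)\s+END', text)]

    # Strip the tag markup to recover a plain transcript
    plain = re.sub(r'ENTITY_[A-Z_]+\s+|\s+END', ' ', text)
    plain = re.sub(r'\s+', ' ', plain).strip()
    return {'text': plain, 'entities': entities, 'emotion': emotion}
```

This keeps the raw model output intact while exposing the structured annotations for downstream use.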