---
license: apache-2.0
language:
- zh
metrics:
- accuracy
- cer
pipeline_tag: automatic-speech-recognition
tags:
- Paraformer
- FunASR
- ASR
---

## Introduction

[Paraformer](https://arxiv.org/abs/2206.08317) is a non-autoregressive end-to-end speech recognition model. Unlike the autoregressive models that are currently mainstream, a non-autoregressive model outputs the target text for an entire sentence in parallel, which makes it particularly well suited to parallel inference on GPUs. Paraformer is the first known non-autoregressive model to match the performance of autoregressive end-to-end models on industrial-scale data. Combined with GPU inference, it can improve inference efficiency by a factor of 10, which in turn reduces machine costs for speech recognition cloud services by nearly a factor of 10.

This repo shows how to use Paraformer with the `funasr_onnx` runtime. The model comes from [FunASR](https://github.com/alibaba-damo-academy/FunASR) and was trained on 60,000 hours of Mandarin data. Paraformer took first place on the [SpeechIO Leaderboard](https://github.com/SpeechColab/Leaderboard).

We have released a large number of industrial-grade models, covering speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (forced alignment). If you are interested, please refer to [FunASR](https://github.com/alibaba-damo-academy/FunASR).

## Install funasr_onnx

```shell
pip install -U funasr_onnx
# For users in China, you can install from a mirror instead:
# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
```

## Download the model

```shell
git clone https://huggingface.co/funasr/paraformer-large
```

## Inference with runtime

### Speech Recognition

#### Paraformer

```python
from funasr_onnx import Paraformer

# Directory that holds the downloaded model files
model_dir = "./paraformer-large"
# quantize=True loads model_quant.onnx instead of model.onnx
model = Paraformer(model_dir, batch_size=1, quantize=True)

# Path(s) of the WAV file(s) to transcribe; adjust to where your audio lives
wav_path = ['./funasr/paraformer-large/asr_example.wav']

result = model(wav_path)
print(result)
```

- `model_dir`: path to the model directory, which must contain `model.onnx`, `config.yaml`, and `am.mvn`
- `batch_size`: `1` (default), the batch size used during inference
- `device_id`: `-1` (default), runs inference on the CPU. To run on a GPU, set it to the GPU ID (make sure `onnxruntime-gpu` is installed)
- `quantize`: `False` (default), loads `model.onnx` from `model_dir`. If set to `True`, loads `model_quant.onnx` from `model_dir` instead
- `intra_op_num_threads`: `4` (default), the number of threads used for intra-op parallelism on the CPU

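
The parameters above can be checked before the model is constructed. The following is a minimal sketch, not part of `funasr_onnx`: `missing_model_files` and `pick_device_id` are hypothetical helpers, the required file names are taken only from the list above (your runtime version may need more), and GPU 0 is an assumed default. The GPU check relies on `onnxruntime.get_available_providers()`.

```python
import importlib.util
from pathlib import Path


def missing_model_files(model_dir, quantize=False):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper: the names come from the parameter list above.
    """
    onnx_name = "model_quant.onnx" if quantize else "model.onnx"
    required = [onnx_name, "config.yaml", "am.mvn"]
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def pick_device_id(preferred_gpu=0):
    """Return preferred_gpu if onnxruntime reports a CUDA provider, else -1 (CPU)."""
    if importlib.util.find_spec("onnxruntime") is None:
        return -1
    import onnxruntime

    if "CUDAExecutionProvider" in onnxruntime.get_available_providers():
        return preferred_gpu
    return -1


if __name__ == "__main__":
    model_dir = "./paraformer-large"
    missing = missing_model_files(model_dir, quantize=True)
    if missing:
        print("Missing files in", model_dir, "->", missing)
    else:
        # Imported here so the checks above also run without funasr_onnx installed
        from funasr_onnx import Paraformer

        model = Paraformer(
            model_dir,
            batch_size=1,
            device_id=pick_device_id(),
            quantize=True,
            intra_op_num_threads=4,
        )
```

Falling back to `-1` keeps the same script usable on machines where only the CPU build of `onnxruntime` is installed.
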
Input: audio in WAV format. Supported input types: `str`, `np.ndarray`, `List[str]`

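
For the `List[str]` input form, the list of paths can be built from a directory of recordings. This is a small sketch using only the standard library; `collect_wav_paths` is a hypothetical helper, and the directory name is an assumption for illustration.

```python
from pathlib import Path


def collect_wav_paths(audio_dir):
    """Return the sorted paths of all .wav files directly inside audio_dir."""
    return sorted(str(p) for p in Path(audio_dir).glob("*.wav"))


if __name__ == "__main__":
    # Assumed location; replace with the directory that holds your audio
    wav_paths = collect_wav_paths("./paraformer-large")
    print(wav_paths)
    # With a model loaded as in the example above, the list is passed directly:
    # result = model(wav_paths)
```

Sorting the paths keeps the input order stable between runs.
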
Output: `List[str]`: recognition result

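
To score a recognition result against a reference transcript, the `cer` metric listed in this card's metadata can be computed as the character-level edit distance divided by the reference length. The sketch below is one common formulation, not FunASR's own scoring script. It assumes whitespace should be ignored and does no punctuation or text normalization, and the example strings are illustrative, not model output.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping two DP rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length.

    Assumption: spaces are removed before comparison.
    """
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only: one substituted character out of six
    print(cer("今天天气很好", "今天天汽很好"))
```
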
## Performance benchmark

Please refer to the [benchmark](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/python/benchmark_onnx.md).

## Citations

```bibtex
@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}
```