File size: 4,436 Bytes
8caa5ba ea89af5 8caa5ba 41cb0c3 3260e38 8caa5ba 4148783 8caa5ba 4148783 8caa5ba ec2cd70 4148783 ec2cd70 4148783 8caa5ba |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
---
language: fr
pipeline_tag: "token-classification"
widget:
- text: "je voudrais réserver une chambre à paris pour demain et lundi"
- text: "d'accord pour l'hôtel à quatre vingt dix euros la nuit"
- text: "deux nuits s'il vous plait"
- text: "dans un hôtel avec piscine à marseille"
tags:
- bert
- flaubert
- natural language understanding
- NLU
- spoken language understanding
- SLU
- understanding
- MEDIA
---
# vpelloin/MEDIA_NLU-flaubert_oral_asr
This is a Natural Language Understanding (NLU) model for the French [MEDIA benchmark](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/).
It maps each input words into outputs concepts tags (76 available).
This model is trained using [`nherve/flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its inital checkpoint. It obtained 12.43% CER (*lower is better*) in the MEDIA test set, in [our Interspeech 2023 publication](http://doi.org/10.21437/Interspeech.2022-352), using Kaldi ASR transcriptions.
## Available MEDIA NLU models:
- [`vpelloin/MEDIA_NLU-flaubert_base_cased`](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_cased): MEDIA NLU model trained using [`flaubert/flaubert_base_cased`](https://huggingface.co/flaubert/flaubert_base_cased). Obtains 13.20% CER on MEDIA test.
- [`vpelloin/MEDIA_NLU-flaubert_base_uncased`](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_uncased): MEDIA NLU model trained using [`flaubert/flaubert_base_uncased`](https://huggingface.co/flaubert/flaubert_base_uncased). Obtains 12.40% CER on MEDIA test.
- [`vpelloin/MEDIA_NLU-flaubert_oral_ft`](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_ft): MEDIA NLU model trained using [`nherve/flaubert-oral-ft`](https://huggingface.co/nherve/flaubert-oral-ft). Obtains 11.98% CER on MEDIA test.
- [`vpelloin/MEDIA_NLU-flaubert_oral_mixed`](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_mixed): MEDIA NLU model trained using [`nherve/flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed). Obtains 12.47% CER on MEDIA test.
- [`vpelloin/MEDIA_NLU-flaubert_oral_asr`](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr): MEDIA NLU model trained using [`nherve/flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr). Obtains 12.43% CER on MEDIA test.
- [`vpelloin/MEDIA_NLU-flaubert_oral_asr_nb`](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr_nb): MEDIA NLU model trained using [`nherve/flaubert-oral-asr_nb`](https://huggingface.co/nherve/flaubert-oral-asr_nb). Obtains 12.24% CER on MEDIA test.
## Usage with Pipeline
```python
from transformers import pipeline
generator = pipeline(
model="vpelloin/MEDIA_NLU-flaubert_oral_asr",
task="token-classification"
)
sentences = [
"je voudrais réserver une chambre à paris pour demain et lundi",
"d'accord pour l'hôtel à quatre vingt dix euros la nuit",
"deux nuits s'il vous plait",
"dans un hôtel avec piscine à marseille"
]
for sentence in sentences:
print([(tok['word'], tok['entity']) for tok in generator(sentence)])
```
## Usage with AutoTokenizer/AutoModel
```python
from transformers import (
AutoTokenizer,
AutoModelForTokenClassification
)
tokenizer = AutoTokenizer.from_pretrained(
"vpelloin/MEDIA_NLU-flaubert_oral_asr"
)
model = AutoModelForTokenClassification.from_pretrained(
"vpelloin/MEDIA_NLU-flaubert_oral_asr"
)
sentences = [
"je voudrais réserver une chambre à paris pour demain et lundi",
"d'accord pour l'hôtel à quatre vingt dix euros la nuit",
"deux nuits s'il vous plait",
"dans un hôtel avec piscine à marseille"
]
inputs = tokenizer(sentences, padding=True, return_tensors='pt')
outputs = model(**inputs).logits
print([
[model.config.id2label[i] for i in b]
for b in outputs.argmax(dim=-1).tolist()
])
```
## Reference
If you use this model for your scientific publication, or if you find the resources in this repository useful, please cite the [following paper](http://doi.org/10.21437/Interspeech.2022-352):
```
@inproceedings{pelloin22_interspeech,
author={Valentin Pelloin and Franck Dary and Nicolas Hervé and Benoit Favre and Nathalie Camelin and Antoine LAURENT and Laurent Besacier},
title={ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks},
year=2022,
booktitle={Proc. Interspeech 2022},
pages={3453--3457},
doi={10.21437/Interspeech.2022-352}
}
```
|