---
language: fr
pipeline_tag: "token-classification"
widget:
- text: "je voudrais réserver une chambre à paris pour demain et lundi"
- text: "d'accord pour l'hôtel à quatre vingt dix euros la nuit"
- text: "deux nuits s'il vous plait"
- text: "dans un hôtel avec piscine à marseille"
tags:
- bert
- flaubert
- natural language understanding
- NLU
- spoken language understanding
- SLU
- understanding
- MEDIA
---

# vpelloin/MEDIA_NLU-flaubert_oral_asr

This is a Natural Language Understanding (NLU) model for the French [MEDIA benchmark](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/).
It maps each input word to an output concept tag (76 tags available).

This model is trained with [`flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its initial checkpoint.

Available MEDIA NLU models:
- [MEDIA_NLU-flaubert_base_cased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_cased): model trained with [`flaubert_base_cased`](https://huggingface.co/flaubert/flaubert_base_cased) as its initial checkpoint
- [MEDIA_NLU-flaubert_base_uncased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_uncased): model trained with [`flaubert_base_uncased`](https://huggingface.co/flaubert/flaubert_base_uncased) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_ft](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_ft): model trained with [`flaubert-oral-ft`](https://huggingface.co/nherve/flaubert-oral-ft) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_mixed](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_mixed): model trained with [`flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_asr](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr): model trained with [`flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_asr_nb](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr_nb): model trained with [`flaubert-oral-asr_nb`](https://huggingface.co/nherve/flaubert-oral-asr_nb) as its initial checkpoint

## Usage with Pipeline

```python
from transformers import pipeline

# Load the model as a token-classification pipeline
generator = pipeline(
    model="vpelloin/MEDIA_NLU-flaubert_oral_asr",
    task="token-classification"
)

sentences = [
    "je voudrais réserver une chambre à paris pour demain et lundi",
    "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
    "deux nuits s'il vous plait",
    "dans un hôtel avec piscine à marseille"
]

# Print (token, concept tag) pairs for each sentence
for sentence in sentences:
    print([(tok['word'], tok['entity']) for tok in generator(sentence)])
```

## Usage with AutoTokenizer/AutoModel

```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification
)

tokenizer = AutoTokenizer.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_asr")
model = AutoModelForTokenClassification.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_asr")

sentences = [
    "je voudrais réserver une chambre à paris pour demain et lundi",
    "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
    "deux nuits s'il vous plait",
    "dans un hôtel avec piscine à marseille"
]

# Tokenize the batch, padding to the longest sentence
inputs = tokenizer(sentences, padding=True, return_tensors='pt')
outputs = model(**inputs).logits

# Take the highest-scoring tag at each position and map ids to label names
print([[model.config.id2label[i] for i in b] for b in outputs.argmax(dim=-1).tolist()])
```

## Reference

If you use this model for your scientific publication, or if you find the resources in this repository useful, please cite the [following paper](http://doi.org/10.21437/Interspeech.2022-352):
```
@inproceedings{pelloin22_interspeech,
    author={Valentin Pelloin and Franck Dary and Nicolas Hervé and Benoit Favre and Nathalie Camelin and Antoine LAURENT and Laurent Besacier},
    title={ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks},
    year=2022,
    booktitle={Proc. Interspeech 2022},
    pages={3453--3457},
    doi={10.21437/Interspeech.2022-352}
}
```
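
## Filtering padded positions

The AutoTokenizer/AutoModel example above prints one label per position of the padded batch, so shorter sentences also get labels for their padding. A minimal sketch of one way to drop those positions, assuming the tokenizer's `attention_mask` marks real tokens with `1` and padding with `0`. The helper name `labels_without_padding` and the toy label names are hypothetical and used only for illustration; they are not the model's actual tag set.

```python
def labels_without_padding(pred_ids, attention_mask, id2label):
    """Map predicted label ids to label names, skipping padded positions."""
    return [
        [id2label[i] for i, keep in zip(row, mask) if keep == 1]
        for row, mask in zip(pred_ids, attention_mask)
    ]


# Toy values for illustration only (hypothetical labels, not the MEDIA tag set)
id2label = {0: "O", 1: "B-city", 2: "B-date"}
pred_ids = [[0, 1, 2], [0, 2, 0]]
attention_mask = [[1, 1, 1], [1, 1, 0]]  # last position of row 2 is padding

print(labels_without_padding(pred_ids, attention_mask, id2label))
# → [['O', 'B-city', 'B-date'], ['O', 'B-date']]
```

With the real model, the arguments would be `outputs.argmax(dim=-1).tolist()`, `inputs["attention_mask"].tolist()` and `model.config.id2label`. Special tokens added by the tokenizer are not padding, so their labels are still included.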