vpelloin committed · Commit 8caa5ba · 1 Parent(s): f8c1264

Upload README.md with huggingface_hub

  1. README.md +83 -0
README.md ADDED
@@ -0,0 +1,83 @@
+
+ ---
+ language: fr
+ pipeline_tag: "token-classification"
+ widget:
+ - text: "je voudrais réserver une chambre à paris pour demain et lundi"
+ - text: "d'accord pour l'hôtel à quatre vingt dix euros la nuit"
+ - text: "deux nuits s'il vous plait"
+ - text: "dans un hôtel avec piscine à marseille"
+ tags:
+ - bert
+ - flaubert
+ - natural language understanding
+ - NLU
+ - spoken language understanding
+ - SLU
+ - understanding
+ - MEDIA
+ ---
+
+ # vpelloin/MEDIA_NLU-flaubert_oral_asr
+ This is a Natural Language Understanding (NLU) model for the French [MEDIA benchmark](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/).
+ It maps each input word to an output concept tag (76 tags available).
+
+ This model was trained with [`flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its initial checkpoint.
+
+ Available MEDIA NLU models:
+ - [MEDIA_NLU-flaubert_base_cased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_cased): model trained with [`flaubert_base_cased`](https://huggingface.co/flaubert/flaubert_base_cased) as its initial checkpoint
+ - [MEDIA_NLU-flaubert_base_uncased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_uncased): model trained with [`flaubert_base_uncased`](https://huggingface.co/flaubert/flaubert_base_uncased) as its initial checkpoint
+ - [MEDIA_NLU-flaubert_oral_ft](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_ft): model trained with [`flaubert-oral-ft`](https://huggingface.co/nherve/flaubert-oral-ft) as its initial checkpoint
+ - [MEDIA_NLU-flaubert_oral_mixed](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_mixed): model trained with [`flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed) as its initial checkpoint
+ - [MEDIA_NLU-flaubert_oral_asr](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr): model trained with [`flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its initial checkpoint
+ - [MEDIA_NLU-flaubert_oral_asr_nb](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr_nb): model trained with [`flaubert-oral-asr_nb`](https://huggingface.co/nherve/flaubert-oral-asr_nb) as its initial checkpoint
+
+ ## Usage with Pipeline
+ ```python
+ from transformers import pipeline
+
+ # Load the fine-tuned model as a token-classification pipeline
+ generator = pipeline(model="vpelloin/MEDIA_NLU-flaubert_oral_asr", task="token-classification")
+ sentences = [
+     "je voudrais réserver une chambre à paris pour demain et lundi",
+     "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
+     "deux nuits s'il vous plait",
+     "dans un hôtel avec piscine à marseille"
+ ]
+
+ # Print (token, concept tag) pairs for each sentence
+ for sentence in sentences:
+     print([(tok['word'], tok['entity']) for tok in generator(sentence)])
+ ```
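A possible follow-up to the pipeline example above: the sketch below groups tagged words by concept label. It assumes each prediction is a dict with `word` and `entity` keys, as used in the loop above. The helper name `group_by_concept`, the `"O"` "no concept" label, and the sample predictions (including their label names) are made up for illustration and are not real model output.

```python
from collections import defaultdict


def group_by_concept(predictions, ignore=("O",)):
    """Group predicted words by their concept label.

    `predictions` is a list of dicts with 'word' and 'entity' keys.
    Labels listed in `ignore` are skipped (assumption: "O" means "no concept").
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["entity"] in ignore:
            continue
        grouped[pred["entity"]].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions; the label names are placeholders, not MEDIA tags.
sample = [
    {"word": "deux", "entity": "NUMBER"},
    {"word": "nuits", "entity": "UNIT"},
    {"word": "s'il", "entity": "O"},
]
grouped = group_by_concept(sample)
print(grouped)
```

With the real pipeline, the corresponding call would presumably be `group_by_concept(generator(sentence))`, provided the output has the shape shown above.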
+ ## Usage with AutoTokenizer/AutoModel
+ ```python
+ from transformers import (
+     AutoTokenizer,
+     AutoModelForTokenClassification
+ )
+ tokenizer = AutoTokenizer.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_asr")
+ model = AutoModelForTokenClassification.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_asr")
+
+ sentences = [
+     "je voudrais réserver une chambre à paris pour demain et lundi",
+     "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
+     "deux nuits s'il vous plait",
+     "dans un hôtel avec piscine à marseille"
+ ]
+ # Tokenize the batch, padding every sentence to the longest one
+ inputs = tokenizer(sentences, padding=True, return_tensors='pt')
+ logits = model(**inputs).logits
+ # One label per token position, including special and padding tokens
+ print([[model.config.id2label[i] for i in b] for b in logits.argmax(dim=-1).tolist()])
+ ```
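Because the batch above is padded, the printed label lists also contain entries for padding and special-token positions. The sketch below shows one way to keep only the real tokens and pair each with its label. It works on plain Python lists so it can run without downloading the model. The helper name `pair_tokens_with_labels`, the special-token strings, and the toy tokens and labels are assumptions for illustration, not output of this model.

```python
def pair_tokens_with_labels(tokens, label_ids, attention_mask, id2label,
                            special_tokens=("<s>", "</s>", "<pad>")):
    """Return (token, label) pairs, skipping padding and special tokens.

    `special_tokens` is an assumption about the tokenizer's vocabulary;
    adjust it to match the tokenizer actually in use.
    """
    pairs = []
    for token, label_id, keep in zip(tokens, label_ids, attention_mask):
        # attention_mask is 0 on padding positions
        if not keep or token in special_tokens:
            continue
        pairs.append((token, id2label[label_id]))
    return pairs


# Toy stand-ins for one padded sentence; labels are placeholders.
id2label = {0: "O", 1: "DATE"}
tokens = ["<s>", "pour</w>", "demain</w>", "</s>", "<pad>", "<pad>"]
label_ids = [0, 0, 1, 0, 0, 0]
attention_mask = [1, 1, 1, 1, 0, 0]

pairs = pair_tokens_with_labels(tokens, label_ids, attention_mask, id2label)
print(pairs)
```

With the objects from the example above, the inputs for sentence `i` could be taken from `tokenizer.convert_ids_to_tokens(inputs["input_ids"][i].tolist())`, `inputs["attention_mask"][i].tolist()`, and the argmax of the logits for that sentence, with `model.config.id2label` as the label map.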
+
+ ## Reference
+
+ If you use this model for your scientific publication, or if you find the resources in this repository useful, please cite the [following paper](http://doi.org/10.21437/Interspeech.2022-352):
+ ```
+ @inproceedings{pelloin22_interspeech,
+   author={Valentin Pelloin and Franck Dary and Nicolas Hervé and Benoit Favre and Nathalie Camelin and Antoine LAURENT and Laurent Besacier},
+   title={ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks},
+   year=2022,
+   booktitle={Proc. Interspeech 2022},
+   pages={3453--3457},
+   doi={10.21437/Interspeech.2022-352}
+ }
+ ```
+