---
tags:
- Transformers
- text-classification
- intent-classification
- multi-class-classification
- natural-language-understanding
languages:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
multilinguality:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
datasets:
- qanastek/MASSIVE
widget:
- text: "wake me up at five am this week"
- text: "je veux écouter la chanson de jacques brel encore une fois"
- text: "quiero escuchar la canción de arijit singh una vez más"
- text: "olly onde é que á um parque por perto onde eu possa correr"
- text: "פרק הבא בפודקאסט בבקשה"
- text: "亚马逊股价"
- text: "найди билет на поезд в санкт-петербург"
license: cc-by-4.0
---

**People Involved**

* [LABRAK Yanis](https://www.linkedin.com/in/yanis-labrak-8a7412145/) (1)

**Affiliations**

1. [LIA, NLP team](https://lia.univ-avignon.fr/), Avignon University, Avignon, France.
## Demo: How to use in HuggingFace Transformers Pipeline

Requires [transformers](https://pypi.org/project/transformers/): `pip install transformers`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'

# Download the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Wrap both in a text-classification pipeline
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# French for "wake me up at nine in the morning on Friday"
res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)
```

Outputs:

```python
[{'label': 'alarm_set', 'score': 0.9998375177383423}]
```

## Training data

[MASSIVE](https://huggingface.co/datasets/qanastek/MASSIVE) is a parallel dataset of more than 1M utterances across 51 languages, annotated for two Natural Language Understanding tasks: intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, which is composed of general single-shot interactions with an Intelligent Voice Assistant.
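In the demo output, the `score` is, roughly, the softmax probability of the highest-scoring intent, and the `label` is looked up in the model config's `id2label` mapping; this is how `TextClassificationPipeline` post-processes single-label, multi-class models by default. The sketch below reproduces that step in plain Python so it runs without downloading the model. The three-label `id2label` mapping and the logits are made up for illustration, and the real pipeline supports more options (for example returning several labels) than shown here.

```python
import math


def top_intent(logits, id2label):
    """Simplified single-label post-processing: softmax, then pick the best class."""
    # Subtract the largest logit before exponentiating to avoid overflow.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability; ties resolve to the first such index.
    best = max(range(len(probs)), key=probs.__getitem__)
    # Same shape as the demo output: a list holding one label/score dict.
    return [{"label": id2label[best], "score": probs[best]}]


# Hypothetical 3-label mapping and made-up logits, for illustration only.
id2label = {0: "alarm_query", 1: "alarm_set", 2: "alarm_remove"}
result = top_intent([0.5, 4.0, -1.0], id2label)
print(result)  # alarm_set with a score of about 0.964
```

Because the softmax scores over all 60 intents sum to 1, a score close to 1 (as in the demo) means the model puts almost no probability mass on any other intent.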
## Intents

* audio_volume_other
* play_music
* iot_hue_lighton
* general_greet
* calendar_set
* audio_volume_down
* social_query
* audio_volume_mute
* iot_wemo_on
* iot_hue_lightup
* audio_volume_up
* iot_coffee
* takeaway_query
* qa_maths
* play_game
* cooking_query
* iot_hue_lightdim
* iot_wemo_off
* music_settings
* weather_query
* news_query
* alarm_remove
* social_post
* recommendation_events
* transport_taxi
* takeaway_order
* music_query
* calendar_query
* lists_query
* qa_currency
* recommendation_movies
* general_joke
* recommendation_locations
* email_querycontact
* lists_remove
* play_audiobook
* email_addcontact
* lists_createoradd
* play_radio
* qa_stock
* alarm_query
* email_sendemail
* general_quirky
* music_likeness
* cooking_recipe
* email_query
* datetime_query
* transport_traffic
* play_podcasts
* iot_hue_lightchange
* calendar_remove
* transport_query
* transport_ticket
* qa_factoid
* iot_cleaning
* alarm_set
* datetime_convert
* iot_hue_lightoff
* qa_definition
* music_dislikeness

## Evaluation results

```plain
                          precision    recall  f1-score   support

             alarm_query     0.9661    0.9037    0.9338      1734
            alarm_remove     0.9484    0.9608    0.9545      1071
               alarm_set     0.8611    0.9254    0.8921      2091
       audio_volume_down     0.8657    0.9537    0.9075       561
       audio_volume_mute     0.8608    0.9130    0.8861      1632
      audio_volume_other     0.8684    0.5392    0.6653       306
         audio_volume_up     0.7198    0.8446    0.7772       663
          calendar_query     0.7555    0.8229    0.7878      6426
         calendar_remove     0.8688    0.9441    0.9049      3417
            calendar_set     0.9092    0.9014    0.9053     10659
           cooking_query     0.0000    0.0000    0.0000         0
          cooking_recipe     0.9282    0.8592    0.8924      3672
        datetime_convert     0.8144    0.7686    0.7909       765
          datetime_query     0.9152    0.9305    0.9228      4488
        email_addcontact     0.6482    0.8431    0.7330       612
             email_query     0.9629    0.9319    0.9472      6069
      email_querycontact     0.6853    0.8032    0.7396      1326
         email_sendemail     0.9530    0.9381    0.9455      5814
           general_greet     0.1026    0.3922    0.1626        51
            general_joke     0.9305    0.9123    0.9213       969
          general_quirky     0.6984    0.5417    0.6102      8619
            iot_cleaning     0.9590    0.9359    0.9473      1326
              iot_coffee     0.9304    0.9749    0.9521      1836
     iot_hue_lightchange     0.8794    0.9374    0.9075      1836
        iot_hue_lightdim     0.8695    0.8711    0.8703      1071
        iot_hue_lightoff     0.9440    0.9229    0.9334      2193
         iot_hue_lighton     0.4545    0.5882    0.5128       153
         iot_hue_lightup     0.9271    0.8315    0.8767      1377
            iot_wemo_off     0.9615    0.8715    0.9143       918
             iot_wemo_on     0.8455    0.7941    0.8190       510
       lists_createoradd     0.8437    0.8356    0.8396      1989
             lists_query     0.8918    0.8335    0.8617      2601
            lists_remove     0.9536    0.8601    0.9044      2652
       music_dislikeness     0.7725    0.7157    0.7430       204
          music_likeness     0.8570    0.8159    0.8359      1836
             music_query     0.8667    0.8050    0.8347      1785
          music_settings     0.4024    0.3301    0.3627       306
              news_query     0.8343    0.8657    0.8498      6324
          play_audiobook     0.8172    0.8125    0.8149      2091
               play_game     0.8666    0.8403    0.8532      1785
              play_music     0.8683    0.8845    0.8763      8976
           play_podcasts     0.8925    0.9125    0.9024      3213
              play_radio     0.8260    0.8935    0.8585      3672
             qa_currency     0.9459    0.9578    0.9518      1989
           qa_definition     0.8638    0.8552    0.8595      2907
              qa_factoid     0.7959    0.8178    0.8067      7191
                qa_maths     0.8937    0.9302    0.9116      1275
                qa_stock     0.7995    0.9412    0.8646      1326
   recommendation_events     0.7646    0.7702    0.7674      2193
recommendation_locations     0.7489    0.8830    0.8104      1581
   recommendation_movies     0.6907    0.7706    0.7285      1020
             social_post     0.9623    0.9080    0.9344      4131
            social_query     0.8104    0.7914    0.8008      1275
          takeaway_order     0.7697    0.8458    0.8059      1122
          takeaway_query     0.9059    0.8571    0.8808      1785
         transport_query     0.8141    0.7559    0.7839      2601
          transport_taxi     0.9222    0.9403    0.9312      1173
        transport_ticket     0.9259    0.9384    0.9321      1785
       transport_traffic     0.6919    0.9660    0.8063       765
           weather_query     0.9387    0.9492    0.9439      7956

                accuracy                         0.8617    151674
               macro avg     0.8162    0.8273    0.8178    151674
            weighted avg     0.8639    0.8617    0.8613    151674
```
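The table appears to follow the layout of scikit-learn's `classification_report`. Its `macro avg` row is the unweighted mean of the per-intent scores, so a zero-support intent such as `cooking_query` still counts fully and pulls it down, while `weighted avg` weights each intent by its support. In single-label classification, the support-weighted recall always equals accuracy, which is why both read 0.8617 above. The plain-Python sketch below recomputes these aggregates from toy predictions; the labels and predictions are made up, and this is not the script that produced the table.

```python
from collections import Counter


def classification_summary(y_true, y_pred, labels):
    """Per-label (precision, recall, f1, support) plus macro/weighted averages and accuracy."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_label = {}
    for label in labels:
        tp = hits[label]
        # Undefined ratios (0/0) are reported as 0.0; a label with support 0
        # (like cooking_query in the table) always gets a recall of 0.0 this way.
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_label[label] = (precision, recall, f1, true_counts[label])

    rows = list(per_label.values())
    total = sum(r[3] for r in rows)
    # Macro: plain mean over labels, so zero-support labels still count.
    macro = tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))
    # Weighted: each label's score weighted by its support.
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total for i in range(3))
    accuracy = sum(hits.values()) / len(y_true)
    return per_label, macro, weighted, accuracy


# Toy predictions (made up); cooking_query is listed but never occurs.
y_true = ["alarm_set", "alarm_set", "play_music", "weather_query"]
y_pred = ["alarm_set", "play_music", "play_music", "weather_query"]
labels = ["alarm_set", "play_music", "weather_query", "cooking_query"]

per_label, macro, weighted, accuracy = classification_summary(y_true, y_pred, labels)
print("macro avg   ", [round(x, 4) for x in macro])     # [0.625, 0.625, 0.5833]
print("weighted avg", [round(x, 4) for x in weighted])  # [0.875, 0.75, 0.75]
print("accuracy    ", accuracy)                          # 0.75
```

Dropping `cooking_query` from `labels` would raise the macro F1 in this toy example from 0.5833 to 0.7778 without changing the weighted averages, which illustrates why the two rows can diverge on datasets with rare or absent intents.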