|
--- |
|
datasets: |
|
- cardiffnlp/tweet_topic_multi |
|
metrics: |
|
- f1 |
|
- accuracy |
|
model-index: |
|
- name: cardiffnlp/roberta-base-tweet-topic-multi-2020 |
|
results: |
|
- task: |
|
type: text-classification |
|
name: Text Classification |
|
dataset: |
|
name: cardiffnlp/tweet_topic_multi |
|
type: cardiffnlp/tweet_topic_multi |
|
args: cardiffnlp/tweet_topic_multi |
|
split: test_2021 |
|
metrics: |
|
- name: F1 |
|
type: f1 |
|
value: 0.7252289758534556 |
|
- name: F1 (macro) |
|
type: f1_macro |
|
value: 0.5612608131902519 |
|
- name: Accuracy |
|
type: accuracy |
|
value: 0.4991066110780226 |
|
pipeline_tag: text-classification |
|
widget: |
|
- text: "I'm sure the {@Tampa Bay Lightning@} would’ve rather faced the Flyers but man does their experience versus the Blue Jackets this year and last help them a lot versus this Islanders team. Another meat grinder upcoming for the good guys" |
|
example_title: "Example 1" |
|
- text: "Love to take night time bike rides at the jersey shore. Seaside Heights boardwalk. Beautiful weather. Wishing everyone a safe Labor Day weekend in the US." |
|
example_title: "Example 2" |
|
--- |
|
# cardiffnlp/roberta-base-tweet-topic-multi-2020 |
|
|
|
This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [tweet_topic_multi](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi). This model is fine-tuned on `train_2020` split and validated on `test_2021` split of tweet_topic. |
|
Fine-tuning script can be found [here](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi/blob/main/lm_finetuning.py). It achieves the following results on the test_2021 set: |
|
|
|
- F1 (micro): 0.7252289758534556 |
|
- F1 (macro): 0.5612608131902519 |
|
- Accuracy: 0.4991066110780226 |
|
|
|
|
|
### Usage |
|
|
|
```python |
|
import math |
|
import torch |
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
|
|
def sigmoid(x): |
|
return 1 / (1 + math.exp(-x)) |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/roberta-base-tweet-topic-multi-2020") |
|
model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/roberta-base-tweet-topic-multi-2020", problem_type="multi_label_classification") |
|
model.eval() |
|
class_mapping = model.config.id2label |
|
|
|
with torch.no_grad(): |
|
text = #NewVideo Cray Dollas- Water- Ft. Charlie Rose- (Official Music Video)- {{URL}} via {@YouTube@} #watchandlearn {{USERNAME}} |
|
tokens = tokenizer(text, return_tensors='pt') |
|
output = model(**tokens) |
|
flags = [sigmoid(s) > 0.5 for s in output[0][0].detach().tolist()] |
|
topic = [class_mapping[n] for n, i in enumerate(flags) if i] |
|
print(topic) |
|
``` |
|
|
|
### Reference |
|
|
|
``` |
|
|
|
@inproceedings{dimosthenis-etal-2022-twitter, |
|
title = "{T}witter {T}opic {C}lassification", |
|
author = "Antypas, Dimosthenis and |
|
Ushio, Asahi and |
|
Camacho-Collados, Jose and |
|
Neves, Leonardo and |
|
Silva, Vitor and |
|
Barbieri, Francesco", |
|
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics", |
|
month = oct, |
|
year = "2022", |
|
address = "Gyeongju, Republic of Korea", |
|
publisher = "International Committee on Computational Linguistics" |
|
} |
|
|
|
``` |
|
|