Topic-xDistil is a model based on xtremedistil-l12-h384-uncased, fine-tuned to classify the topic of news headlines on a dataset annotated by ChatGPT 3.5. It was built together with Sentiment-xDistil as a tool for filtering for financial news headlines and classifying their sentiment. The code used to train both models and build the dataset is found here.

Notes: The output labels are either Economics or Other. The model is suitable for English.

Performance Results

Here are the performance metrics for both models, each on its own test set:

| Model | Test Set Size | Accuracy | F1 Score |
|---|---|---|---|
| topic-xdistil-uncased | 32 799 | 94.44 % | 92.59 % |
| sentiment-xdistil-uncased | 17 527 | 94.59 % | 93.44 % |
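
For readers who want to reproduce these two metrics on their own labelled headlines, the following is a minimal sketch of how accuracy and F1 are commonly computed. It makes assumptions: the source does not say whether the reported F1 is for a single positive class or averaged over classes, so this sketch computes F1 for one chosen class, and the toy labels are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration (not taken from the test set)
y_true = ["Economics", "Other", "Economics", "Other", "Other"]
y_pred = ["Economics", "Other", "Other", "Other", "Economics"]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred, positive="Economics")
print(acc, f1)
```

Averaging the per-class F1 over all labels would give a macro F1, which may be closer to what is reported for a multi-class model such as the sentiment classifier.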

Data

The training data consists of ~600k news headlines and tweets and was annotated by ChatGPT 3.5, which has been shown to outperform crowd-workers on text annotation tasks.

The sentence labels are defined by the ChatGPT prompt as follows:

"""
[...]
    - Economic headlines generally cover topics such as financial markets, \
 business, financial assets, trade, employment, GDP, inflation, or fiscal \
and monetary policy.
    - Non-economic headlines might include sports, entertainment, politics, \
science, weather, health, or other unrelated news events.
[...]
"""

Example Usage

Here's a simple example:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("hakonmh/topic-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/topic-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
inputs = tokenizer(SENTENCE, return_tensors="pt")
# Take the highest-scoring class and map its index to a label name
output = model(**inputs).logits
predicted_label = model.config.id2label[output.argmax(-1).item()]

print(predicted_label)
# Output: Economics
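
The example above takes the argmax of the raw logits, which yields a label but no confidence. A minimal sketch of turning logits into probabilities with a softmax is shown below, in plain Python so it runs without downloading the model. The logit values and the label order are hypothetical; real values come from `model(**inputs).logits` and `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values for illustration only
example_logits = [3.2, -2.6]
example_id2label = {0: "Economics", 1: "Other"}

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
print(example_id2label[best], round(probs[best], 4))
```

With the tensors from the example above, `torch.softmax(output, dim=-1)` performs the same computation.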

Or, as a pipeline together with Sentiment-xDistil:

from transformers import pipeline

# "text-classification" is the general task name; "sentiment-analysis" is an alias for it
topic_classifier = pipeline("text-classification",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("sentiment-analysis",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))
# Output:
# [{'label': 'Economics', 'score': 0.9970171451568604}]
# [{'label': 'Positive', 'score': 0.9997037053108215}]
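
Since the two models are meant to work as a filter followed by a sentiment step, the sketch below shows one way to chain them. The function name `filter_and_score` and its parameters are hypothetical, not part of either model's API. It assumes each classifier accepts a list of strings and returns one `{'label': ..., 'score': ...}` dict per input, as the pipelines above do when called on a list.

```python
def filter_and_score(headlines, topic_clf, sentiment_clf, keep_label="Economics"):
    """Keep headlines the topic classifier labels `keep_label`, then
    pair each kept headline with its predicted sentiment label."""
    topics = topic_clf(headlines)
    kept = [h for h, t in zip(headlines, topics) if t["label"] == keep_label]
    if not kept:
        return []  # nothing passed the topic filter
    sentiments = sentiment_clf(kept)
    return [(h, s["label"]) for h, s in zip(kept, sentiments)]
```

Passing `topic_classifier` and `sentiment_classifier` from the example above as `topic_clf` and `sentiment_clf` runs the sentiment model only on headlines classified as Economics, which avoids spending inference time on unrelated news.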

Tested with transformers 4.30.1 and torch 2.0.0.

Model size: 33.4M params (Safetensors; tensor types I64, F32)