Sentiment-xDistil is a sentiment classifier for news headlines. It is based on xtremedistil-l12-h384-uncased and was fine-tuned on a dataset annotated by ChatGPT 3.5. Together with Topic-xDistil, it forms a tool for selecting financial news headlines and classifying their sentiment. The code used to train both models and build the dataset can be found here.

Notes: The output label is one of Negative, Neutral, or Positive. The model is intended for English text.

Performance Results

Here are the performance metrics for both models on their respective test sets:

Model                      Test Set Size  Accuracy  F1 Score
topic-xdistil-uncased      32 799         94.44 %   92.59 %
sentiment-xdistil-uncased  17 527         94.59 %   93.44 %
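
The card does not say how the F1 score is averaged across classes. As a rough illustration of how figures like these could be computed from gold and predicted labels, here is a minimal sketch that assumes macro-averaging (an assumption, not something the source confirms). The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is assumed)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only, not taken from the actual test set
gold = ["Positive", "Negative", "Neutral", "Positive"]
pred = ["Positive", "Negative", "Positive", "Positive"]
print(f"accuracy={accuracy(gold, pred):.2f} macro_f1={macro_f1(gold, pred):.2f}")
```

If the reported scores were weighted or micro-averaged instead, only the final averaging step in `macro_f1` would need to change.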

Data

The training data consists of 300k+ news headlines and tweets and was annotated by ChatGPT 3.5, which has been shown to outperform crowd-workers on text annotation tasks.

The sentiment labels are defined by the ChatGPT prompt as follows:

"""
[...]
Does the headline convey a Positive, Neutral, or Negative sentiment with \
regard to the current state or potential future impact on the economy or \
the asset described?
    - Positive sentiment headlines suggest growth, improvement, or \
stability in economic conditions.
    - Neutral sentiment headlines do not clearly indicate a positive or \
negative impact on the economy.
    - Negative sentiment headlines imply economic decline, uncertainty, \
or unfavorable conditions.
[...]
"""

Example Usage

Here's a simple example:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("hakonmh/sentiment-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
inputs = tokenizer(SENTENCE, return_tensors="pt")
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    output = model(**inputs).logits
# Map the highest-scoring class index to its label name
predicted_label = model.config.id2label[output.argmax(-1).item()]

print(predicted_label)  # Positive
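
The example above returns only the top label. To also get a probability for each label, the logits can be passed through a softmax. The following is a minimal pure-Python sketch; `logits_to_probs`, `top_label`, and `EXAMPLE_ID2LABEL` are hypothetical names, and the index-to-label order shown here is an assumption for illustration (the real mapping lives in `model.config.id2label`).

```python
import math

# Hypothetical index-to-label mapping, used only for illustration.
# The real mapping should be read from model.config.id2label.
EXAMPLE_ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def logits_to_probs(logits, id2label):
    """Convert the raw logits of one sentence into a {label: probability} dict."""
    # Subtract the largest logit before exponentiating for numerical stability
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


def top_label(probs):
    """Return the (label, probability) pair with the highest probability."""
    return max(probs.items(), key=lambda item: item[1])


# Made-up logits standing in for one row of the model output
probs = logits_to_probs([-2.0, 0.5, 3.0], EXAMPLE_ID2LABEL)
label, confidence = top_label(probs)
print(label, round(confidence, 3))
```

With the model loaded as in the example above, `logits_to_probs(output[0].tolist(), model.config.id2label)` should give the per-label probabilities for `SENTENCE`.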

Or, as a pipeline together with Topic-xDistil:

from transformers import pipeline

# "sentiment-analysis" is an alias of the "text-classification" task,
# so the same pipeline type also works for the topic model
topic_classifier = pipeline("sentiment-analysis",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("sentiment-analysis",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
# [{'label': 'Economics', 'score': 0.9970171451568604}]
print(sentiment_classifier(SENTENCE))
# [{'label': 'Positive', 'score': 0.9997037053108215}]

Tested on transformers 4.30.1 and torch 2.0.0.
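
Since the two models are meant to be used together, one way to chain them is to run the topic classifier first and score sentiment only for the headlines it keeps. The sketch below is one possible arrangement, not the author's pipeline: `filter_and_score`, `keep_topics`, and `min_topic_score` are hypothetical names, and "Economics" is the only topic label shown in this card, so the default set of kept topics is an assumption. Each classifier is treated as any callable that takes a list of strings and returns a list of `{"label", "score"}` dicts, which is the shape the pipelines above return for list input.

```python
def filter_and_score(headlines, topic_clf, sentiment_clf,
                     keep_topics=("Economics",), min_topic_score=0.5):
    """Keep headlines whose topic is in keep_topics, then classify their sentiment.

    topic_clf and sentiment_clf are callables mapping a list of strings to a
    list of {"label": str, "score": float} dicts, one per input string.
    """
    headlines = list(headlines)
    if not headlines:
        return []

    # Step 1: topic filter. keep_topics and min_topic_score are illustrative
    # defaults, not values taken from the model card.
    topics = topic_clf(headlines)
    kept = [
        text for text, topic in zip(headlines, topics)
        if topic["label"] in keep_topics and topic["score"] >= min_topic_score
    ]
    if not kept:
        return []

    # Step 2: sentiment is computed only for the headlines that passed the filter
    sentiments = sentiment_clf(kept)
    return [
        {"headline": text, "sentiment": s["label"], "score": s["score"]}
        for text, s in zip(kept, sentiments)
    ]
```

With the pipelines from the example above, a call would look like `filter_and_score(list_of_headlines, topic_classifier, sentiment_classifier)`. Passing the classifiers in as arguments also makes it easy to swap in stubs when testing the filtering logic without downloading the models.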
