Model Card for uvegesistvan/wildmann_german_proposal_2b_GER_ENG_HU

Model Overview

This model is a multi-class emotion classifier trained on German text that was first machine-translated into English as an intermediary language and then into Czech. It assigns each text one of nine labels: eight emotions plus a "No emotion" class. The model was trained to explore the impact of multi-step machine translation on emotion classification.

Emotion Classes

The model assigns each text one of the following labels (class index in parentheses):

  • Anger (0)
  • Fear (1)
  • Disgust (2)
  • Sadness (3)
  • Joy (4)
  • Enthusiasm (5)
  • Hope (6)
  • Pride (7)
  • No emotion (8)
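
The indices in parentheses are the integer labels used by the classifier. A minimal sketch of mapping a vector of per-class scores back to a class name follows; the `ID2LABEL` dictionary and the `decode` helper are illustrative names rather than part of the released model, and the example scores are made up:

```python
# Class indices as listed in this model card.
ID2LABEL = {
    0: "Anger",
    1: "Fear",
    2: "Disgust",
    3: "Sadness",
    4: "Joy",
    5: "Enthusiasm",
    6: "Hope",
    7: "Pride",
    8: "No emotion",
}


def decode(scores):
    """Return the class name whose score is highest (hypothetical helper)."""
    if len(scores) != len(ID2LABEL):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best]


# Made-up scores for illustration only; index 4 is the largest.
example_scores = [0.1, 0.0, 0.2, 0.1, 2.3, 0.9, 0.4, 0.0, 0.5]
print(decode(example_scores))  # Joy
```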

Dataset and Preprocessing

The dataset was created using a multi-step machine translation process: German → English → Czech. Emotional annotations were applied after the final translation to ensure consistency. Preprocessing steps included:

  • Balancing the dataset by undersampling overrepresented classes such as "No emotion" and "Anger."
  • Normalizing text to mitigate noise introduced by multi-step translations.
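
The card does not say how undersampling or normalization were implemented. The following is only a sketch of one common approach, assuming random undersampling to a fixed per-class cap and simple Unicode and whitespace normalization; `normalize`, `undersample`, `cap`, and the toy data are all hypothetical:

```python
import random
import unicodedata
from collections import defaultdict


def normalize(text):
    """NFC-normalize Unicode and collapse runs of whitespace (assumed cleanup)."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def undersample(examples, cap, seed=0):
    """Keep at most `cap` randomly chosen examples per label.

    `examples` is a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((normalize(text), label))

    balanced = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) > cap:
            items = rng.sample(items, cap)  # drop the surplus at random
        balanced.extend(items)
    rng.shuffle(balanced)
    return balanced


# Toy data: label 8 ("No emotion") is overrepresented relative to label 6 ("Hope").
toy = [("text  %d" % i, 8) for i in range(10)] + [("other %d" % i, 6) for i in range(3)]
balanced = undersample(toy, cap=3)
```

Setting `cap` to the size of the smallest class yields a fully balanced set; a larger cap only trims the most frequent classes.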

Evaluation Metrics

The model's performance was evaluated using standard classification metrics. Results are detailed below:

Class            Precision   Recall   F1-Score   Support
Anger (0)        0.54        0.55     0.55       777
Fear (1)         0.83        0.75     0.79       776
Disgust (2)      0.90        0.95     0.92       776
Sadness (3)      0.85        0.83     0.84       775
Joy (4)          0.85        0.79     0.82       777
Enthusiasm (5)   0.64        0.61     0.62       776
Hope (6)         0.48        0.58     0.52       777
Pride (7)        0.74        0.79     0.77       776
No emotion (8)   0.66        0.62     0.64       1553

Overall Metrics

  • Accuracy: 0.71
  • Macro Average: Precision = 0.72, Recall = 0.72, F1-Score = 0.72
  • Weighted Average: Precision = 0.71, Recall = 0.71, F1-Score = 0.71
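
As a sanity check, the macro and weighted averages can be recomputed from the per-class figures. The sketch below copies the rounded values from the table above, so the results only match the reported averages to two decimal places; `REPORT`, `macro_average`, and `weighted_average` are illustrative names:

```python
# (precision, recall, f1, support) per class, copied from the table above.
REPORT = {
    "Anger": (0.54, 0.55, 0.55, 777),
    "Fear": (0.83, 0.75, 0.79, 776),
    "Disgust": (0.90, 0.95, 0.92, 776),
    "Sadness": (0.85, 0.83, 0.84, 775),
    "Joy": (0.85, 0.79, 0.82, 777),
    "Enthusiasm": (0.64, 0.61, 0.62, 776),
    "Hope": (0.48, 0.58, 0.52, 777),
    "Pride": (0.74, 0.79, 0.77, 776),
    "No emotion": (0.66, 0.62, 0.64, 1553),
}


def macro_average(column):
    """Unweighted mean of one metric column over all classes."""
    values = [row[column] for row in REPORT.values()]
    return sum(values) / len(values)


def weighted_average(column):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[3] for row in REPORT.values())
    return sum(row[column] * row[3] for row in REPORT.values()) / total


for name, column in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(name, round(macro_average(column), 2), round(weighted_average(column), 2))
```

The macro average treats all nine classes equally, while the weighted average gives "No emotion" roughly twice the influence of any other class because its support is about twice as large.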

Performance Insights

The model performs best on "Disgust" (F1 = 0.92), followed by "Sadness" (0.84), "Joy" (0.82), and "Fear" (0.79). "Hope" (F1 = 0.52) and "Anger" (0.55) are the weakest classes; for "Hope" this is likely due to subtleties being lost in the multi-step translation process. Overall accuracy is 0.71.

Model Usage

Applications

  • Emotion analysis of German texts via machine-translated Czech representations.
  • Sentiment analysis for Czech-language datasets derived from multilingual sources.
  • Research on the effects of multi-step machine translation in emotion classification.
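
The card does not state which library the checkpoint targets. The sketch below assumes it loads as a standard Hugging Face `transformers` sequence-classification model, which is not confirmed here; `load_classifier` and `classify` are hypothetical helper names, and the returned label strings depend on the checkpoint's configuration (they may be generic names such as `LABEL_4` rather than emotion names):

```python
MODEL_ID = "uvegesistvan/wildmann_german_proposal_2b_GER_ENG_HU"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use).

    Assumption: the checkpoint is compatible with the transformers pipeline API.
    """
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier):
    """Return (text, label, score) triples for a list of input strings.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, as a transformers pipeline does.
    """
    results = classifier(texts)
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


# Example (requires network access and the transformers package):
#   clf = load_classifier()
#   for text, label, score in classify(["<translated input text>"], clf):
#       print(label, round(score, 3), text)
```

Inputs should already be in the translated form the model was trained on; passing raw German text skips the translation pipeline described in this card.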

Limitations

  • The multi-step translation process introduces additional noise, which may impact classification accuracy for subtle or ambiguous emotions.
  • Emotional nuances and cultural context might be lost during translation.

Ethical Considerations

The reliance on multi-step machine translation can amplify biases or inaccuracies introduced at each stage. Careful validation is recommended before applying the model in sensitive areas such as mental health, social research, or customer feedback analysis.

Citation

For further information, visit: uvegesistvan/wildmann_german_proposal_2b_GER_ENG_HU

Model size: 560M params (Safetensors, tensor type F32)