Model Card for uvegesistvan/wildmann_german_proposal_2b_german_to_czech

Model Overview

This model is a multi-class emotion classifier trained on German-to-Czech machine-translated text data. It identifies nine distinct emotional states in text and demonstrates how machine-translated datasets can support emotion classification tasks across different languages.

Emotion Classes

The model classifies the following emotional states:

Anger (0)
Fear (1)
Disgust (2)
Sadness (3)
Joy (4)
Enthusiasm (5)
Hope (6)
Pride (7)
No emotion (8)
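
The class indices above can be written as a lookup table for decoding model output. The following is a minimal sketch; the names ID2LABEL, LABEL2ID, and decode are illustrative and not taken from the model's own configuration, and the input is assumed to be a list of nine per-class scores in index order.

```python
# Index-to-name mapping copied from the class list in this card.
ID2LABEL = {
    0: "Anger",
    1: "Fear",
    2: "Disgust",
    3: "Sadness",
    4: "Joy",
    5: "Enthusiasm",
    6: "Hope",
    7: "Pride",
    8: "No emotion",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(scores):
    """Return the emotion name for the highest of nine per-class scores.

    `scores` is assumed to be ordered by class index (0..8).
    """
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return ID2LABEL[best]
```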

Dataset and Preprocessing

The dataset includes German text machine-translated into Czech and annotated for emotional content. Both synthetic and original German sentences were translated to create a diverse corpus. Preprocessing steps included:

Balancing classes through undersampling of overrepresented labels, such as "No emotion" and "Anger."
Normalization of text to handle inconsistencies from the machine translation process.
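
The card does not describe how the undersampling was carried out. As an illustration only, one common approach is to cap each label at a fixed number of randomly chosen examples. The function name undersample, the cap parameter, and the (text, label) input format are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def undersample(examples, cap, seed=0):
    """Keep at most `cap` randomly chosen examples per label.

    `examples` is a list of (text, label) pairs. This is a generic
    random undersampling sketch, not the procedure used for this model.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    balanced = []
    for items in by_label.values():
        if len(items) > cap:
            # Overrepresented label: draw `cap` examples without replacement.
            items = rng.sample(items, cap)
        balanced.extend(items)

    rng.shuffle(balanced)  # avoid blocks of identical labels
    return balanced
```

Labels that already have fewer than cap examples are left unchanged, so the result is capped rather than strictly equalized.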

Evaluation Metrics

The model's performance was evaluated using standard classification metrics. Results are summarized below:

Class	Precision	Recall	F1-Score	Support
Anger (0)	0.50	0.63	0.56	777
Fear (1)	0.84	0.74	0.79	776
Disgust (2)	0.91	0.94	0.93	776
Sadness (3)	0.87	0.83	0.85	775
Joy (4)	0.83	0.81	0.82	777
Enthusiasm (5)	0.61	0.61	0.61	776
Hope (6)	0.54	0.46	0.50	777
Pride (7)	0.75	0.81	0.78	776
No emotion (8)	0.66	0.64	0.65	1553

Overall Metrics

Accuracy: 0.71
Macro Average: Precision = 0.72, Recall = 0.72, F1-Score = 0.72
Weighted Average: Precision = 0.72, Recall = 0.71, F1-Score = 0.71
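
The macro and weighted averages can be recomputed from the per-class table as a consistency check. The sketch below uses the rounded table values, so it reproduces the reported figures only to two decimal places; the names ROWS, macro_average, and weighted_average are illustrative.

```python
# (class, precision, recall, f1, support) copied from the table above.
ROWS = [
    ("Anger", 0.50, 0.63, 0.56, 777),
    ("Fear", 0.84, 0.74, 0.79, 776),
    ("Disgust", 0.91, 0.94, 0.93, 776),
    ("Sadness", 0.87, 0.83, 0.85, 775),
    ("Joy", 0.83, 0.81, 0.82, 777),
    ("Enthusiasm", 0.61, 0.61, 0.61, 776),
    ("Hope", 0.54, 0.46, 0.50, 777),
    ("Pride", 0.75, 0.81, 0.78, 776),
    ("No emotion", 0.66, 0.64, 0.65, 1553),
]


def macro_average(rows, col):
    """Unweighted mean of one metric column across classes."""
    return sum(r[col] for r in rows) / len(rows)


def weighted_average(rows, col):
    """Mean of one metric column weighted by class support."""
    total = sum(r[4] for r in rows)
    return sum(r[col] * r[4] for r in rows) / total


for name, col in (("Precision", 1), ("Recall", 2), ("F1-Score", 3)):
    print(f"{name}: macro={macro_average(ROWS, col):.2f} "
          f"weighted={weighted_average(ROWS, col):.2f}")
```

The macro average treats all nine classes equally, while the weighted average gives "No emotion" roughly twice the influence of any other class because its support (1553) is about double. The support-weighted recall is the same quantity as overall accuracy, which is why both are reported as 0.71.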

Performance Insights

The model performs best on "Disgust" (F1 0.93), "Sadness" (0.85), and "Joy" (0.82), with "Fear" (0.79) and "Pride" (0.78) close behind. The weakest classes are "Hope" (0.50), "Anger" (0.56), and "Enthusiasm" (0.61); their lower scores are potentially due to translation noise or subtle emotional cues being lost in machine translation.

Model Usage

Applications

Emotion analysis of German texts translated into Czech.
Sentiment tracking in Czech-language customer feedback derived from German text.
Research on cross-linguistic emotion classification in multilingual datasets.
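
For the applications above, a minimal loading sketch might look as follows. It assumes the model is published on the Hugging Face Hub under the identifier given in this card and can be loaded with the transformers text-classification pipeline; the card does not confirm either point. The helper names load_classifier and label_counts are illustrative.

```python
from collections import Counter

# Assumption: this identifier resolves to a Hub repository with a
# sequence-classification head.
MODEL_ID = "uvegesistvan/wildmann_german_proposal_2b_german_to_czech"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the model)."""
    # Imported lazily so label_counts works without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def label_counts(predictions):
    """Tally predicted labels from pipeline-style output.

    `predictions` is a list of {"label": ..., "score": ...} dicts, the
    shape the pipeline returns for a list of input strings.
    """
    return Counter(p["label"] for p in predictions)


# Example (requires network access and the transformers package):
#   clf = load_classifier()
#   preds = clf(["Mám z toho velkou radost.", "Bojím se, co bude dál."])
#   print(label_counts(preds))
```

The label strings returned depend on the model's configuration; they may be emotion names or generic identifiers such as LABEL_0, in which case the class indices listed in this card apply.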

Limitations

The model's performance is influenced by the quality of the machine-translated text, which may introduce biases or inaccuracies.
Subtle emotional states like "Hope" may be harder to classify due to translation inconsistencies.

Ethical Considerations

The reliance on machine-translated datasets means that cultural and linguistic nuances may be lost, potentially impacting classification accuracy. Users should carefully evaluate the model before applying it in sensitive areas, such as mental health or customer sentiment analysis.

Citation

For further information, visit: uvegesistvan/wildmann_german_proposal_2b_german_to_czech