---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- image-classification
- generated_from_trainer
datasets:
- imagefolder
- FastJobs/Visual_Emotional_Analysis
metrics:
- accuracy
model-index:
- name: vit-emotion-classification
  results:
  - task:
      name: Image Classification
      type: image-classification
    dataset:
      name: FastJobs/Visual_Emotional_Analysis
      type: imagefolder
      config: default
      split: train
      args: default
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.6125
pipeline_tag: image-classification
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
  example_title: Teapot
---

# vit-emotion-classification

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the [FastJobs/Visual_Emotional_Analysis](https://huggingface.co/datasets/FastJobs/Visual_Emotional_Analysis) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3802
- Accuracy: 0.6125

## Intended uses & limitations

### Intended uses

- Emotion classification from visual inputs (images).

### Limitations

- May reflect biases present in the training dataset.
- Performance may degrade on domains outside the training data.
- Not suitable for critical or sensitive decision-making tasks.

## Training and evaluation data

This model was trained on the [FastJobs/Visual_Emotional_Analysis](https://huggingface.co/datasets/FastJobs/Visual_Emotional_Analysis) dataset, which contains **800 images** annotated with **8 emotion labels** (a loading sketch appears under "Reproducing the training setup" at the end of this card):

- Anger
- Contempt
- Disgust
- Fear
- Happy
- Neutral
- Sad
- Surprise

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `Trainer` sketch reproducing these settings appears at the end of this card):
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.8454        | 2.5   | 100  | 1.4373          | 0.4813   |
| 0.2022        | 5.0   | 200  | 1.4067          | 0.5500   |
| 0.0474        | 7.5   | 300  | 1.3802          | 0.6125   |
| 0.0368        | 10.0  | 400  | 1.4388          | 0.5938   |

### Framework versions

- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## How to use this model

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor, ViTForImageClassification

# Load the fine-tuned model and its matching image processor from the Hub.
# The model is public, so no authentication is required.
image_processor = AutoImageProcessor.from_pretrained("digo-prayudha/vit-emotion-classification")
model = ViTForImageClassification.from_pretrained("digo-prayudha/vit-emotion-classification")

# Open a local image and make sure it has three channels.
image = Image.open("image.jpg").convert("RGB")

# Resize and normalize the image, then run it through the model.
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its emotion label.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
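The same inference can also be run through the `pipeline` API, which wraps the preprocessing and label mapping shown above. A minimal sketch, assuming a local file `image.jpg`:

```python
from transformers import pipeline

# The image-classification pipeline bundles the image processor and the model.
classifier = pipeline("image-classification", model="digo-prayudha/vit-emotion-classification")

# Accepts a file path, URL, or PIL image; returns the top labels with scores.
predictions = classifier("image.jpg")
print(predictions)
```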
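## Reproducing the training setup

The original training script is not included in this card, so the snippets below are sketches rather than the actual code. The step counts in the training results table (400 steps at batch size 16 over 10 epochs) are consistent with an 80/20 train/validation split of the 800-image dataset, but the actual split is not recorded; the column names `image` and `label` are likewise assumptions about the dataset schema.

```python
from datasets import load_dataset

# Load the 800-image emotion dataset from the Hub.
dataset = load_dataset("FastJobs/Visual_Emotional_Analysis", split="train")

# Assumed 80/20 split with the card's seed; the actual split is not recorded.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]

print(train_ds.features["label"].names)  # the 8 emotion labels (assumed column name)
print(len(train_ds), len(eval_ds))       # 640, 160 under this assumption
```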
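The hyperparameters listed under "Training procedure" map onto `TrainingArguments` roughly as follows. The preprocessing transform, collate function, and evaluation cadence (every 100 steps, inferred from the results table) are assumptions, not the card author's confirmed setup:

```python
import torch
from transformers import (AutoImageProcessor, Trainer, TrainingArguments,
                          ViTForImageClassification)

checkpoint = "google/vit-base-patch16-224-in21k"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)

# Assumed preprocessing: resize/normalize each image with the base model's processor.
def transform(batch):
    batch["pixel_values"] = image_processor(
        [img.convert("RGB") for img in batch["image"]], return_tensors="pt"
    )["pixel_values"]
    return batch

# train_ds / eval_ds come from the loading sketch above.
train_ds = train_ds.with_transform(transform)
eval_ds = eval_ds.with_transform(transform)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["label"] for ex in examples]),
    }

# New 8-way classification head on top of the pretrained backbone.
model = ViTForImageClassification.from_pretrained(checkpoint, num_labels=8)

# Hyperparameters copied from the "Training hyperparameters" list above.
training_args = TrainingArguments(
    output_dir="vit-emotion-classification",
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",           # AdamW with betas=(0.9, 0.999) and eps=1e-8 (the defaults)
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,                     # Native AMP
    eval_strategy="steps",         # the results table reports evaluation every 100 steps
    eval_steps=100,
    remove_unused_columns=False,   # keep the raw "image" column for the transform
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=collate_fn,
)
trainer.train()
```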