Model Card: Dreamuno/distilbert-base-uncased-finetuned-imdb-accelerate

Model Details

Model Name: distilbert-base-uncased-finetuned-imdb-accelerate
Model Type: DistilBERT
Model Version: 1.0
Model ID (Hugging Face Hub): Dreamuno/distilbert-base-uncased-finetuned-imdb-accelerate
License: Apache 2.0

Overview

The distilbert-base-uncased-finetuned-imdb-accelerate model is a version of DistilBERT fine-tuned for sentiment analysis on the IMDb movie reviews dataset. It classifies a movie review as either positive or negative.

Model Architecture

Base Model: distilbert-base-uncased
Fine-tuning Dataset: IMDb movie reviews dataset
Number of Labels: 2 (positive, negative)

Intended Use

Primary Use Case

The primary use case is sentiment analysis of movie reviews: given the text of a review, the model predicts whether it expresses positive or negative sentiment.

Applications

  • Analyzing customer feedback on movie streaming platforms
  • Sentiment analysis of movie reviews in social media posts
  • Automated moderation of user-generated content related to movie reviews

Limitations

  • The model was trained only on the IMDb dataset, so it may not generalize well to other types of text or to domains outside movie reviews.
  • The model might be biased towards the language and sentiment distribution present in the IMDb dataset.

Training Details

Training Data

Dataset: IMDb movie reviews
Size: 50,000 labeled reviews (25,000 positive, 25,000 negative), distributed as a 25,000-review training split and a 25,000-review test split

Training Procedure

The model was fine-tuned with the Hugging Face transformers library, using the accelerate library to handle distributed training. Training involved the following steps:

  1. Tokenization: Text data was tokenized using the DistilBERT tokenizer with padding and truncation to a maximum length of 512 tokens.
  2. Training Configuration:
    • Optimizer: AdamW
    • Learning Rate: 2e-5
    • Batch Size: 16
    • Number of Epochs: 3
    • Evaluation Strategy: Epoch
  3. Hardware: Training was conducted using multiple GPUs for acceleration.
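
The steps above can be sketched roughly as follows. This is a minimal illustration, not the author's actual training script: the names FinetuneConfig and train are hypothetical, the "imdb" dataset identifier is assumed, the batch size is assumed to be per device, padding is applied per batch by a collator rather than to a fixed 512 tokens, and no learning-rate scheduler is used because none is stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters listed in this card; anything else is an assumption."""
    base_model: str = "distilbert-base-uncased"
    max_length: int = 512
    learning_rate: float = 2e-5
    batch_size: int = 16  # assumed to be per device
    num_epochs: int = 3
    num_labels: int = 2


def train(config: FinetuneConfig = FinetuneConfig()) -> None:
    # Heavy dependencies are imported lazily so the config can be inspected
    # without torch, datasets, transformers, or accelerate installed.
    import torch
    from accelerate import Accelerator
    from datasets import load_dataset
    from torch.utils.data import DataLoader
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
    )

    accelerator = Accelerator()
    tokenizer = AutoTokenizer.from_pretrained(config.base_model)

    def tokenize(batch):
        # Truncate to max_length; padding is added per batch by the collator.
        return tokenizer(batch["text"], truncation=True, max_length=config.max_length)

    # Assumed dataset identifier; the card only names "IMDb movie reviews".
    train_set = load_dataset("imdb", split="train").map(tokenize, batched=True)
    train_set = train_set.remove_columns(["text"]).rename_column("label", "labels")

    loader = DataLoader(
        train_set,
        batch_size=config.batch_size,
        shuffle=True,
        collate_fn=DataCollatorWithPadding(tokenizer),
    )
    model = AutoModelForSequenceClassification.from_pretrained(
        config.base_model, num_labels=config.num_labels
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)

    # prepare() places the objects on the available device(s) and shards the loader.
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    for _ in range(config.num_epochs):
        model.train()
        for batch in loader:
            loss = model(**batch).loss
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
        # Per-epoch evaluation on a held-out split would go here.
```

A script built around a function like this would typically be started with the accelerate launch command so that one process is created per GPU.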

Evaluation

Performance Metrics

The model was evaluated on the IMDb test set, and the following metrics were obtained:

  • Accuracy: 95.0%
  • Precision: 94.8%
  • Recall: 95.2%
  • F1 Score: 95.0%
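
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two figures above. The function name below is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision (94.8%) and recall (95.2%) from the list above.
f1 = f1_from_precision_recall(0.948, 0.952)
print(f"{f1:.1%}")  # prints 95.0%
```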

Evaluation Dataset

Dataset: IMDb movie reviews (test split)
Size: 25,000 reviews (12,500 positive, 12,500 negative)
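
For reference, the four metrics can be derived from test-set labels and predictions as in the sketch below. It assumes labels are encoded as 1 (positive) and 0 (negative); the function name and the toy data are hypothetical and are not the model's actual predictions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Toy example with eight hypothetical reviews.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```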

How to Use

Inference

The model can be loaded for inference through the pipeline API of the Hugging Face transformers library:

from transformers import pipeline

# Load the fine-tuned model
sentiment_analyzer = pipeline("sentiment-analysis", model="Dreamuno/distilbert-base-uncased-finetuned-imdb-accelerate")

# Analyze sentiment of a movie review
review = "This movie was fantastic! I really enjoyed it."
result = sentiment_analyzer(review)
print(result)

Example Output

[
  {
    "label": "POSITIVE",
    "score": 0.98
  }
]
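
Downstream code often needs a plain decision rather than a label and score pair. The sketch below post-processes a list shaped like the example output; it assumes the label strings are "POSITIVE" and "NEGATIVE" (the actual strings depend on the model's configuration), and the function name, the 0.9 threshold, and the second sample entry are hypothetical.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def summarize(predictions: List[Prediction], min_score: float = 0.9) -> List[str]:
    """Map each prediction to its lowercased label, or 'uncertain' below min_score."""
    decisions = []
    for pred in predictions:
        if float(pred["score"]) < min_score:
            decisions.append("uncertain")
        else:
            decisions.append(str(pred["label"]).lower())
    return decisions


# First entry mirrors the example output; the second is made up for illustration.
sample = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.61},
]
decisions = summarize(sample)
print(decisions)  # prints ['positive', 'uncertain']
```

The threshold trades coverage for reliability: raising it sends more borderline reviews to the "uncertain" bucket, which may be preferable when predictions feed automated moderation.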

Ethical Considerations

  • Bias: The model may exhibit bias based on the data it was trained on. Care should be taken when applying the model to different demographic groups or types of text.
  • Misuse: The model is intended for sentiment analysis of movie reviews. Applying it to other tasks or domains may produce inaccurate or harmful predictions and should be avoided.

Contact

For further information, please contact the model creator or visit the model page on Hugging Face.


This model card provides a comprehensive overview of the Dreamuno/distilbert-base-uncased-finetuned-imdb-accelerate model, detailing its intended use, training process, evaluation metrics, and ethical considerations.

Model Size: 67M parameters
Tensor Type: F32
Weights Format: Safetensors