---
language: en
license: mit
tags:
- vision
- image-to-text
- image-captioning
model_name: microsoft/git-base
pipeline_tag: image-to-text
library_name: transformers
datasets:
- ayoubkirouane/One-Piece-anime-captions
---


# Model Details
+ **Model Name**: Git-base-One-Piece
+ **Base Model**: Microsoft's "git-base" model
+ **Model Type**: Generative Image-to-Text (GIT)
+ **Fine-Tuned On**: 'One-Piece-anime-captions' dataset
+ **Fine-Tuning Purpose**: To generate text captions for images related to the anime series "One Piece."



## Model Description
**Git-base-One-Piece** is a fine-tuned variant of Microsoft's **git-base** model, trained to generate descriptive text captions for images from the **One-Piece-anime-captions** dataset.

The dataset consists of **856 {image: caption}** pairs, which serve as the training corpus for the model.

The model is conditioned on both CLIP image tokens and text tokens and employs a **teacher forcing** training approach. It predicts the next text token while considering the context provided by the image and previous text tokens.
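As a rough illustration of teacher forcing, the sketch below shows what a single training step could look like with the `transformers` GIT model. It is not the training script used for this model: the function name `teacher_forcing_step`, the `batch` layout, and the optimizer handling are assumptions made for the example. The key point is that the ground-truth caption tokens are passed as `labels`, so the model learns to predict each next token given the image and the preceding true tokens.

```python
def teacher_forcing_step(model, optimizer, batch):
    """Run one hypothetical training step and return the loss as a float.

    Assumes `batch` is a dict with "input_ids", "attention_mask" and
    "pixel_values", e.g. as produced by the model's processor.
    """
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        pixel_values=batch["pixel_values"],
        # Teacher forcing: the true caption tokens are the targets.
        labels=batch["input_ids"],
    )
    loss = outputs.loss
    loss.backward()        # compute gradients
    optimizer.step()       # update weights
    optimizer.zero_grad()  # reset gradients for the next step
    return loss.item()
```
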


![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6338c06c107c4835a05699f9/N_yNK2tLabtwmSYAqpTEp.jpeg)

## Limitations
+ The quality of generated captions may vary depending on the complexity and diversity of images from the 'One-Piece-anime-captions' dataset.
+ The model's output is based on the data it was fine-tuned on, so it may not generalize well to images outside the dataset's domain.
+ Generating highly detailed or contextually accurate captions may still be a challenge.


## Usage 

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-text", model="ayoubkirouane/git-base-One-Piece")
```

**or**

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("ayoubkirouane/git-base-One-Piece")
model = AutoModelForCausalLM.from_pretrained("ayoubkirouane/git-base-One-Piece")
```
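
Once the processor and model are loaded, a caption can be generated roughly as follows. This is a minimal sketch: the helper name `generate_caption`, the `max_length` value of 50, and the image path `example.jpg` are assumptions for illustration, not part of the original card.

```python
def generate_caption(image, processor, model, max_length=50):
    """Return a caption string for a single image (e.g. a PIL image)."""
    # Convert the image into the pixel tensor the model expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Autoregressively generate caption token ids.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    # Decode the token ids back into text.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


def main():
    # Imports are kept local so the helper above has no hard dependencies.
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForCausalLM

    processor = AutoProcessor.from_pretrained("ayoubkirouane/git-base-One-Piece")
    model = AutoModelForCausalLM.from_pretrained("ayoubkirouane/git-base-One-Piece")

    image = Image.open("example.jpg")  # hypothetical local image path
    print(generate_caption(image, processor, model))
```

Calling `main()` downloads the model weights on first use and prints the generated caption for the chosen image.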