---
language:
- en
license: mit
library_name: transformers
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
---

# Model Card for LLaVa-Phi-2-3B-GGUF

GGUF-quantized build of [llava-phi-2-3b](https://huggingface.co/marianna13/llava-phi-2-3b), a LLaVA-style multimodal model based on Phi-2, for inference with llama.cpp.

## Model Details

### Model Description


Quantized version of [llava-phi-2-3b](https://huggingface.co/marianna13/llava-phi-2-3b). Quantization was done using [llama.cpp](https://github.com/ggerganov/llama.cpp/tree/master/examples/llava).
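
For reference, a quantization step with llama.cpp's `quantize` tool looks roughly like the sketch below; the filenames and the `Q4_K_M` target are illustrative assumptions, not necessarily the exact settings used for this repository.

```
# build the quantize tool, then convert the f16 GGUF to a 4-bit variant (illustrative)
make quantize
./quantize ./ggml-model-f16.gguf ./ggml-model-q4_k_m.gguf Q4_K_M
```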


- **Developed by:** [LAION](https://laion.ai/), [SkunkworksAI](https://huggingface.co/SkunkworksAI) & [Ontocord](https://www.ontocord.ai/)
- **Model type:** LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data.
It is an auto-regressive language model based on the transformer architecture.
- **Finetuned from model:** [Phi-2](https://huggingface.co/microsoft/phi-2)
- **License:** MIT

### Model Sources


- **Repository:** [BakLLaVa](https://github.com/SkunkworksAI/BakLLaVA)
- **llama.cpp:** [GitHub](https://github.com/ggerganov/llama.cpp)

## Usage

```
make && ./llava-cli -m ../ggml-model-f16.gguf --mmproj ../mmproj-model-f16.gguf --image /path/to/image.jpg
```
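
For a non-interactive run you can also pass a text prompt. A minimal sketch, assuming the GGUF files from this repository sit in the current directory; the `-p` prompt and `--temp` sampling flags are standard llama.cpp options, not settings taken from this card.

```
# describe an image with an explicit prompt and a low sampling temperature
./llava-cli -m ./ggml-model-f16.gguf \
    --mmproj ./mmproj-model-f16.gguf \
    --image /path/to/image.jpg \
    -p "Describe this image in detail." \
    --temp 0.1
```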

## Evaluation


### Benchmarks

| Model | Parameters |SQA | GQA | TextVQA | POPE |
| --- | --- | --- | --- | --- | --- |
| [LLaVA-1.5](https://huggingface.co/liuhaotian/llava-v1.5-7b) |  7.3B | 68.0| **62.0** | **58.3** | 85.3 |
| [MC-LLaVA-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | - | 49.6 | 38.59 | - |
| [LLaVA-Phi](https://arxiv.org/pdf/2401.02330.pdf) | 3B | 68.4 | - | 48.6 | 85.0 |
| [moondream1](https://huggingface.co/vikhyatk/moondream1) | 1.6B | - | 56.3 | 39.8 | - |
| **llava-phi-2-3b** | 3B | **69.0** | 51.2 | 47.0 | **86.0** |

### Image Captioning (MS COCO)

| Model                                                    | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
| -------------------------------------------------------- | ------ | ------ | ------ | ------ | ------ | ------- | ----- | ----- |
| llava-1.5-7b                                             | 75.8   | 59.8   | 45     | 33.3   | 29.4   | 57.7    | 108.8 | 23.5  |
| **llava-phi-2-3b** | 67.7   | 50.5   | 35.7   | 24.2   | 27.0   | 52.4   | 85.0  | 20.7  |