runninglsy committed
Update README.md

README.md CHANGED
@@ -16,24 +16,32 @@ Ovis is a novel Multimodal Large Language Model (MLLM) architecture, designed to
</div>

## Model
-As always, Ovis1.5 remains fully open-source: we release the
-
-|                   | MiniCPM-Llama3-V2.5 | Ovis1.5-Llama3-8B |
-|:------------------|--------------------:|------------------:|
-| Training scripts  | -                   | [Github](https://github.com/AIDC-AI/Ovis/tree/main/scripts/v1_5) |
-| ViT               | Siglip-400M         | Siglip-400M |
-| LLM               | Llama3-8B-Instruct  | Llama3-8B-Instruct |
-| MMTBench-VAL      | 57.6                | **60.7** |
-| MMBench-EN-V1.1   | 74                  | **78.2** |
-| MMBench-CN-V1.1   | 70.1                | **75.2** |
-| MMStar            | 51.8                | **57.2** |
-| MMMU-Val          | 45.8                | **48.6** |
-| MathVista-Mini    | 54.3                | **62.4** |
-| HallusionBenchAvg | 42.4                | **44.5** |
-| AI2D              | 78.4                | **82.5** |
-| OCRBench          | 725                 | **743** |
-| MMVet             | **52.8**            | 52.2 |
-| RealWorldQA       | 63.5                | **64.6** |
+As always, Ovis1.5 remains fully open-source: we release the training datasets, training & inference code, and model weights for **reproducible transparency** and community collaboration.
+
+| Ovis MLLMs        | ViT         | LLM                | Training Datasets | Code | Model Weights |
+|:------------------|:-----------:|:------------------:|:-----------------:|:----:|:-------------:|
+| Ovis1.5-Llama3-8B | Siglip-400M | Llama3-8B-Instruct | [Huggingface](https://huggingface.co/datasets/AIDC-AI/Ovis-dataset) | [Github](https://github.com/AIDC-AI/Ovis) | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.5-Llama3-8B) |
+| Ovis1.5-Gemma2-9B | Siglip-400M | Gemma2-9B-It       | [Huggingface](https://huggingface.co/datasets/AIDC-AI/Ovis-dataset) | [Github](https://github.com/AIDC-AI/Ovis) | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.5-Gemma2-9B) |
+
+## Performance
+We evaluate Ovis1.5 across various multimodal benchmarks using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) and compare its performance to leading MLLMs of similar parameter scale.
+
+|                   | MiniCPM-Llama3-V2.5 | GLM-4V-9B | Ovis1.5-Llama3-8B | Ovis1.5-Gemma2-9B |
+|:------------------|--------------------:|----------:|------------------:|------------------:|
+| Open Weights      | ✅ | ✅ | ✅ | ✅ |
+| Open Datasets     | ❌ | ❌ | ✅ | ✅ |
+| MMTBench-VAL      | 57.6 | 48.8 | 60.7 | **62.7** |
+| MMBench-EN-V1.1   | 74   | 68.7 | **78.2** | 78.0 |
+| MMBench-CN-V1.1   | 70.1 | 67.1 | **75.2** | 75.1 |
+| MMStar            | 51.8 | 54.8 | 57.2 | **58.7** |
+| MMMU-Val          | 45.8 | 46.9 | 48.6 | **49.8** |
+| MathVista-Mini    | 54.3 | 51.1 | 62.4 | **65.7** |
+| HallusionBenchAvg | 42.4 | 45   | 44.5 | **48.0** |
+| AI2D              | 78.4 | 71.2 | 82.5 | **84.7** |
+| OCRBench          | 725  | **776** | 743 | 756 |
+| MMVet             | 52.8 | **58** | 52.2 | 56.5 |
+| RealWorldQA       | 63.5 | 66   | 64.6 | **66.9** |
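To reproduce numbers like those in the benchmark table, VLMEvalKit is driven through its `run.py` entry point. The sketch below only assembles the invocation; the CLI shape, the dataset identifiers (`MMStar`, `MMMU_DEV_VAL`, `AI2D_TEST`), and the registry name `Ovis1.5-Llama3-8B` are assumptions to verify against the VLMEvalKit documentation, not details stated in this README.

```python
# Hedged sketch: build a VLMEvalKit command line for one model and a
# subset of the benchmarks in the table above. All names are assumptions.
BENCHMARKS = ["MMStar", "MMMU_DEV_VAL", "AI2D_TEST"]  # assumed dataset IDs

def vlmeval_command(model_name: str, benchmarks: list) -> str:
    """Return the (assumed) `python run.py` invocation for one model."""
    data = " ".join(benchmarks)
    return f"python run.py --data {data} --model {model_name} --verbose"

# Assumed registry name for the model evaluated in this README.
cmd = vlmeval_command("Ovis1.5-Llama3-8B", BENCHMARKS)
```

Each benchmark then produces a per-dataset score file; the table above aggregates one headline number per benchmark.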

## Usage
Below is a code snippet to run Ovis with multimodal inputs. For additional usage instructions, including inference wrapper and Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
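The snippet itself falls outside this diff hunk, so here is a minimal sketch of what such inference code typically looks like. It assumes the weights load via transformers' `trust_remote_code` path; the `"<image>"` placeholder token and the `preprocess_inputs` / `text_tokenizer` helper names are hypothetical stand-ins, so consult the linked Ovis GitHub README for the real interface.

```python
# Hedged sketch of single-image inference with Ovis1.5. The transformers
# loading path is an assumption; helper names marked below are hypothetical.

def build_query(question: str) -> str:
    """Prepend the (assumed) image-placeholder token to a text question."""
    return f"<image>\n{question}"

def run_ovis_once(model_id: str, image_path: str, question: str) -> str:
    """Answer one question about one image.

    Needs a CUDA GPU and downloads the released weights; heavy imports are
    deferred so the sketch stays importable without them installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,                 # e.g. "AIDC-AI/Ovis1.5-Llama3-8B"
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,   # Ovis ships custom modeling code
    ).cuda().eval()

    image = Image.open(image_path)
    query = build_query(question)
    # Hypothetical helpers: the real repo wraps multimodal preprocessing
    # and decoding behind its own interface (see the Ovis GitHub README).
    inputs = model.preprocess_inputs(query, [image])
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=512)
    return model.text_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For batch use, the repo's inference wrapper and Gradio UI linked above are the supported entry points.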