runninglsy committed
Update README.md

README.md CHANGED
@@ -16,24 +16,32 @@ Ovis is a novel Multimodal Large Language Model (MLLM) architecture, designed to
</div>

## Model
-As always, Ovis1.5 remains fully open-source: we release the
-
-|                   | MiniCPM-Llama3-V2.5 | Ovis1.5-Llama3-8B |
-|:------------------|--------------------:|------------------:|
-| Training scripts  | -                   | [Github](https://github.com/AIDC-AI/Ovis/tree/main/scripts/v1_5) |
-| ViT               | Siglip-400M         | Siglip-400M |
-| LLM               | Llama3-8B-Instruct  | Llama3-8B-Instruct |
-| MMTBench-VAL      | 57.6                | **60.7** |
-| MMBench-EN-V1.1   | 74                  | **78.2** |
-| MMBench-CN-V1.1   | 70.1                | **75.2** |
-| MMStar            | 51.8                | **57.2** |
-| MMMU-Val          | 45.8                | **48.6** |
-| MathVista-Mini    | 54.3                | **62.4** |
-| HallusionBenchAvg | 42.4                | **44.5** |
-| AI2D              | 78.4                | **82.5** |
-| OCRBench          | 725                 | **743** |
-| MMVet             | **52.8**            | 52.2 |
-| RealWorldQA       | 63.5                | **64.6** |
+As always, Ovis1.5 remains fully open-source: we release the training datasets, training & inference code, and model weights for **reproducible transparency** and community collaboration.
+
+| Ovis MLLMs        | ViT         | LLM                | Training Datasets | Code | Model Weights |
+|:------------------|:-----------:|:------------------:|:-----------------:|:----:|:-------------:|
+| Ovis1.5-Llama3-8B | Siglip-400M | Llama3-8B-Instruct | [Huggingface](https://huggingface.co/datasets/AIDC-AI/Ovis-dataset) | [Github](https://github.com/AIDC-AI/Ovis) | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.5-Llama3-8B) |
+| Ovis1.5-Gemma2-9B | Siglip-400M | Gemma2-9B-It       | [Huggingface](https://huggingface.co/datasets/AIDC-AI/Ovis-dataset) | [Github](https://github.com/AIDC-AI/Ovis) | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.5-Gemma2-9B) |
+
+## Performance
+We evaluate Ovis1.5 across various multimodal benchmarks using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) and compare its performance to leading MLLMs of similar parameter scale.
+
+|                   | MiniCPM-Llama3-V2.5 | GLM-4V-9B | Ovis1.5-Llama3-8B | Ovis1.5-Gemma2-9B |
+|:------------------|--------------------:|----------:|------------------:|------------------:|
+| Open Weights      | ✅ | ✅ | ✅ | ✅ |
+| Open Datasets     | ❌ | ❌ | ✅ | ✅ |
+| MMTBench-VAL      | 57.6 | 48.8 | 60.7 | **62.7** |
+| MMBench-EN-V1.1   | 74   | 68.7 | **78.2** | 78.0 |
+| MMBench-CN-V1.1   | 70.1 | 67.1 | **75.2** | 75.1 |
+| MMStar            | 51.8 | 54.8 | 57.2 | **58.7** |
+| MMMU-Val          | 45.8 | 46.9 | 48.6 | **49.8** |
+| MathVista-Mini    | 54.3 | 51.1 | 62.4 | **65.7** |
+| HallusionBenchAvg | 42.4 | 45   | 44.5 | **48.0** |
+| AI2D              | 78.4 | 71.2 | 82.5 | **84.7** |
+| OCRBench          | 725  | **776** | 743 | 756 |
+| MMVet             | 52.8 | **58** | 52.2 | 56.5 |
+| RealWorldQA       | 63.5 | 66   | 64.6 | **66.9** |
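To reproduce numbers like those in the benchmark table, VLMEvalKit is driven through its `run.py` entry point. The sketch below only assembles the invocation; the CLI shape, the dataset identifiers (`MMStar`, `MMMU_DEV_VAL`, `AI2D_TEST`), and the registry name `Ovis1.5-Llama3-8B` are assumptions to verify against the VLMEvalKit documentation, not details stated in this README.

```python
# Hedged sketch: build a VLMEvalKit command line for one model and a
# subset of the benchmarks in the table above. All names are assumptions.
BENCHMARKS = ["MMStar", "MMMU_DEV_VAL", "AI2D_TEST"]  # assumed dataset IDs

def vlmeval_command(model_name: str, benchmarks: list) -> str:
    """Return the (assumed) `python run.py` invocation for one model."""
    data = " ".join(benchmarks)
    return f"python run.py --data {data} --model {model_name} --verbose"

# Assumed registry name for the model evaluated in this README.
cmd = vlmeval_command("Ovis1.5-Llama3-8B", BENCHMARKS)
```

Each benchmark then produces a per-dataset score file; the table above aggregates one headline number per benchmark.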

## Usage
Below is a code snippet to run Ovis with multimodal inputs. For additional usage instructions, including inference wrapper and Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
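The snippet itself falls outside this diff hunk, so here is a minimal sketch of what such inference code typically looks like. It assumes the weights load via transformers' `trust_remote_code` path; the `"<image>"` placeholder token and the `preprocess_inputs` / `text_tokenizer` helper names are hypothetical stand-ins, so consult the linked Ovis GitHub README for the real interface.

```python
# Hedged sketch of single-image inference with Ovis1.5. The transformers
# loading path is an assumption; helper names marked below are hypothetical.

def build_query(question: str) -> str:
    """Prepend the (assumed) image-placeholder token to a text question."""
    return f"<image>\n{question}"

def run_ovis_once(model_id: str, image_path: str, question: str) -> str:
    """Answer one question about one image.

    Needs a CUDA GPU and downloads the released weights; heavy imports are
    deferred so the sketch stays importable without them installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,                 # e.g. "AIDC-AI/Ovis1.5-Llama3-8B"
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,   # Ovis ships custom modeling code
    ).cuda().eval()

    image = Image.open(image_path)
    query = build_query(question)
    # Hypothetical helpers: the real repo wraps multimodal preprocessing
    # and decoding behind its own interface (see the Ovis GitHub README).
    inputs = model.preprocess_inputs(query, [image])
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=512)
    return model.text_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For batch use, the repo's inference wrapper and Gradio UI linked above are the supported entry points.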