---
license: llama2
---

# v-MLLM Model Card

## Model details

**Model type:**

v-MLLM is an open-source MLLM trained on the Visual-Modality Instruction (VIM) corpus. It robustly follows both text-modality instructions and visual-modality instructions (instructions rendered directly in the image pixels).

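The sketch below (a minimal illustration, not the official VIM_TOOL pipeline) shows one way such a visual-modality input can be constructed: the instruction text is rendered into the image itself, so the model has to read it from pixels. The function name, band layout, and font are assumptions for illustration only.

```python
# Minimal sketch: render an instruction into the image so it must be read
# from pixels. The band layout and default font are illustrative assumptions;
# the actual VIM corpus construction is handled by the VIM_TOOL code.
from PIL import Image, ImageDraw, ImageFont


def embed_instruction(image_path: str, instruction: str, band_height: int = 60) -> Image.Image:
    """Paste the source image onto a taller white canvas and draw the
    instruction text in the band below it."""
    img = Image.open(image_path).convert("RGB")
    canvas = Image.new("RGB", (img.width, img.height + band_height), "white")
    canvas.paste(img, (0, 0))
    draw = ImageDraw.Draw(canvas)
    draw.text((10, img.height + 10), instruction, fill="black", font=ImageFont.load_default())
    return canvas


# The resulting image carries the instruction; the accompanying text prompt
# can then be empty or generic.
vim_image = embed_instruction("example.jpg", "Describe this image in one sentence.")
vim_image.save("vim_input.png")
```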
**Model date:**

v-MLLM-7B was trained in January 2024.

**GitHub for more information:**

https://github.com/VIM-Bench/VIM_TOOL

## License

v-MLLM is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

## Intended use

**Primary intended uses:**

The primary use of v-MLLM is research on multimodal large language models.

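As a starting point for such research, the hypothetical sketch below shows what inference could look like if the released checkpoint is loaded through a LLaVA-style interface in Hugging Face Transformers; the model path, prompt template, and class compatibility are assumptions, and the supported inference and evaluation code is in the VIM_TOOL repository linked above.

```python
# Hypothetical sketch only: assumes a LLaVA-style checkpoint layout and chat
# template, which may not match the released v-MLLM weights.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_path = "./v-MLLM-7B"  # placeholder path to the released weights
processor = AutoProcessor.from_pretrained(model_path)
model = LlavaForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# The instruction lives inside the image, so the text prompt only supplies
# the (assumed) chat template.
image = Image.open("vim_input.png")
prompt = "USER: <image>\nASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```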
**Primary intended users:**

The primary intended users of the model are researchers in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset

- 846k VIM corpus based on the LVIS-Instruct4V corpus.

## Citation

Please kindly cite our papers if you find our resources useful:

```
@misc{li2024text,
      title={Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?},
      author={Xiujun Li and Yujie Lu and Zhe Gan and Jianfeng Gao and William Yang Wang and Yejin Choi},
      year={2024},
      eprint={2311.17647},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{lu2023vim,
      title={VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following},
      author={Yujie Lu and Xiujun Li and William Yang Wang and Yejin Choi},
      year={2023},
      eprint={2311.17647},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```