
license: apache-2.0

Model Card for AtomThink-EMOVA-8B

This model is post-trained from EMOVA-8B with the AtomThink framework and is intended for solving complex multimodal mathematical problems.

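Accuracy (%) compared with state-of-the-art methods on MathVista (General, Math, Total) and MathVerse (TL: Text Lite, TD: Text Dominant, VI: Vision Intensive, VD: Vision Dominant, VO: Vision Only, Total). A dash (-) marks a result that is not reported.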

                                     MathVista               MathVerse
Model                 Inference      General Math    Total   TL      TD      VI      VD      VO      Total
Random Choice         -              -       -       17.9    12.4    12.4    12.4    12.4    12.4    12.4
Human                 -              -       -       -       70.9    71.2    61.4    68.3    66.7    66.7
OpenAI o1             Slow Think*    -       -       73.9    -       -       -       -       -       -
GPT-4o                CoT            -       -       63.8    -       -       -       -       -       -
GPT-4V                CoT            -       -       49.9    56.6    63.1    51.4    50.8    50.3    54.4
LLaVA-NeXT-34B        Direct         -       -       46.5    25.5    33.8    23.5    20.3    15.7    23.8
InternLM-XComposer2   Direct         -       -       57.6    17.0    22.3    15.7    16.4    11.0    16.5
Qwen-VL-Plus          Direct         -       -       43.3    11.1    15.7    9.0     13.0    10.0    11.8
LLaVA-1.5-13B         Direct         -       -       27.6    15.2    19.4    16.8    15.2    11.3    15.6
G-LLaVA-7B            Direct         -       -       53.4    20.7    20.9    17.2    14.6    9.4     16.6
MAVIS-7B              Direct         -       -       -       29.1    41.4    27.4    24.9    14.6    27.5
LLaVA-Llama3-8B       Direct         34.1    25.6    29.5    16.0    19.3    16.4    13.1    15.0    15.9
EMOVA-8B-200k         Direct         52.4    51.1    51.7    34.4    39.0    33.4    30.1    23.5    32.1
EMOVA                 Formatted CoT  30.9    31.3    31.1    26.5    36.5    25.3    20.4    19.8    25.7
AtomThink-EMOVA       Direct         53.9    52.4    53.1    33.6    39.0    33.8    28.0    24.4    31.8
AtomThink-EMOVA       Quick Think    48.7    54.4    51.8    36.5    42.4    34.1    32.9    29.7    35.1
AtomThink-EMOVA       Slow Think     48.9    57.0    53.3    42.1    51.5    39.0    36.7    33.1    40.5
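
To make the EMOVA and AtomThink-EMOVA rows easier to compare, here is a small sketch that recomputes each AtomThink-EMOVA inference mode's change relative to the EMOVA-8B-200k Direct baseline, using only numbers copied from the table. It also checks an observation drawn from the numbers, not a documented definition: for these four rows, the MathVerse Total equals the unweighted mean of the five subset scores once rounded to one decimal. The names RESULTS, gain_over_baseline, and subset_mean are illustrative and not part of any released code.

```python
# Accuracy (%) copied from the comparison table: MathVista Total, the five
# MathVerse subsets, and MathVerse Total for the EMOVA and AtomThink-EMOVA rows.
SUBSETS = ("TL", "TD", "VI", "VD", "VO")
BASELINE = ("EMOVA-8B-200k", "Direct")

RESULTS = {
    ("EMOVA-8B-200k", "Direct"): {
        "MathVista": 51.7, "TL": 34.4, "TD": 39.0, "VI": 33.4,
        "VD": 30.1, "VO": 23.5, "MathVerse": 32.1,
    },
    ("AtomThink-EMOVA", "Direct"): {
        "MathVista": 53.1, "TL": 33.6, "TD": 39.0, "VI": 33.8,
        "VD": 28.0, "VO": 24.4, "MathVerse": 31.8,
    },
    ("AtomThink-EMOVA", "Quick Think"): {
        "MathVista": 51.8, "TL": 36.5, "TD": 42.4, "VI": 34.1,
        "VD": 32.9, "VO": 29.7, "MathVerse": 35.1,
    },
    ("AtomThink-EMOVA", "Slow Think"): {
        "MathVista": 53.3, "TL": 42.1, "TD": 51.5, "VI": 39.0,
        "VD": 36.7, "VO": 33.1, "MathVerse": 40.5,
    },
}


def gain_over_baseline(key: tuple[str, str], metric: str) -> float:
    """Accuracy change in percentage points versus the baseline row, rounded to 0.1."""
    return round(RESULTS[key][metric] - RESULTS[BASELINE][metric], 1)


def subset_mean(key: tuple[str, str]) -> float:
    """Unweighted mean of the five MathVerse subset scores of one row."""
    row = RESULTS[key]
    return sum(row[s] for s in SUBSETS) / len(SUBSETS)


if __name__ == "__main__":
    for key, row in RESULTS.items():
        # Observation from the table (not a documented definition): the reported
        # MathVerse Total matches the subset mean to one decimal place.
        assert abs(subset_mean(key) - row["MathVerse"]) <= 0.05
        if key == BASELINE:
            continue
        model, mode = key
        vista = gain_over_baseline(key, "MathVista")
        verse = gain_over_baseline(key, "MathVerse")
        print(f"{model} ({mode}): MathVista {vista:+.1f}, MathVerse {verse:+.1f}")
```

Read this way, Slow Think has its largest effect on MathVerse (+8.4 points, 32.1 to 40.5), a smaller gain on MathVista (+1.6 points, 51.7 to 53.3), while the Direct mode sits slightly below the baseline on MathVerse (-0.3 points).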

Citation

If you use this model in your research, please cite:

@article{xiang2024atomthink,
  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
  author={Xiang, Kun and Liu, Zhili and Jiang, Zihao and Nie, Yunshuang and Huang, Runhui and Fan, Haoxiang and Li, Hanhui and Huang, Weiran and Zeng, Yihan and Han, Jianhua and others},
  journal={arXiv preprint arXiv:2411.11930},
  year={2024}
}
@article{chen2024emova,
  title={{EMOVA}: Empowering Language Models to See, Hear and Speak with Vivid Emotions},
  author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
  journal={arXiv preprint arXiv:2409.18042},
  year={2024}
}

License

The checkpoint is released under the Apache 2.0 license. Please ensure proper attribution when using this checkpoint.

Checkpoint details: Safetensors format, 13.6B parameters, BF16 tensor type.
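
These checkpoint details give a rough lower bound on the memory needed just to hold the weights. The sketch below assumes every parameter is stored densely at 2 bytes in BF16 and ignores activations, KV cache, and runtime overhead; weight_bytes and BYTES_PER_PARAM are illustrative names, not part of any library.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumes dense storage in the given dtype; activations, KV cache and
# framework overhead are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_bytes(num_params: float, dtype: str = "bf16") -> float:
    """Bytes needed to hold `num_params` parameters stored as `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    n_params = 13.6e9  # parameter count listed in the checkpoint details
    size = weight_bytes(n_params, "bf16")
    print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")  # prints "27.2 GB (25.3 GiB)"
```

Upcasting the same weights to FP32 would double this to about 54.4 GB, which is why keeping the released BF16 precision is usually the practical choice.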