license: apache-2.0

Model Card for AtomThink-EMOVA-8B

AtomThink-EMOVA-8B is post-trained from EMOVA-8B using the AtomThink framework and is intended for solving complex multimodal mathematical problems.

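Accuracy (%) compared with state-of-the-art methods on MathVista (General, Math, Total) and MathVerse (TL: Text Lite, TD: Text Dominant, VI: Vision Intensive, VD: Vision Dominant, VO: Vision Only, Total). A "-" marks a value that is not reported: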

Model	Inference	MathVista General	MathVista Math	MathVista Total	MathVerse TL	MathVerse TD	MathVerse VI	MathVerse VD	MathVerse VO	MathVerse Total
Random Choice	-	-	-	17.9	12.4	12.4	12.4	12.4	12.4	12.4
Human	-	-	-	-	70.9	71.2	61.4	68.3	66.7	66.7
OpenAI o1	Slow Think*	-	-	73.9	-	-	-	-	-	-
GPT-4o	CoT	-	-	63.8	-	-	-	-	-	-
GPT-4V	CoT	-	-	49.9	56.6	63.1	51.4	50.8	50.3	54.4
LLaVA-NeXT-34B	Direct	-	-	46.5	25.5	33.8	23.5	20.3	15.7	23.8
InternLM-XComposer2	Direct	-	-	57.6	17.0	22.3	15.7	16.4	11.0	16.5
Qwen-VL-Plus	Direct	-	-	43.3	11.1	15.7	9.0	13.0	10.0	11.8
LLaVA-1.5-13B	Direct	-	-	27.6	15.2	19.4	16.8	15.2	11.3	15.6
G-LLaVA-7B	Direct	-	-	53.4	20.7	20.9	17.2	14.6	9.4	16.6
MAVIS-7B	Direct	-	-	-	29.1	41.4	27.4	24.9	14.6	27.5
LLaVA-Llama3-8B	Direct	34.1	25.6	29.5	16.0	19.3	16.4	13.1	15.0	15.9
EMOVA-8B-200k	Direct	52.4	51.1	51.7	34.4	39.0	33.4	30.1	23.5	32.1
EMOVA w/. Formatted	CoT	30.9	31.3	31.1	26.5	36.5	25.3	20.4	19.8	25.7
AtomThink-EMOVA	Direct	53.9	52.4	53.1	33.6	39.0	33.8	28.0	24.4	31.8
AtomThink-EMOVA	Quick Think	48.7	54.4	51.8	36.5	42.4	34.1	32.9	29.7	35.1
AtomThink-EMOVA	Slow Think	48.9	57.0	53.3	42.1	51.5	39.0	36.7	33.1	40.5
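
To make the effect of each inference mode easier to read off the table, the short sketch below computes the change in the two Total columns relative to the EMOVA-8B-200k baseline. The scores are copied from the rows above; the dictionary layout and the helper name `gains_over_baseline` are illustrative assumptions, not part of the released code.

```python
# Scores copied from the comparison table: (MathVista Total, MathVerse Total).
scores = {
    "EMOVA-8B-200k (Direct)": (51.7, 32.1),
    "AtomThink-EMOVA (Direct)": (53.1, 31.8),
    "AtomThink-EMOVA (Quick Think)": (51.8, 35.1),
    "AtomThink-EMOVA (Slow Think)": (53.3, 40.5),
}

BASELINE = "EMOVA-8B-200k (Direct)"


def gains_over_baseline(table, baseline):
    """Return the per-benchmark accuracy difference (in points) versus the baseline row."""
    base_vista, base_verse = table[baseline]
    return {
        name: (round(vista - base_vista, 1), round(verse - base_verse, 1))
        for name, (vista, verse) in table.items()
        if name != baseline
    }


gains = gains_over_baseline(scores, BASELINE)
for name, (d_vista, d_verse) in gains.items():
    print(f"{name}: MathVista {d_vista:+.1f}, MathVerse {d_verse:+.1f}")
```

With these numbers, Slow Think shows the largest change on MathVerse, while the differences on the MathVista Total are comparatively small.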

Citation

If you use this model in your research, please cite:

@article{xiang2024atomthink,
  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
  author={Xiang, Kun and Liu, Zhili and Jiang, Zihao and Nie, Yunshuang and Huang, Runhui and Fan, Haoxiang and Li, Hanhui and Huang, Weiran and Zeng, Yihan and Han, Jianhua and others},
  journal={arXiv preprint arXiv:2411.11930},
  year={2024}
}
@article{chen2024emova,
  title={Emova: Empowering language models to see, hear and speak with vivid emotions},
  author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
  journal={arXiv preprint arXiv:2409.18042},
  year={2024}
}

License

The checkpoint is released under the Apache 2.0 license. Please ensure proper attribution when using this checkpoint.