license: apache-2.0

Model Card for AtomThink-LLaVA-Llama3-8B

This model is LLaVA-Llama3-8B fine-tuned with the AtomThink framework. It is intended for solving complex multimodal mathematical problems.

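Accuracy (%) compared with state-of-the-art methods on MathVista and MathVerse. The first three score columns (General, Math, Total) are MathVista; the remaining six are MathVerse (TL: Text Lite, TD: Text Dominant, VI: Vision Intensive, VD: Vision Dominant, VO: Vision Only, Total):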

Model	Inference	MathVista General	MathVista Math	MathVista Total	MathVerse TL	MathVerse TD	MathVerse VI	MathVerse VD	MathVerse VO	MathVerse Total
Random Choice	-	-	-	17.9	12.4	12.4	12.4	12.4	12.4	12.4
Human	-	-	-	-	70.9	71.2	61.4	68.3	66.7	66.7
OpenAI o1	Slow Think*	-	-	73.9	-	-	-	-	-	-
GPT-4o	CoT	-	-	63.8	-	-	-	-	-	-
GPT-4V	CoT	-	-	49.9	56.6	63.1	51.4	50.8	50.3	54.4
LLaVA-NeXT-34B	Direct	-	-	46.5	25.5	33.8	23.5	20.3	15.7	23.8
InternLM-XComposer2	Direct	-	-	57.6	17.0	22.3	15.7	16.4	11.0	16.5
Qwen-VL-Plus	Direct	-	-	43.3	11.1	15.7	9.0	13.0	10.0	11.8
LLaVA-1.5-13B	Direct	-	-	27.6	15.2	19.4	16.8	15.2	11.3	15.6
G-LLaVA-7B	Direct	-	-	53.4	20.7	20.9	17.2	14.6	9.4	16.6
MAVIS-7B	Direct	-	-	-	29.1	41.4	27.4	24.9	14.6	27.5
LLaVA-Llama3-8B	Direct	34.1	25.6	29.5	16.0	19.3	16.4	13.1	15.0	15.9
LLaVA w/. Formatted	CoT	30.2	22.9	26.3	14.3	18.4	15.7	10.0	7.7	13.2
AtomThink-LLaVA	Direct	34.4	27.2	30.5	16.0	19.3	16.2	13.1	15.0	15.9
AtomThink-LLaVA	Quick Think	36.9	37.0	36.6	22.2	26.6	24.1	20.9	17.9	22.4
AtomThink-LLaVA	Slow Think	36.5	41.3	39.1	36.1	42.4	30.0	36.8	28.6	34.7
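
To make the comparison with the base model easier to read, the short sketch below computes how many points each AtomThink-LLaVA inference mode gains over LLaVA-Llama3-8B (Direct). The scores are copied from the two Total columns of the table above; the dictionary layout and the function name `gains_over_baseline` are illustrative choices, not part of the released code.

```python
# Scores copied from the table above: (MathVista Total, MathVerse Total).
scores = {
    "LLaVA-Llama3-8B (Direct)": (29.5, 15.9),
    "AtomThink-LLaVA (Direct)": (30.5, 15.9),
    "AtomThink-LLaVA (Quick Think)": (36.6, 22.4),
    "AtomThink-LLaVA (Slow Think)": (39.1, 34.7),
}

BASELINE = "LLaVA-Llama3-8B (Direct)"


def gains_over_baseline(scores, baseline):
    """Return the point difference of every other row against the baseline row."""
    base_vista, base_verse = scores[baseline]
    return {
        name: (round(vista - base_vista, 1), round(verse - base_verse, 1))
        for name, (vista, verse) in scores.items()
        if name != baseline
    }


gains = gains_over_baseline(scores, BASELINE)
for name, (d_vista, d_verse) in gains.items():
    print(f"{name}: MathVista {d_vista:+.1f}, MathVerse {d_verse:+.1f}")
```

By this arithmetic, Slow Think is 9.6 points above the base model on MathVista and 18.8 points above it on MathVerse.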

Citation

If you use this model in your research, please cite:

@article{xiang2024atomthink,
  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
  author={Xiang, Kun and Liu, Zhili and Jiang, Zihao and Nie, Yunshuang and Huang, Runhui and Fan, Haoxiang and Li, Hanhui and Huang, Weiran and Zeng, Yihan and Han, Jianhua and others},
  journal={arXiv preprint arXiv:2411.11930},
  year={2024}
}
@article{liu2024visual,
  title={Visual instruction tuning},
  author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
  journal={Advances in neural information processing systems},
  volume={36},
  year={2024}
}

License

The checkpoint is released under the Apache 2.0 license. Please ensure proper attribution when using this checkpoint.