---
license: apache-2.0
---

Model Card for AtomThink-LLaVA-Llama3-8B

This model is fine-tuned from LLaVA-Llama3-8B with the AtomThink framework and can be used to solve complex multimodal mathematical reasoning problems.
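
Usage (sketch). Since the checkpoint follows the LLaVA-Llama3-8B architecture, a reasonable starting point is the quick-start API of the original LLaVA codebase; this is a minimal sketch under that assumption, not a verified recipe. The model path, prompt, image file, and conversation mode below are placeholders.

```python
# Minimal inference sketch assuming compatibility with the LLaVA codebase
# (github.com/haotian-liu/LLaVA). Paths and prompt are placeholders.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "path/to/AtomThink-LLaVA-Llama3-8B"  # assumed local or Hub path

args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Solve the problem shown in the image step by step.",
    "conv_mode": None,            # let the loader choose; a Llama-3 template may be required
    "image_file": "problem.png",  # path or URL to the math problem image
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 1024,
})()

eval_model(args)  # runs generation and prints the model's answer
```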

Comparison of accuracy with state-of-the-art methods on MathVista (General, Math, Total) and MathVerse (TL = Text Lite, TD = Text Dominant, VI = Vision Intensive, VD = Vision Dominant, VO = Vision Only, Total):

| Model | Inference | General | Math | Total | TL | TD | VI | VD | VO | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Choice | - | - | - | 17.9 | 12.4 | 12.4 | 12.4 | 12.4 | 12.4 | 12.4 |
| Human | - | - | - | - | 70.9 | 71.2 | 61.4 | 68.3 | 66.7 | 66.7 |
| OpenAI o1 | Slow Think* | - | - | 73.9 | - | - | - | - | - | - |
| GPT-4o | CoT | - | - | 63.8 | - | - | - | - | - | - |
| GPT-4V | CoT | - | - | 49.9 | 56.6 | 63.1 | 51.4 | 50.8 | 50.3 | 54.4 |
| LLaVA-NeXT-34B | Direct | - | - | 46.5 | 25.5 | 33.8 | 23.5 | 20.3 | 15.7 | 23.8 |
| InternLM-XComposer2 | Direct | - | - | 57.6 | 17.0 | 22.3 | 15.7 | 16.4 | 11.0 | 16.5 |
| Qwen-VL-Plus | Direct | - | - | 43.3 | 11.1 | 15.7 | 9.0 | 13.0 | 10.0 | 11.8 |
| LLaVA-1.5-13B | Direct | - | - | 27.6 | 15.2 | 19.4 | 16.8 | 15.2 | 11.3 | 15.6 |
| G-LLaVA-7B | Direct | - | - | 53.4 | 20.7 | 20.9 | 17.2 | 14.6 | 9.4 | 16.6 |
| MAVIS-7B | Direct | - | - | - | 29.1 | 41.4 | 27.4 | 24.9 | 14.6 | 27.5 |
| LLaVA-Llama3-8B | Direct | 34.1 | 25.6 | 29.5 | 16.0 | 19.3 | 16.4 | 13.1 | 15.0 | 15.9 |
| LLaVA-Llama3-8B | Formatted CoT | 30.2 | 22.9 | 26.3 | 14.3 | 18.4 | 15.7 | 10.0 | 7.7 | 13.2 |
| AtomThink-LLaVA | Direct | 34.4 | 27.2 | 30.5 | 16.0 | 19.3 | 16.2 | 13.1 | 15.0 | 15.9 |
| AtomThink-LLaVA | Quick Think | 36.9 | 37.0 | 36.6 | 22.2 | 26.6 | 24.1 | 20.9 | 17.9 | 22.4 |
| AtomThink-LLaVA | Slow Think | 36.5 | 41.3 | 39.1 | 36.1 | 42.4 | 30.0 | 36.8 | 28.6 | 34.7 |
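
The Direct, Quick Think, and Slow Think rows correspond to different inference strategies: answering directly, producing a single chain-of-thought pass, and generating the solution one atomic reasoning step at a time as in the AtomThink framework. The sketch below only illustrates the general idea of step-wise ("slow thinking") decoding with candidate scoring; the helper functions, stop condition, and scoring model are hypothetical stand-ins, not the authors' released implementation.

```python
# Hypothetical sketch of step-wise ("slow thinking") decoding: ask the model for
# one atomic reasoning step at a time, score candidate steps, keep the best one,
# and repeat until a final answer is produced.
# `generate_step` and `score_step` are stand-ins, not part of any released API.
def slow_think(question, image, generate_step, score_step,
               num_candidates=4, max_steps=10):
    steps = []
    for _ in range(max_steps):
        # Sample several candidate next steps conditioned on the history so far.
        candidates = [generate_step(question, image, steps)
                      for _ in range(num_candidates)]
        # Keep the candidate the (process reward) scorer rates highest.
        best = max(candidates, key=lambda step: score_step(question, steps, step))
        steps.append(best)
        if "final answer" in best.lower():  # assumed stop condition
            break
    return steps
```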

Citation

If you use this model in your research, please cite:

@article{xiang2024atomthink,
  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
  author={Xiang, Kun and Liu, Zhili and Jiang, Zihao and Nie, Yunshuang and Huang, Runhui and Fan, Haoxiang and Li, Hanhui and Huang, Weiran and Zeng, Yihan and Han, Jianhua and others},
  journal={arXiv preprint arXiv:2411.11930},
  year={2024}
}
@article{liu2024visual,
  title={Visual instruction tuning},
  author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

License

This checkpoint is released under the Apache 2.0 license. Please provide proper attribution when using it.
