Trained Weights (Model Card) of GPT4Scene

🏠 Overview

This model card is for the GPT4Scene project. More information is available at the links below.

Github Code: Link to Github
Arxiv Paper: Link to Arxiv
Project Page: Link to Project

🤗 Hugging Face

Function	Hugging Face Link
Validation Dataset	alexzyqi/GPT4Scene-Val-Dataset
Validation Annotations	alexzyqi/GPT4Scene-Val-Annotation
Pretrained Model	Qwen/Qwen2-VL-7B-Instruct
Trained Weights	alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512
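
The repositories listed above can be fetched with the `huggingface_hub` library. The following is a minimal sketch, not an official download script: the `REPOS` mapping, the `target_dir` and `download` helpers, and the `./gpt4scene_assets` directory are hypothetical names introduced for illustration, and the repo types (dataset vs. model) are assumptions inferred from the table labels.

```python
from pathlib import Path

# Repository IDs copied from the table above.
# ASSUMPTION: the two validation repos are hosted as "dataset" repos and the
# other two as "model" repos; adjust repo_type if the Hub reports otherwise.
REPOS = {
    "val_dataset": ("alexzyqi/GPT4Scene-Val-Dataset", "dataset"),
    "val_annotations": ("alexzyqi/GPT4Scene-Val-Annotation", "dataset"),
    "pretrained_model": ("Qwen/Qwen2-VL-7B-Instruct", "model"),
    "trained_weights": (
        "alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512",
        "model",
    ),
}


def target_dir(name, root="./gpt4scene_assets"):
    """Return the local folder for one entry (hypothetical layout)."""
    return Path(root) / name


def download(name, root="./gpt4scene_assets"):
    """Download one repository from REPOS and return its local path."""
    # Imported lazily so the mapping above is usable without the library.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = REPOS[name]
    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        local_dir=str(target_dir(name, root)),
    )
```

Calling `download("trained_weights")` would fetch the fine-tuned weights into `./gpt4scene_assets/trained_weights`. It requires network access and `pip install huggingface_hub`.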

⚖️ License

This repository is licensed under the Apache-2.0 License.

This repo benefits from LLaMA-Factory and Chat-Scene. Thanks for their wonderful work.

🔗 Citation

If this work is helpful, please cite it as:

@article{GPT4Scene,
  title={GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models},
  author={Zhangyang Qi and Zhixiong Zhang and Ye Fang and Jiaqi Wang and Hengshuang Zhao},
  journal={arXiv:2501.01428},
  year={2025}
}