---
license: apache-2.0
---

# Trained Weights (Model Card) of GPT4Scene

## 🏠 Overview

This model card is for the **GPT4Scene** project. More information is available at the links below.

- **GitHub Code**: [Link to GitHub](https://github.com/Qi-Zhangyang/GPT4Scene)
- **arXiv Paper**: [Link to arXiv](https://arxiv.org/abs/2501.01428)
- **Project Page**: [Link to Project](https://gpt4scene.github.io/)

## 🤗 Hugging Face

| Function | Hugging Face Link |
| -------------------- | -------------------- |
| **Validation Dataset** | [alexzyqi/GPT4Scene-Val-Dataset](https://huggingface.co/datasets/alexzyqi/GPT4Scene-Val-Dataset) |
| **Validation Annotations** | [alexzyqi/GPT4Scene-Val-Annotation](https://huggingface.co/datasets/alexzyqi/GPT4Scene-Val-Annotation) |
| **Pretrained Model** | [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| **Trained Weights** | [alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512](https://huggingface.co/alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512) |

## ⚖️ License

This repository is licensed under the Apache-2.0 license.

This repo builds on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/) and [Chat-Scene](https://github.com/ZzZZCHS/Chat-Scene). Thanks to their authors for their wonderful work.

## 🔗 Citation

If this work is helpful, please kindly cite it as:

```bibtex
@article{GPT4Scene,
  title={GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models},
  author={Zhangyang Qi and Zhixiong Zhang and Ye Fang and Jiaqi Wang and Hengshuang Zhao},
  journal={arXiv:2501.01428},
  year={2025}
}
```