---
license: apache-2.0
---

# Trained Weights (Model Card) of GPT4Scene

## 🏠 Overview

This model card is for the **GPT4Scene** project. More information is available at the links below.
- **GitHub Code**: [Link to GitHub](https://github.com/Qi-Zhangyang/GPT4Scene)
- **arXiv Paper**: [Link to arXiv](https://arxiv.org/abs/2501.01428)
- **Project Page**: [Link to Project](https://gpt4scene.github.io/)

## 🤗 Hugging Face

| Function             | Hugging Face Link    |
| -------------------- | -------------------- |
| **Validation Dataset**  | [alexzyqi/GPT4Scene-Val-Dataset](https://huggingface.co/datasets/alexzyqi/GPT4Scene-Val-Dataset) | 
| **Validation Annotations** | [alexzyqi/GPT4Scene-Val-Annotation](https://huggingface.co/datasets/alexzyqi/GPT4Scene-Val-Annotation) |
| **Pretrained Model**  | [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| **Trained Weights** | [alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512](https://huggingface.co/alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512) |
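
The repositories in the table above can be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed (`pip install huggingface_hub`). The `REPOS` mapping and the `download` helper are hypothetical names introduced here for illustration and are not part of the GPT4Scene codebase; the repository IDs are copied from the table.

```python
from typing import Optional

# Repository IDs copied from the table above, paired with their Hub repo type.
# The short keys ("val_dataset", ...) are illustrative names, not official ones.
REPOS = {
    "val_dataset": ("alexzyqi/GPT4Scene-Val-Dataset", "dataset"),
    "val_annotation": ("alexzyqi/GPT4Scene-Val-Annotation", "dataset"),
    "pretrained_model": ("Qwen/Qwen2-VL-7B-Instruct", "model"),
    "trained_weights": (
        "alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512",
        "model",
    ),
}


def download(name: str, local_dir: Optional[str] = None) -> str:
    """Download one of the repositories above and return its local path."""
    # Imported lazily so the mapping can be inspected without the dependency.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = REPOS[name]
    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        local_dir=local_dir,  # None falls back to the default Hub cache
    )


if __name__ == "__main__":
    # Fetches the trained GPT4Scene weights; this is a large download.
    path = download("trained_weights")
    print(f"Weights downloaded to: {path}")
```

Passing `local_dir` places the files in a directory of your choice. How the downloaded weights and validation data are laid out for evaluation is described in the GitHub repository, not here.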


## ⚖️ License

This repository is licensed under the Apache-2.0 license.

This repo builds on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/) and [Chat-Scene](https://github.com/ZzZZCHS/Chat-Scene). Thanks to their authors for their wonderful work.

## 🔗 Citation

If this work is helpful, please cite it as:

```bibtex
@article{GPT4Scene,
  title={GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models},
  author={Zhangyang Qi and Zhixiong Zhang and Ye Fang and Jiaqi Wang and Hengshuang Zhao},
  journal={arXiv:2501.01428},
  year={2025}
}
```