Trained Weights (Model Card) of GPT4Scene
Overview
This model card is for the GPT4Scene project. More information is available at the links below.
- GitHub Code: Link to GitHub
- arXiv Paper: Link to arXiv
- Project Page: Link to Project
Hugging Face
| Function | Hugging Face Link |
|---|---|
| Validation Dataset | alexzyqi/GPT4Scene-Val-Dataset |
| Validation Annotations | alexzyqi/GPT4Scene-Val-Annotation |
| Pretrained Model | Qwen/Qwen2-VL-7B-Instruct |
| Trained Weights | alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512 |
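As a minimal sketch, assuming the `huggingface_hub` package is installed, the repositories listed in the table can be fetched with its `snapshot_download` function. The `RESOURCES` mapping, its keys, and the `resolve`/`download` helpers are hypothetical names for illustration only. The `repo_type` values (dataset vs. model) are inferred from the repository names, not stated in this card, so check the repo pages if a download fails.

```python
from typing import Optional, Tuple

# Hypothetical keys -> (repo_id from the table, ASSUMED Hub repo_type).
# The dataset/model split is inferred from the repo names, not confirmed.
RESOURCES = {
    "val_dataset": ("alexzyqi/GPT4Scene-Val-Dataset", "dataset"),
    "val_annotations": ("alexzyqi/GPT4Scene-Val-Annotation", "dataset"),
    "pretrained": ("Qwen/Qwen2-VL-7B-Instruct", "model"),
    "trained_weights": ("alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512", "model"),
}


def resolve(key: str) -> Tuple[str, str]:
    """Return (repo_id, repo_type) for a table entry; raises KeyError if unknown."""
    return RESOURCES[key]


def download(key: str, local_dir: Optional[str] = None) -> str:
    """Download a full snapshot of the repo and return its local path."""
    # Imported lazily so resolve() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = resolve(key)
    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)
```

For example, `download("trained_weights", local_dir="./GPT4Scene-weights")` would place the fine-tuned checkpoint in a local folder. This requires network access, and the 7B-parameter checkpoints are several gigabytes in size.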
License
This repository is licensed under the Apache-2.0 license.
This repo builds on LLaMA-Factory and Chat-Scene. Thanks to their authors for their wonderful work.
Citation
If this work is helpful, please cite it as:
@article{GPT4Scene,
title={GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models},
author={Zhangyang Qi and Zhixiong Zhang and Ye Fang and Jiaqi Wang and Hengshuang Zhao},
journal={arXiv:2501.01428},
year={2025}
}