Trained Weights (Model Card) of GPT4Scene

🏠 Overview

This model card is for the GPT4Scene project. More information is available below.

🤗 Hugging Face

| Function | Hugging Face Link |
| --- | --- |
| Validation Dataset | alexzyqi/GPT4Scene-Val-Dataset |
| Validation Annotations | alexzyqi/GPT4Scene-Val-Annotation |
| Pretrained Model | Qwen/Qwen2-VL-7B-Instruct |
| Trained Weights | alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512 |
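
The repositories listed above can be fetched with the `huggingface_hub` library. The following is a minimal sketch, not the project's official download script. The repo IDs come from the list above; the `repo_type` values (`"dataset"` for the two validation repos), the helper names `download_kwargs` and `download`, and the `./gpt4scene_assets` target directory are assumptions made for illustration.

```python
# Repo IDs are taken from the list above.
# The repo_type values are assumptions: the validation repos are guessed
# to be dataset repos, and the other two to be model repos.
REPOS = {
    "validation_dataset": ("alexzyqi/GPT4Scene-Val-Dataset", "dataset"),
    "validation_annotations": ("alexzyqi/GPT4Scene-Val-Annotation", "dataset"),
    "pretrained_model": ("Qwen/Qwen2-VL-7B-Instruct", "model"),
    "trained_weights": (
        "alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512",
        "model",
    ),
}


def download_kwargs(name, root="./gpt4scene_assets"):
    """Build the keyword arguments for snapshot_download (hypothetical helper)."""
    repo_id, repo_type = REPOS[name]
    return {
        "repo_id": repo_id,
        "repo_type": repo_type,
        "local_dir": f"{root}/{name}",  # assumed local layout
    }


def download(name, root="./gpt4scene_assets"):
    """Download one repository and return the local path (requires network)."""
    # Imported lazily so the mapping above can be inspected without the library.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(name, root))


# Example call (downloads several GB of weights):
# download("trained_weights")
```

If a download fails with a "repository not found" error, the assumed `repo_type` for that entry is the first thing to check.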

⚖️ License

This repository is licensed under the Apache-2.0 License.

This repo benefits from LLaMA-Factory and Chat-Scene. Thanks for their wonderful work.

🔗 Citation

If you find this work helpful, please cite it as:

@article{GPT4Scene,
  title={GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models},
  author={Zhangyang Qi and Zhixiong Zhang and Ye Fang and Jiaqi Wang and Hengshuang Zhao},
  journal={arXiv:2501.01428},
  year={2025}
}