	---
	base_model:
	- mistralai/Mistral-7B-Instruct-v0.2
	library_name: transformers
	license: mit
	pipeline_tag: video-text-to-text
	---

	# VideoChat2-TPO

	This model is based on the paper [Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment](https://huggingface.co/papers/2412.19326).

	## 🏃 Installation

	```
	pip install -r requirements.txt
	python app.py
	```

	## 🔧 Usage

	```
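	from transformers import AutoModel, AutoTokenizer
	# Assumes the repository's tokenizer module is on the import path (not used directly below).
	from tokenizer import MultimodalLlamaTokenizer

	model_path = "OpenGVLab/VideoChat-TPO"

	# Load the slow tokenizer; trust_remote_code lets transformers run the repo's custom code.
	tokenizer = AutoTokenizer.from_pretrained(
	    model_path,
	    trust_remote_code=True,
	    use_fast=False,
	)
	# The custom model code receives the tokenizer through the `_tokenizer` keyword;
	# .eval() switches the model to inference mode.
	model = AutoModel.from_pretrained(model_path, trust_remote_code=True, _tokenizer=tokenizer).eval()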
	```
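
The usage snippet stops at loading the tokenizer and model and does not show how a video is prepared. Video-language models typically consume a fixed number of frames sampled across the clip, so the following is a minimal sketch of uniform frame sampling. The function name `sample_frame_indices` and the default of 8 segments are hypothetical choices for illustration, not taken from this repository; check the repo's `app.py` for the preprocessing it actually uses.

```python
def sample_frame_indices(total_frames: int, num_segments: int = 8) -> list[int]:
    """Return one frame index from the middle of each of `num_segments` equal spans.

    Hypothetical helper for illustration; not part of the VideoChat-TPO code.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    seg_size = total_frames / num_segments
    # Take the midpoint of each span and clamp to the last valid frame index.
    return [
        min(total_frames - 1, int(seg_size * i + seg_size / 2))
        for i in range(num_segments)
    ]


if __name__ == "__main__":
    # An 80-frame clip split into 8 spans of 10 frames each.
    print(sample_frame_indices(80, 8))
```

The resulting indices can be handed to whichever video reader you use to decode only those frames. When the clip has fewer frames than segments, some indices repeat rather than running past the end of the video.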