|
--- |
|
library_name: peft |
|
tags: |
|
- llava |
|
pipeline_tag: image-text-to-text |
|
license: mit |
|
datasets: |
|
- MaoXun/Synergy-General-MultimodalPairs |
|
language: |
|
- en |
|
base_model: |
|
- liuhaotian/llava-pretrain-vicuna-7b-v1.3 |
|
- lmsys/vicuna-7b-v1.3 |
|
--- |
|
# Brief |
|
This is a LoRA adapter for LLaVA-7B v1.3, trained on [Synergy-General-MultimodalPairs](https://huggingface.co/datasets/MaoXun/Synergy-General-MultimodalPairs).
|
The dataset is designed to enhance the ability of vision-language models (VLMs) to describe images in detail.
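
Because the vision tower and projector live in the base model, the adapter is most conveniently loaded through the loader in the original LLaVA codebase rather than plain PEFT. The sketch below assumes the `llava` package from https://github.com/haotian-liu/LLaVA is installed; the adapter path is a placeholder for this repository's Hub id.

```python
# Minimal loading sketch, assuming the llava package from
# https://github.com/haotian-liu/LLaVA is installed. The adapter path is a
# placeholder; replace it with this repository's Hub id or a local download.
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="path/to/this-lora-adapter",  # placeholder adapter path
    model_base="liuhaotian/llava-pretrain-vicuna-7b-v1.3",
    model_name="llava-lora",  # a name containing "llava" and "lora" selects the LoRA loading path
)
```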
|
An introduction to the dataset follows.
|
|
|
# Dataset |
|
### Link |
|
[Github](https://github.com/mao-code/Synergy-General-MultimodalPairs) | [Paper](https://link.springer.com/chapter/10.1007/978-981-97-6125-8_12) |
|
|
|
### Introduction |
|
This is a visual-text pair dataset generated synergistically by a text-to-image model and a multimodal large language model.
|
|
|
Each archive name encodes (n-th generation)\_(number of batches)\_(number of initial descriptions per batch)\_(number of refinement cycles per initial description).
|
For example, `1_20_10_5.zip` is the first generated dataset, with 20 batches, 10 initial descriptions per batch, and 5 refinement cycles per initial description.
|
This archive therefore contains a total of 20\*10\*5 = 1000 image-text pairs.
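
As an illustration of this naming convention (the helper below is not part of the dataset tooling), the archive name can be parsed and the expected pair count derived from it:

```python
# Illustrative helper: parse the archive name convention
# (generation)_(batches)_(initial descriptions)_(refinement cycles).zip
def parse_archive_name(name: str) -> dict:
    gen, batches, inits, cycles = (int(x) for x in name.removesuffix(".zip").split("_"))
    return {
        "generation": gen,
        "batches": batches,
        "initial_descriptions_per_batch": inits,
        "refinement_cycles": cycles,
        "total_pairs": batches * inits * cycles,
    }

print(parse_archive_name("1_20_10_5.zip"))  # total_pairs == 1000
```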
|
|
|
Unzipping one of the archives yields two files: a zip file containing the images and a CSV file that maps each image path to its description.
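
Below is a short sketch of unpacking an archive and inspecting the CSV; the exact CSV filename and column names are not specified here, so check the header of your copy.

```python
import glob
import zipfile

import pandas as pd

# Unpack one archive; it contains an images zip and a CSV of
# (image path, description) rows.
with zipfile.ZipFile("1_20_10_5.zip") as zf:
    zf.extractall("1_20_10_5")

csv_path = glob.glob("1_20_10_5/*.csv")[0]  # locate the CSV without assuming its name
df = pd.read_csv(csv_path)
print(df.head())
```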
|
|
|
The scripts for the generation process are available on GitHub: https://github.com/mao-code/Synergy-General-MultimodalPairs
|
|
|
|
|
# Framework versions |
|
- PEFT 0.4.0 |