Brief

This is the LoRA model of LLaVA 7B v1.3, trained on Synergy-General-MultimodalPairs. The dataset is intended to improve the ability of vision-language models (VLMs) to describe images in detail. The dataset is introduced below.

Dataset

Link

GitHub | Paper

Introduction

This is an image-text pair dataset generated synergistically by a text-to-image model and a multimodal large language model.

The file name follows the pattern (n_th generation)_(number of batches)_(number of initial descriptions per batch)_(number of refinement cycles per initial description). For example, 1_20_10_5.zip is dataset number one, with 20 batches, 10 initial descriptions per batch, and 5 refinement cycles per initial description. This dataset therefore contains 20*10*5 = 1000 image-text pairs in total.
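The naming rule can be checked with a few lines of Python. This is only a minimal sketch: the names `DatasetSpec` and `parse_dataset_name` are hypothetical helpers introduced for illustration and are not part of the dataset or its generation script.

```python
from typing import NamedTuple


class DatasetSpec(NamedTuple):
    """Fields encoded in a dataset file name (hypothetical helper type)."""
    generation: int              # n_th generation (dataset number)
    batches: int                 # number of batches
    descriptions_per_batch: int  # initial descriptions per batch
    refinement_cycles: int       # refinement cycles per initial description

    @property
    def total_pairs(self) -> int:
        # batches * descriptions per batch * refinement cycles
        return self.batches * self.descriptions_per_batch * self.refinement_cycles


def parse_dataset_name(filename: str) -> DatasetSpec:
    """Split a name such as '1_20_10_5.zip' into its four integer fields."""
    stem = filename[:-len(".zip")] if filename.endswith(".zip") else filename
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected four '_'-separated fields, got {filename!r}")
    return DatasetSpec(*(int(p) for p in parts))


spec = parse_dataset_name("1_20_10_5.zip")
print(spec)              # the four parsed fields
print(spec.total_pairs)  # 20 * 10 * 5 = 1000
```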

After you unzip one of the datasets, you will see two files. The first is a zip file containing the images. The second is a CSV file that lists, for each image, its path and its description.
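A rough sketch of unpacking one dataset with the Python standard library follows. It assumes the layout described above (an outer zip holding one inner image zip and one CSV). The inner file names and the CSV column names are not documented here, so the code discovers them instead of hard-coding them; `unpack_dataset` is a hypothetical helper name.

```python
import csv
import zipfile
from pathlib import Path
from typing import Dict, List


def unpack_dataset(archive: str, out_dir: str) -> List[Dict[str, str]]:
    """Extract a dataset archive and return the CSV rows as dictionaries.

    Assumption: out_dir is a fresh directory that does not contain the
    archive itself.
    """
    out = Path(out_dir)

    # Step 1: extract the outer archive (inner image zip + CSV).
    with zipfile.ZipFile(archive) as outer:
        outer.extractall(out)

    # Step 2: extract every nested zip next to where it was found.
    for inner in list(out.rglob("*.zip")):
        with zipfile.ZipFile(inner) as images:
            images.extractall(inner.parent)

    # Step 3: read the first CSV found; column names come from its header row.
    csv_files = sorted(out.rglob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"no CSV file found in {archive}")
    with open(csv_files[0], newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example usage (the archive path is a placeholder):
# rows = unpack_dataset("1_20_10_5.zip", "data/1_20_10_5")
# print(rows[0].keys())  # inspect the actual column names
```

Inspecting the keys of the first row is the safest way to learn the real column names before writing any training code against them.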

The script for the generation process is available on GitHub: https://github.com/mao-code/Synergy-General-MultimodalPairs

Framework versions

  • PEFT 0.4.0
