|
---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
language:
- en
pipeline_tag: visual-question-answering
---
|
|
|
# Model Card for Llava-Phi2
|
|
|
This is a multimodal implementation of the [Phi2](https://huggingface.co/microsoft/phi-2) model, inspired by [LLaVA-Phi](https://github.com/zhuyiche/llava-phi).
|
|
|
## Model Details |
|
1. LLM Backbone: [Phi2](https://huggingface.co/microsoft/phi-2)

2. Vision Tower: [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)

3. Pretraining Dataset: [LAION-CC-SBU dataset with BLIP captions (200k samples)](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)

4. Finetuning Dataset: [Instruct 150k dataset based on COCO](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)

5. Finetuned Model: [GunaKoppula/Llava-Phi2](https://huggingface.co/GunaKoppula/Llava-Phi2)
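
These pieces follow the standard LLaVA recipe: the CLIP vision tower turns the image into patch embeddings, a learned projector maps them into Phi-2's embedding space, and the projected image tokens are fed to the LLM alongside the text tokens. The sketch below illustrates that wiring with Hugging Face `transformers`; it is a conceptual illustration only (the projector here is randomly initialized, and `build_multimodal_inputs` is an illustrative name), not the repository's actual modeling code.

```python
import torch
import torch.nn as nn
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionModel,
)

# Vision tower: CLIP ViT-L/14 at 336px (577 patch tokens, hidden size 1024).
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# LLM backbone: Phi-2 (hidden size 2560).
llm = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Projector from CLIP hidden size to Phi-2 hidden size; in the real model this
# is learned during pretraining on image-caption pairs (random weights here).
projector = nn.Linear(vision_tower.config.hidden_size, llm.config.hidden_size)

@torch.no_grad()
def build_multimodal_inputs(image, prompt: str) -> torch.Tensor:
    """Concatenate projected image tokens with text embeddings for the LLM."""
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
    patch_features = vision_tower(pixel_values).last_hidden_state  # (1, 577, 1024)
    image_tokens = projector(patch_features)                       # (1, 577, 2560)
    text_ids = tokenizer(prompt, return_tensors="pt").input_ids
    text_tokens = llm.get_input_embeddings()(text_ids)             # (1, T, 2560)
    # The result can be passed to the LLM as inputs_embeds.
    return torch.cat([image_tokens, text_tokens], dim=1)
```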
|
|
|
|
|
### Model Sources |
|
|
|
|
|
|
- **Original Repository:** [LLaVA-Phi](https://github.com/zhuyiche/llava-phi)

- **Paper:** [LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model](https://arxiv.org/pdf/2401.02330)

- **Demo:** [Demo Link](https://huggingface.co/spaces/RaviNaik/MultiModal-Phi2)
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
1. Clone the LLaVA-Phi repository and navigate to the llava-phi folder
|
```bash
git clone https://github.com/zhuyiche/llava-phi.git
cd llava-phi
```
|
2. Install the package
|
```bash
conda create -n llava_phi python=3.10 -y
conda activate llava_phi
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
|
3. Run the Model |
|
```bash
python llava_phi/eval/run_llava_phi.py --model-path="GunaKoppula/Llava-Phi2" \
    --image-file="https://huggingface.co/GunaKoppula/Llava-Phi2/resolve/main/people.jpg?download=true" \
    --query="How many people are there in the image?"
```
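
For programmatic use (batching several questions, for example), one option is to call the same entry point from Python. The sketch below wraps the CLI shown above with `subprocess`; it assumes you run it from the repository root with the environment from step 2 active, and that `run_llava_phi.py` prints the model's answer to stdout, as upstream LLaVA's run script does. The `ask_llava_phi` helper is just an illustrative name.

```python
import subprocess

def ask_llava_phi(image: str, query: str, model_path: str = "GunaKoppula/Llava-Phi2") -> str:
    """Run one visual question through the run_llava_phi.py script and return its output."""
    result = subprocess.run(
        [
            "python", "llava_phi/eval/run_llava_phi.py",
            f"--model-path={model_path}",
            f"--image-file={image}",
            f"--query={query}",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with a non-zero status
    )
    return result.stdout.strip()

if __name__ == "__main__":
    answer = ask_llava_phi(
        image="https://huggingface.co/GunaKoppula/Llava-Phi2/resolve/main/people.jpg?download=true",
        query="How many people are there in the image?",
    )
    print(answer)
```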
|
|
|
### Acknowledgement

This implementation is based on the wonderful work done by: \
[LLaVA-Phi](https://github.com/zhuyiche/llava-phi) \
[LLaVA](https://github.com/haotian-liu/LLaVA) \
[Phi2](https://huggingface.co/microsoft/phi-2)