---
library_name: transformers
tags:
- robotics
- vla
- image-text-to-text
- multimodal
- pretraining
license: mit
language:
- en
pipeline_tag: image-text-to-text
---

# MiniVLA Image History (T=2) VQ 1B (Prismatic-Compatible Version)

<b>This checkpoint is in a format that is compatible with the training script from the original [Prismatic VLMs project codebase](https://github.com/TRI-ML/prismatic-vlms), which the OpenVLA team built on top of to develop the OpenVLA model.</b>

This Prismatic-compatible checkpoint may be useful if you wish to <b>fully fine-tune</b> MiniVLA (all 1 billion parameters) via native PyTorch Fully Sharded Data Parallel (FSDP) using the Prismatic VLMs training script. If you instead wish to do Parameter-Efficient Fine-Tuning via LoRA, you can use the corresponding MiniVLA checkpoint that is compatible with the Hugging Face `transformers` library. We recommend fine-tuning via LoRA if you do not have sufficient compute to fully fine-tune a 1B-parameter model (e.g., multiple A100/H100 GPUs).
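
For reference, the snippet below is a minimal sketch of how the LoRA route might be set up with the Hugging Face `transformers` and `peft` libraries. It assumes the `transformers`-compatible MiniVLA checkpoint exposes the OpenVLA-style `AutoModelForVision2Seq` remote-code interface; the Hub model id, LoRA rank, and target modules are placeholders to adapt, not values prescribed by the MiniVLA release (see the MiniVLA README for the supported fine-tuning pipeline).

```python
# Hedged sketch: LoRA adapter setup for a transformers-compatible MiniVLA
# checkpoint. The model id is a placeholder, not the name of a released repo.
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import LoraConfig, get_peft_model

MODEL_ID = "<org>/<transformers-compatible-minivla-checkpoint>"  # placeholder

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach LoRA adapters: only the adapter weights are trained, so this fits on
# a single GPU far more easily than full FSDP fine-tuning of all 1B parameters.
lora_config = LoraConfig(
    r=32,                         # low-rank update dimension (placeholder value)
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules="all-linear",  # add adapters to every linear layer
)
vla = get_peft_model(vla, lora_config)
vla.print_trainable_parameters()  # sanity-check the trainable-parameter count
```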

## Usage Instructions

See the [MiniVLA GitHub README](https://github.com/Stanford-ILIAD/openvla-mini/blob/main/README.md) for instructions on how to use this checkpoint for full fine-tuning.

## Citation

**BibTeX:**

```bibtex
@article{belkhale24minivla,
    title={MiniVLA: A Better VLA with a Smaller Footprint},
    author={Suneel Belkhale and Dorsa Sadigh},
    url={https://github.com/Stanford-ILIAD/openvla-mini},
    year={2024}
}
```