Article PaliGemma – Google's Cutting-Edge Open Vision Language Model May 14, 2024 • 229
Post • 1758
New open Vision Language Model by @Google: PaliGemma
Comes in 3B, pretrained, mix and fine-tuned models in 224, 448 and 896 resolution
Combination of Gemma 2B LLM and SigLIP image encoder
Supported in transformers
PaliGemma can do:
Image segmentation and detection
Detailed document understanding and reasoning
Visual question answering, captioning and any other VLM task
Read our blog: hf.co/blog/paligemma
Try the demo: hf.co/spaces/google/paligemma
Check out the Spaces and the models all in the collection: google/paligemma-release-6643a9ffbf57de2ae0448dda
Collection of fine-tuned PaliGemma models: google/paligemma-ft-models-6643b03efb769dad650d2dda
13 replies
Salesforce/xgen-mm-phi3-mini-instruct-r-v1 Image-Text-to-Text • Updated Sep 18, 2024 • 1.48k • 185
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 34
[lecture artifacts] aligning open language models Collection artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17, 2024 • 56
Article Fine-tuning a large language model on Kaggle Notebooks (or even on your own computer) for solving real-world tasks By lmassaron • Feb 21, 2024 • 14
Article Design choices for Vision Language Models in 2024 By gigant • Apr 16, 2024 • 25