
Talaviya Bhavik

talaviyabhavik

AI & ML interests

LLM.. LLM.. LLM

Recent Activity

liked a dataset 3 days ago
openreasoner/MATH-APS
liked a dataset 3 days ago
GAIR/o1-journey

Organizations

scikit-learn

talaviyabhavik's activity

reacted to xiaotianhan's post with 🚀👍 9 months ago
🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution plays an important role, both in improving the performance of multi-modal large language models (MLLMs) and in Sora-style any-resolution encoder-decoders. We hope this work helps lift the 224x224 resolution limit in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)
reacted to merve's post with 🔥 9 months ago
reacted to merve's post with ❤️ 9 months ago
SegGPT is a vision generalist on image segmentation, quite like GPTs for computer vision ✨
It comes with the latest release of transformers 🎁 Demo and more in this post!
SegGPT is an extension of Painter, where you speak to images with images: the model takes an image prompt, a transformed version of that prompt, and the actual image you want transformed in the same way, and it is expected to output the transformed image.
SegGPT consists of a vanilla ViT with a decoder on top (linear, conv, linear).
The model is trained on diverse segmentation examples: given example image-mask pairs and the actual input to be segmented, the decoder head learns to reconstruct the mask output.
This generalizes pretty well!
The authors do not claim state-of-the-art results, as the model is mainly used for zero-shot and few-shot inference. They also do prompt tuning, where they freeze the parameters of the model and only optimize the image tensor (the input context).
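The prompt-tuning idea (frozen parameters, trainable input) can be illustrated with a toy sketch. This is not the SegGPT procedure: a tiny frozen linear function stands in for the model, and the names FROZEN_WEIGHTS, FROZEN_BIAS, model and tune_context are hypothetical. It only shows that gradient descent can update the input context while the weights never change.

```python
# Toy illustration only: a frozen linear "model" stands in for SegGPT.
FROZEN_WEIGHTS = (0.5, -1.0, 2.0)  # hypothetical parameters, never updated
FROZEN_BIAS = 0.1


def model(context):
    """Frozen forward pass: a dot product plus a bias."""
    return sum(w * c for w, c in zip(FROZEN_WEIGHTS, context)) + FROZEN_BIAS


def tune_context(context, target, lr=0.05, steps=200):
    """Run gradient descent on the input context; the weights stay fixed."""
    context = list(context)
    for _ in range(steps):
        error = model(context) - target
        # Gradient of (model(c) - target)^2 with respect to c_i is 2 * error * w_i.
        context = [c - lr * 2.0 * error * w
                   for c, w in zip(context, FROZEN_WEIGHTS)]
    return context


# Start from an all-zero context and tune it toward a target output of 1.0.
tuned = tune_context([0.0, 0.0, 0.0], target=1.0)
```

In a real setup the same pattern would presumably apply to the prompt image tensor, with an optimizer stepping only that tensor.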
Thanks to 🤗 transformers you can use this model easily!
See here https://huggingface.co/docs/transformers/en/model_doc/seggpt
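A minimal sketch of what that usage might look like, based on the SegGPT classes in transformers. The checkpoint name BAAI/seggpt-vit-large and the helper segment_with_prompt are assumptions not taken from the post, and the argument names should be checked against the linked docs for your installed version.

```python
CHECKPOINT = "BAAI/seggpt-vit-large"  # assumed checkpoint name, not given in the post


def segment_with_prompt(image, prompt_image, prompt_mask, num_labels):
    """Segment `image` (a PIL image) from one (prompt_image, prompt_mask) pair."""
    # Imported lazily so this file loads even without torch/transformers installed.
    import torch
    from transformers import SegGptForImageSegmentation, SegGptImageProcessor

    processor = SegGptImageProcessor.from_pretrained(CHECKPOINT)
    model = SegGptForImageSegmentation.from_pretrained(CHECKPOINT)

    # The prompt pair is the in-context example; `image` is the actual input.
    inputs = processor(
        images=image,
        prompt_images=prompt_image,
        prompt_masks=prompt_mask,
        num_labels=num_labels,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = [image.size[::-1]]
    return processor.post_process_semantic_segmentation(
        outputs, target_sizes, num_labels=num_labels
    )[0]
```

Calling segment_with_prompt downloads the checkpoint on first use, so it needs network access and enough memory for the model.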
I have built an app for you to try it out. I combined SegGPT with Depth Anything Model, so you don't have to upload image mask prompts in your prompt pair 🤗
Try it here merve/seggpt-depth-anything
Also check out the collection merve/seggpt-660466a303bc3cd7559d271b