AI & ML interests

computer-vision, image-processing, machine-learning, deep-learning

Recent Activity

kornia's activity

awacke1 
posted an update 1 day ago
prithivMLmods 
posted an update 3 days ago
view post
Post
2772
Triangulum Catalogued 🔥💫

🎯Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF

+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF

+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF
  • 1 reply
·
merve 
posted an update 3 days ago
view post
Post
3728
supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff!

Here's our blog for you to get started https://huggingface.co/blog/smolagents
merve 
posted an update 10 days ago
prithivMLmods 
posted an update 13 days ago
akhaliq 
posted an update 15 days ago
view post
Post
3469
Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more

now available in anychat, try it out: akhaliq/anychat
prithivMLmods 
posted an update 15 days ago
view post
Post
2464
Qwen2VL Models: Vision and Language Processing 🍉

📍FT; [ Latex OCR, Math Parsing, Text Analogy OCRTest ]

Colab Demo: prithivMLmods/Qwen2-VL-OCR-2B-Instruct

❄️Demo : prithivMLmods/Qwen2-VL-2B . The demo includes the Qwen2VL 2B Base Model.

🎯The space handles documenting content from the input image along with standardized plain text. It includes adjustment tools with over 30 font styles, file formatting support for PDF and DOCX, textual alignments, font size adjustments, and line spacing modifications.

📄PDFs are rendered using the ReportLab software library toolkit.

🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct

🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing

📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f

.
.
.
@prithivMLmods 🤗
  • 1 reply
·
prithivMLmods 
posted an update 16 days ago
view post
Post
3215
🎄 Here Before - Xmas🎅✨

🧑🏻‍🎄Models
+ [ Xmas 2D Illustration ] : strangerzonehf/Flux-Xmas-Illustration-LoRA
+ [ Xmas 3D Art ] : strangerzonehf/Flux-Xmas-3D-LoRA
+ [ Xmas Chocolate ] : strangerzonehf/Flux-Xmas-Chocolate-LoRA
+ [ Xmas Isometric Kit ] : strangerzonehf/Flux-Xmas-Isometric-Kit-LoRA
+ [ Xmas Realpix ] : strangerzonehf/Flux-Xmas-Realpix-LoRA
+ [ Xmas Anime ] : strangerzonehf/Flux-Anime-Xmas-LoRA

❄️Collections
+ [ Xmas Art ] : strangerzonehf/christmas-pack-6758b199487adafaddb68f82
+ [ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-org-6737118adcf2cb40d66d0c7e

🥶Page
+ [ Stranger Zone ] : https://huggingface.co/strangerzonehf


.
.
.
@prithivMLmods 🤗
AtAndDev 
posted an update 17 days ago
view post
Post
359
@s3nh Hey man check your discord! Got some news.
  • 4 replies
·
merve 
posted an update 17 days ago
view post
Post
2731
Aya by Cohere For AI can now see! 👀

C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! 🗣️

The authors extend Llava dataset using Aya's translation capabilities with 558k examples!
ry it here kkr5155/maya_demo

Dataset maya-multimodal/pretrain

Model maya-multimodal/maya 👏
kudos @nahidalam and team
  • 1 reply
·
merve 
posted an update 17 days ago
view post
Post
3159
Apollo is a new family of open-source video language models by Meta, where 3B model outperforms most 7B models and 7B outperforms most 30B models 🧶

✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with A2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️

Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
they evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find out that whatever design decision was applied to small models also scale properly when the model and dataset are scaled 📈 scaling dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies, and find that FPS sampling is better than uniform sampling, and they find 8-32 tokens per frame optimal
> They also compare image encoders, they try a variation of models from shape optimized SigLIP to DINOv2
they find google/siglip-so400m-patch14-384 to be most powerful 🔥
> they also compare freezing different parts of models, training all stages with some frozen parts give the best yield

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models 🔥
·
prithivMLmods 
posted an update 21 days ago
merve 
posted an update 23 days ago
view post
Post
1739
A complete RAG pipeline includes a reranker, which ranks the documents to find the best document 📓
Same goes for multimodal RAG, multimodal rerankers which we can integrate to multimodal RAG pipelines!
Learn how to build a complete multimodal RAG pipeline with vidore/colqwen2-v1.0 as retriever, lightonai/MonoQwen2-VL-v0.1 as reranker, Qwen/Qwen2-VL-7B-Instruct as VLM in this notebook that runs on a GPU as small as L4 🔥 https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_reranker_and_vlms
merve 
posted an update 27 days ago
view post
Post
5558
This week in open-source AI was insane 🤠 A small recap🕺🏻 merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal 🖼️
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts

LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!

Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by community
merve 
posted an update 28 days ago
prithivMLmods 
posted an update 29 days ago
view post
Post
3828
Near 3:2 { 1280*832 } Adapters 🔥

🧪The datasets were prepared for a 3:2 aspect ratio by processing images of any dimension (width × height) in alignment with the adapter's concept. This involved using techniques such as magic expand, magic fill, or outpainting to adjust the remaining parts of the image to achieve the 3:2 ratio & posts training. This approach enhanced the desired image quality to up to 2 MB for detailed prompts and reduced artifacts in images sized at 1280 × 832.

🎈This approach was used instead of cropping down the 2x or 3x zoomed positions in the actual image. It generative filling to adjust the image's aspect ratio proportionally within the dataset.

🔧I used Canva's Magic Expand, Firefly's Generative Fill, and Flux's Outpaint for aspect ratio adjustments.

⬇️Model DLC :
+ [ Microworld Nft ] : strangerzonehf/Flux-Microworld-NFT-LoRA
+ [ Creative Stocks ] : strangerzonehf/Flux-Creative-Stocks-LoRA
+ [ Icon-Kit ] : strangerzonehf/Flux-Icon-Kit-LoRA
+ [ Claymation ] : strangerzonehf/Flux-Claymation-XC-LoRA
+ [ Super Portrait ] : strangerzonehf/Flux-Super-Portrait-LoRA
+ [ Ghibli Art ] : strangerzonehf/Flux-Ghibli-Art-LoRA
+ [ Isometric Site ] : strangerzonehf/Flux-Isometric-Site-LoRA

🧨Page :
1] Stranger Zone: https://huggingface.co/strangerzonehf

💣Space :
1] Flux LoRA DLC: prithivMLmods/FLUX-LoRA-DLC

📦Collections :
1] strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
2] prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
3] strangerzonehf/animaker-engine-673714956dec98c400c30cf6
4] strangerzonehf/mixer-engine-673582c9c5939d8aa5bf9533

.
.
.
@prithivMLmods
  • 1 reply
·
prithivMLmods 
posted an update about 1 month ago
view post
Post
2636
Milestone for Flux.1 Dev 🔥

💢The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here’s the 10,000th public adapter : 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page :
+ https://huggingface.co/strangerzonehf

💢 Collection :
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
ZennyKenny 
posted an update about 1 month ago
merve 
posted an update about 1 month ago
view post
Post
2647
small but mighty 🔥
you can fine-tune SmolVLM on an L4 with batch size of 4 and it will only take 16.4 GB VRAM 🫰🏻 also with gradient accumulation simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, gradient checkpointing with explanations on how they work 💝 https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb