
Marcos

dreamworks2050

AI & ML interests

None yet

Recent Activity

liked a model about 1 month ago
gghfez/Llama-3.3-90B-Vision-merged
liked a model about 1 month ago
AIDC-AI/Marco-o1
liked a model about 1 month ago
CohereForAI/aya-expanse-32b

Organizations

None yet

dreamworks2050's activity

reacted to abhishek's post with 🤯 about 1 year ago
Happy to announce the brand new, open-source Hugging Face Competitions platform 🚀 Now, create a machine learning competition for your friends, colleagues or the world for FREE* and host it on Hugging Face: the AI community building the future. Creating a competition requires only two steps: pip install competitions, then run competitions create and create a competition by answering a few questions 💥 Check out the GitHub repo: https://github.com/huggingface/competitions and docs: https://hf.co/docs/competitions
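For reference, the two steps from the post look like this in a terminal (the commands are taken verbatim from the announcement; the interactive prompts that follow may differ by version):

pip install competitions
competitions create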
reacted to clem's post with 🤯 about 1 year ago
Is synthetic data the future of AI? 🔥🔥🔥

@HugoLaurencon @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight, a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS code and screenshots of the corresponding rendered websites, for training GPT-4V-like models 🌍💻

While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS code. Most VLMs suck at this, and there was no public dataset available for this specific task, so they decided to create their own.

They prompted existing LLMs to generate 823k HTML/CSS code samples for very simple websites. Through supervised fine-tuning of a vision language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot.

You can explore the dataset here: HuggingFaceM4/WebSight
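
If you want to poke at it programmatically, here is a minimal sketch (mine, not from the post) using the datasets library; streaming, the default config, and the split name are assumptions, so check the dataset card if anything differs:

from datasets import load_dataset

# stream instead of downloading all 823k pairs up front (assumed to work with the default config)
ds = load_dataset("HuggingFaceM4/WebSight", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # inspect the fields: a rendered screenshot plus its HTML/CSS code, per the description above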

What do you think?
reacted to merve's post with 👍 about 1 year ago
Google's SigLIP is another alternative to OpenAI's CLIP, and it just got merged into 🤗 transformers and is super easy to use!
To celebrate this, I have created a repository with notebooks and a bunch of Spaces for various SigLIP-based projects 🥳
Search for art 👉 merve/draw_to_search_art
Compare SigLIP with CLIP 👉 merve/compare_clip_siglip

How does SigLIP work?
SigLIP is a vision-text pre-training technique based on contrastive learning. It jointly trains an image encoder and a text encoder so that the dot products of the embeddings are highest for matching text-image pairs.
In CLIP, this contrastive pre-training is done with a softmax over the batch; SigLIP replaces the softmax with a sigmoid. 📎
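
To make the softmax-vs-sigmoid point concrete, here is a rough PyTorch sketch of the sigmoid loss idea (my own illustration, not code from the post or the official implementation); t and b stand for the learned temperature and bias scalars:

import torch
import torch.nn.functional as F

def siglip_style_loss(img_emb, txt_emb, t, b):
    # img_emb, txt_emb: (N, D) L2-normalized embeddings of N paired images and texts
    logits = t * img_emb @ txt_emb.T + b                             # (N, N) pairwise similarity logits
    labels = 2 * torch.eye(len(img_emb), device=img_emb.device) - 1  # +1 on the diagonal (matching pairs), -1 elsewhere
    # every pair contributes an independent sigmoid term, so no softmax over the whole batch is needed
    return -F.logsigmoid(labels * logits).mean()

Because each term is independent of the rest of the batch, the batch can grow without the normalization cost of a softmax, which is where the up-to-1M-items batch size mentioned below comes from.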

Highlights from the paper on why you should use it ✨
🖼️📝 The authors used a medium-sized B/16 ViT for the image encoder and a B-sized transformer for the text encoder
😍 More performant than CLIP on zero-shot classification
🗣️ The authors trained a multilingual model too!
⚡️ Super efficient: the sigmoid loss enables batch sizes of up to 1M items, but the authors chose 32k because performance saturates beyond that

It's super easy to use thanks to transformers 👇
from transformers import pipeline
from PIL import Image
import requests

# load pipe
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-base-patch16-256-i18n")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
outputs = [{"score": round(output["score"], 4), "label": output["label"] } for output in outputs]
print(outputs)

For all the SigLIP notebooks on similarity search and indexing, you can check out this [repository](https://github.com/merveenoyan/siglip). 🤗