Stoney Kang

sikang99

AI & ML interests

Remote Control based on Vision

Recent Activity

liked a model 12 days ago

microsoft/phi-4

reacted to merve's post with 👍 about 2 months ago

Last week we were blessed with open-source models! A recap 💝 https://huggingface.co/collections/merve/nov-29-releases-674ccc255a57baf97b1e2d31

🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents 🤖
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model 📖
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT 📕

💬 LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet 🔥
> AliBaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)

⏯️ Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res ⏯️
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla

🏷️ Audio
> OuteAI released OuteTTS-0.2-500M new multilingual text-to-speech model based on Qwen-2.5-0.5B trained on 5B audio prompt tokens

reacted to merve's post with ❤️ 3 months ago

This is not a drill 💥 HuggingChat is now multimodal with https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct! 🤗

This also comes with multimodal assistants, I have migrated my Marcus Aurelius advice assistant to Llama-Vision and Marcus can see now! 😄

Chat with Marcus: https://hf.co/chat/assistant/65bfed22022ba290531112f8
Start chatting with Llama-Vision 3.2 11B Instruct: https://huggingface.co/chat/models/meta-llama/Llama-3.2-11B-Vision-Instruct

Organizations

None yet

sikang99's activity

upvoted 2 papers 4 months ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 140

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 76

upvoted a collection 4 months ago

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 225

upvoted an article 5 months ago

Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner

Article • May 9, 2024 • 13

upvoted 2 papers 5 months ago

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Paper • 2408.04810 • Published Aug 9, 2024 • 23

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 47