Xargs Lynx

xargs01

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 👀 2 days ago

MiniMax's new MoE LLM reaches Claude-Sonnet level with 4M tokens context length 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝 It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here 👉 https://huggingface.co/papers/2501.08313

Model here, allows commercial use <100M monthly users 👉 https://huggingface.co/MiniMaxAI/MiniMax-Text-01
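
As a rough illustration of the hybrid layout described in the post, the sketch below labels each layer as lightning or softmax attention and computes the share of parameters active per token. The 16-layer depth, the placement of the softmax layer at the end of each group of 8, and the helper names `hybrid_layer_schedule` and `active_fraction` are assumptions made for the example; only the "every 8 layers" ratio and the 456B / 45.9B figures come from the post.

```python
def hybrid_layer_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Label each layer; every `softmax_every`-th layer uses softmax attention.

    Assumption: the softmax layer closes each group. The post only states
    that softmax attention appears every 8 layers.
    """
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of MoE parameters activated per token (both in billions)."""
    return active_params_b / total_params_b


# Hypothetical 16-layer stack, chosen only to keep the output short.
schedule = hybrid_layer_schedule(16)
print(schedule.count("lightning"), schedule.count("softmax"))  # 14 2

# Figures quoted in the post: 45.9B active out of 456B total.
print(f"{active_fraction(45.9, 456):.1%}")  # 10.1%
```

With these figures, roughly a tenth of the parameters are used for any given token, and seven out of every eight layers run linear-complexity attention.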

liked a model 6 days ago

mradermacher/Phi-4-AbliteratedRP-i1-GGUF

liked a Space 28 days ago

artificialguybr/video-dubbing

Organizations

None yet

xargs01's activity

liked a model 6 days ago

mradermacher/Phi-4-AbliteratedRP-i1-GGUF

Updated 8 days ago • 1.74k • 5

liked a Space 28 days ago

Running on Zero

273

🚀

Video Dubbing

liked 3 models about 1 month ago

liked 2 Spaces about 1 month ago

Running on A10G

189

🏃

CharacterGen

Gradio demo of CharacterGen (SIGGRAPH 2024)

Running

512

👁

Edge TTS Text To Speech

liked a model about 2 months ago

OuteAI/OuteTTS-0.1-350M-GGUF

Text-to-Speech • Updated Nov 27, 2024 • 225 • 34

liked a Space 3 months ago

Running

🧠

Mistral Small 22B (2409)

Mistral Small 22B snapshot from Sep 2024

liked a Space about 1 year ago

Running on A10G

4.71k

🎵

MusicGen

liked a model about 1 year ago

facebook/musicgen-stereo-large

Text-to-Audio • Updated Mar 6, 2024 • 1.19k • 70

liked a Space about 1 year ago

Runtime error

516

📞

Seamless M4T v2

liked 2 models over 1 year ago

lllyasviel/sd_control_collection

Updated Sep 9, 2023 • 1.85k

dreamlike-art/dreamlike-anime-1.0

Text-to-Image • Updated Mar 13, 2023 • 11.4k • 247

liked a model almost 2 years ago

lllyasviel/ControlNet-v1-1

Updated Apr 25, 2023 • 3.7k

liked a Space almost 2 years ago

Runtime error

447

🦙

Alpaca-LoRA

liked a model almost 2 years ago

Anashel/rpg

Text-to-Image • Updated Sep 4, 2024 • 40 • 294

liked 3 models about 2 years ago

darkstorm2150/Protogen_x3.4_Official_Release

Text-to-Image • Updated May 10, 2023 • 522 • 350

dreamlike-art/dreamlike-diffusion-1.0

Text-to-Image • Updated Jan 27, 2023 • 25.3k • 1.02k

prompthero/openjourney-v4

Text-to-Image • Updated May 15, 2023 • 43.6k • 1.23k