Xargs Lynx

xargs01

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 👀 2 days ago
๐— ๐—ถ๐—ป๐—ถ๐— ๐—ฎ๐˜…'๐˜€ ๐—ป๐—ฒ๐˜„ ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐—  ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐Ÿฐ๐—  ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต ๐Ÿ’ฅ This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach. ๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€: ๐Ÿ—๏ธ MoE with novel hybrid attention: โ€ฃ Mixture of Experts with 456B total parameters (45.9B activated per token) โ€ฃ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers ๐Ÿ† Outperforms leading models across benchmarks while offering vastly longer context: โ€ฃ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks โ€ฃ Can efficiently handle 4M token contexts (vs 256K for most other LLMs) ๐Ÿ”ฌ Technical innovations enable efficient scaling: โ€ฃ Novel expert parallel and tensor parallel strategies cut communication overhead in half โ€ฃ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%) ๐ŸŽฏ Thorough training strategy: โ€ฃ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge! Overall, not only is the model impressive, but the technical paper is also really interesting! ๐Ÿ“ It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs. Read it in full here ๐Ÿ‘‰ https://huggingface.co/papers/2501.08313 Model here, allows commercial use <100M monthly users ๐Ÿ‘‰ https://huggingface.co/MiniMaxAI/MiniMax-Text-01
liked a Space 28 days ago
artificialguybr/video-dubbing

Organizations

None yet

xargs01's activity

reacted to m-ric's post with 👀 2 days ago
๐— ๐—ถ๐—ป๐—ถ๐— ๐—ฎ๐˜…'๐˜€ ๐—ป๐—ฒ๐˜„ ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐—  ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐Ÿฐ๐—  ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต ๐Ÿ’ฅ

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

🏗️ MoE with novel hybrid attention (a minimal sketch of the layer pattern follows after these insights):
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

๐Ÿ† Outperforms leading models across benchmarks while offering vastly longer context:
โ€ฃ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
โ€ฃ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high; utilization is usually around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!
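As an illustration of the hybrid layout from the first insight above, here is a minimal PyTorch sketch of the layer pattern only. The `LinearAttentionBlock` and `SoftmaxAttentionBlock` classes are hypothetical stand-ins (real lightning attention uses a blocked formulation and far more machinery); only the every-8-layers alternation comes from the post.

```python
import torch
import torch.nn as nn


class SoftmaxAttentionBlock(nn.Module):
    """Placeholder for a standard softmax self-attention block."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + out)


class LinearAttentionBlock(nn.Module):
    """Rough stand-in for a linear-complexity (lightning-style) attention block."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Kernelized attention: a feature map replaces the softmax, so (K^T V)
        # is computed first and cost stays linear in sequence length.
        q = torch.nn.functional.elu(self.q(x)) + 1
        k = torch.nn.functional.elu(self.k(x)) + 1
        v = self.v(x)
        kv = torch.einsum("bnd,bne->bde", k, v)
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        out = torch.einsum("bnd,bde,bn->bne", q, kv, z)
        return self.norm(x + out)


class HybridStack(nn.Module):
    """Linear attention for most layers, full softmax attention every `softmax_every` layers."""

    def __init__(self, num_layers: int, d_model: int, softmax_every: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            SoftmaxAttentionBlock(d_model) if (i + 1) % softmax_every == 0
            else LinearAttentionBlock(d_model)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


# Example: a 16-layer toy stack uses softmax attention at layers 8 and 16.
stack = HybridStack(num_layers=16, d_model=64)
print(stack(torch.randn(2, 128, 64)).shape)  # torch.Size([2, 128, 64])
```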

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here (the license allows commercial use for products with under 100M monthly users) 👉 MiniMaxAI/MiniMax-Text-01
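If you want to poke at the released checkpoint, a minimal loading sketch with transformers could look like the following. Treat it as a sketch under assumptions: the 456B-parameter model needs a multi-GPU node, and the flags used here (trust_remote_code, device_map) reflect common practice for custom-code repos rather than verified instructions from the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"

# Assumption: the repo ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",   # shard across available GPUs; 456B parameters won't fit on one
    torch_dtype="auto",
)

prompt = "Lightning attention keeps long-context inference cheap because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```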
reacted to m-ric's post with 👍 about 1 month ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: Welcome ModernBERT! 🤗

We talk a lot about ✨Generative AI✨, meaning the decoder version of the Transformer architecture, but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models; the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

TL;DR:
๐Ÿ›๏ธ Architecture changes:
โ‡’ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU (see the sketch below)
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
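For the GeGLU swap mentioned above, here is a minimal PyTorch sketch of a GeGLU feed-forward block; the dimensions and bias choices are illustrative, not ModernBERT's exact configuration.

```python
import torch
import torch.nn as nn


class GeGLUFeedForward(nn.Module):
    """Feed-forward block using a GeGLU gate instead of a plain GeLU MLP."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # A single input projection produces both the value and the gate halves.
        self.proj_in = nn.Linear(d_model, 2 * d_hidden, bias=False)
        self.proj_out = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * nn.functional.gelu(gate))


ffn = GeGLUFeedForward(d_model=768, d_hidden=2048)
print(ffn(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```

Because the gate doubles the width of the input projection, GLU-style blocks usually shrink the hidden size a bit to keep the parameter count comparable to the GeLU MLP they replace.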

🥇 As a result, the model tops the game of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
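As a small illustration of "turning a sentence into a vector" with an encoder model, here is a hedged sketch using transformers. The Hub id and the mean-pooling recipe are assumptions for illustration (check the blog post for the official checkpoints and recommended usage), and ModernBERT requires a recent transformers release.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = [
    "Encoder models turn a sentence into a vector.",
    "ModernBERT is a drop-in replacement for BERT.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```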
  • 1 reply
ยท
reacted to vladbogo's post with ❤️ 11 months ago
REALIGN is a new method designed to improve the alignment of Large Language Models (LLMs) with human values by reformatting instruction data. This approach enhances LLM performance across various metrics by aligning responses with predefined criteria and evidence.

Key points:

* REALIGN has three steps: criteria definition, retrieval augmentation, and response reformatting (see the sketch after these points).
* It rewrites (query, response) pairs to enhance data quality for fine-tuning LLMs.
* It has shown significant improvements in general alignment, math reasoning and other tasks.
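A rough sketch of how those three steps could fit together in code is below; the function name and the `classify_task`/`retrieve`/`llm` callables are hypothetical stand-ins, not the authors' API (see the linked repo for the real implementation).

```python
from typing import Callable, Dict, List


def realign_pair(
    query: str,
    response: str,
    criteria_for_task: Dict[str, str],
    classify_task: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Reformat one (query, response) pair following ReAlign's three steps."""
    # 1. Criteria definition: pick hand-written format criteria for this task type.
    task = classify_task(query)
    criteria = criteria_for_task.get(task, "Answer clearly and structure the response.")

    # 2. Retrieval augmentation: fetch external evidence for knowledge-heavy queries.
    evidence = retrieve(query)

    # 3. Response reformatting: ask an LLM to rewrite the response so it matches the
    #    criteria, keeping the original facts and weaving in the retrieved evidence.
    prompt = (
        f"Rewrite the response so it satisfies these criteria:\n{criteria}\n\n"
        "Evidence:\n" + "\n".join(evidence) + "\n\n"
        f"Query: {query}\nOriginal response: {response}\nRewritten response:"
    )
    return llm(prompt)
```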

Congrats to the authors for their work!

Paper: Reformatted Alignment (2402.12219)
Code: https://github.com/GAIR-NLP/ReAlign
liked a Space about 1 year ago
liked a Space almost 2 years ago