Xargs Lynx

xargs01

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 👀 2 days ago
š— š—¶š—»š—¶š— š—®š˜…'š˜€ š—»š—²š˜„ š— š—¼š—˜ š—Ÿš—Ÿš—  š—暝—²š—®š—°š—µš—²š˜€ š—–š—¹š—®š˜‚š—±š—²-š—¦š—¼š—»š—»š—²š˜ š—¹š—²š˜ƒš—²š—¹ š˜„š—¶š˜š—µ šŸ°š—  š˜š—¼š—øš—²š—»š˜€ š—°š—¼š—»š˜š—²š˜…š˜ š—¹š—²š—»š—“š˜š—µ šŸ’„ This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach. š—žš—²š˜† š—¶š—»š˜€š—¶š—“š—µš˜š˜€: šŸ—ļø MoE with novel hybrid attention: ā€£ Mixture of Experts with 456B total parameters (45.9B activated per token) ā€£ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers šŸ† Outperforms leading models across benchmarks while offering vastly longer context: ā€£ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks ā€£ Can efficiently handle 4M token contexts (vs 256K for most other LLMs) šŸ”¬ Technical innovations enable efficient scaling: ā€£ Novel expert parallel and tensor parallel strategies cut communication overhead in half ā€£ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%) šŸŽÆ Thorough training strategy: ā€£ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge! Overall, not only is the model impressive, but the technical paper is also really interesting! šŸ“ It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs. Read it in full here šŸ‘‰ https://huggingface.co/papers/2501.08313 Model here, allows commercial use <100M monthly users šŸ‘‰ https://huggingface.co/MiniMaxAI/MiniMax-Text-01
liked a Space 28 days ago
artificialguybr/video-dubbing

Organizations

None yet

xargs01's activity

liked a Space about 1 year ago
liked a Space about 1 year ago
liked a Space almost 2 years ago