Hugging Face
Xargs Lynx
xargs01
1 follower · 14 following
AI & ML interests
None yet
Recent Activity
reacted to m-ric's post 2 days ago
MiniMax's new MoE LLM reaches Claude-Sonnet level with 4M tokens context length

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens, roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high; utilization is generally around 50%)

Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! It has lots of insights, including a great comparison showing how an MoE with 2B activated parameters (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here: https://huggingface.co/papers/2501.08313
Model here (allows commercial use for <100M monthly users): https://huggingface.co/MiniMaxAI/MiniMax-Text-01
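The hybrid attention layout described in the post (Lightning attention for most layers, softmax attention every 8 layers) and the ratio of activated to total parameters can be illustrated with a small sketch. This is not MiniMax's code: the function name `layer_schedule`, the 16-layer depth, and the choice to place the softmax layer last in each group of 8 are assumptions made for illustration only.

```python
def layer_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Return the attention type used by each layer.

    Assumption: the softmax layer is the last one in each group of
    `softmax_every` layers; the post only says "every 8 layers".
    """
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


# Hypothetical depth of 16 layers, chosen only to keep the output short.
schedule = layer_schedule(16)

# Share of parameters activated per token, from the figures in the post:
# 45.9B activated out of 456B total.
active_fraction = 45.9 / 456

print(schedule)
print(f"activated share: {active_fraction:.1%}")
```

With these assumed settings, one layer in every eight uses softmax attention, and about a tenth of the total parameters are activated for each token.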
liked a model 6 days ago
mradermacher/Phi-4-AbliteratedRP-i1-GGUF
liked a Space 28 days ago
artificialguybr/video-dubbing
Organizations
None yet
xargs01's activity
liked a model 6 days ago
mradermacher/Phi-4-AbliteratedRP-i1-GGUF
Updated 8 days ago • 1.74k • 5
liked a Space 28 days ago
Running on Zero • 273 • Video Dubbing
liked 3 models about 1 month ago
mradermacher/MambaHermes-3B-i1-GGUF
Updated about 1 month ago • 350 • 1
mradermacher/Llama-3.2-3B-Instruct-abliterated-i1-GGUF
Updated Nov 20, 2024 • 420 • 3
mradermacher/Llama3.2-3B-ShiningValiant2-i1-GGUF
Updated Nov 18, 2024 • 204 • 2
liked 2 Spaces about 1 month ago
Running on A10G • 189 • CharacterGen
Gradio demo of CharacterGen (SIGGRAPH 2024)
Running • 512 • Edge TTS Text To Speech
liked a model about 2 months ago
OuteAI/OuteTTS-0.1-350M-GGUF
Text-to-Speech • Updated Nov 27, 2024 • 225 • 34
liked a Space 3 months ago
Running • 4 • Mistral Small 22B (2409)
Mistral Small 22B snapshot from Sep 2024
liked a Space about 1 year ago
Running on A10G • 4.71k • MusicGen
liked a model about 1 year ago
facebook/musicgen-stereo-large
Text-to-Audio • Updated Mar 6, 2024 • 1.19k • 70
liked a Space about 1 year ago
Runtime error • 516 • Seamless M4T v2
liked 2 models over 1 year ago
lllyasviel/sd_control_collection
Updated Sep 9, 2023 • 1.85k
dreamlike-art/dreamlike-anime-1.0
Text-to-Image • Updated Mar 13, 2023 • 11.4k • 247
liked a model almost 2 years ago
lllyasviel/ControlNet-v1-1
Updated Apr 25, 2023 • 3.7k
liked a Space almost 2 years ago
Runtime error • 447 • Alpaca-LoRA
liked a model almost 2 years ago
Anashel/rpg
Text-to-Image • Updated Sep 4, 2024 • 40 • 294
liked 3 models about 2 years ago
darkstorm2150/Protogen_x3.4_Official_Release
Text-to-Image • Updated May 10, 2023 • 522 • 350
dreamlike-art/dreamlike-diffusion-1.0
Text-to-Image • Updated Jan 27, 2023 • 25.3k • 1.02k
prompthero/openjourney-v4
Text-to-Image • Updated May 15, 2023 • 43.6k • 1.23k