Article • 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark • By wolfram • 4 days ago • 30
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment • Paper • 2412.19326 • Published 10 days ago • 18
PowerInfer/SmallThinker-3B-Preview • Text Generation • Updated about 3 hours ago • 4.66k • 237
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search • Paper • 2412.18319 • Published 13 days ago • 34
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing • Paper • 2412.14711 • Published 18 days ago • 15
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design • Paper • 2412.14590 • Published 18 days ago • 13
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents • Paper • 2412.13194 • Published 19 days ago • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters • Paper • 2408.03314 • Published Aug 6, 2024 • 54
Smaller Language Models Are Better Instruction Evolvers • Paper • 2412.11231 • Published 22 days ago • 27