Article • 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark • By wolfram • 1 day ago • 24
How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation Paper • 2412.18573 • Published 10 days ago • 1
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines Paper • 2310.03714 • Published Oct 5, 2023 • 32
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published 16 days ago • 82
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 10 days ago • 82
YuLan-Mini: An Open Data-efficient Language Model Paper • 2412.17743 • Published 11 days ago • 59
Spectrum: Targeted Training on Signal to Noise Ratio Paper • 2406.06623 • Published Jun 7, 2024 • 12
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 18 days ago • 33
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 21 days ago • 136
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published 26 days ago • 71
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 24 days ago • 70
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 24 days ago • 50
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 22 days ago • 92