-
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 139 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 89 -
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Paper • 2409.02634 • Published • 91 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 109
Collections
Discover the best community collections!
Collections including paper arxiv:2409.12186
-
LLMs + Persona-Plug = Personalized LLMs
Paper • 2409.11901 • Published • 32 -
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper • 2409.12183 • Published • 37 -
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Paper • 2402.12875 • Published • 13 -
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
Paper • 2410.00531 • Published • 30
-
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 139 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 3 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 76
-
Agent Workflow Memory
Paper • 2409.07429 • Published • 29 -
MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
Paper • 2409.07129 • Published • 7 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 24 -
Imagine yourself: Tuning-Free Personalized Image Generation
Paper • 2409.13346 • Published • 68
-
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 92 -
CogVLM2: Visual Language Models for Image and Video Understanding
Paper • 2408.16500 • Published • 56 -
Learning to Move Like Professional Counter-Strike Players
Paper • 2408.13934 • Published • 23 -
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 124
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 51 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 41 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 53
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138 -
Elucidating the Design Space of Diffusion-Based Generative Models
Paper • 2206.00364 • Published • 14 -
GLU Variants Improve Transformer
Paper • 2002.05202 • Published • 1 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 136
-
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Paper • 2311.10708 • Published • 14 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 109 -
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 73 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 29