Collections
Discover the best community collections!
Collections including paper arxiv:2410.02073
-
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 41 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 120 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 14
-
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
Paper • 2410.13218 • Published • 4 -
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
Paper • 2410.10818 • Published • 15 -
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations
Paper • 2410.08049 • Published • 8 -
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Paper • 2410.05363 • Published • 45
-
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
Paper • 2410.01804 • Published • 5 -
DreamCinema: Cinematic Transfer with Free Camera and 3D Character
Paper • 2408.12601 • Published • 29 -
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Paper • 2408.14819 • Published • 21 -
DressRecon: Freeform 4D Human Reconstruction from Monocular Video
Paper • 2409.20563 • Published • 8
-
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Paper • 2410.00201 • Published -
Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems
Paper • 2409.19804 • Published -
Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling
Paper • 2409.15156 • Published -
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
Paper • 2409.04927 • Published
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 20 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 8 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 27 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 119
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published • 1 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 6
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 7 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 25 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 27
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 605 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32