-
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 35 -
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 27 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 32 -
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 13
Collections
Discover the best community collections!
Collections including paper arxiv:2404.01258
-
Fine-Tuning Language Models from Human Preferences
Paper • 1909.08593 • Published • 3 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 11 -
Leverage the Average: an Analysis of KL Regularization in RL
Paper • 2003.14089 • Published • 2 -
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Paper • 2404.01258 • Published • 10
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 50 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 40 -
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 40 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 20
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 145 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 12 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45
-
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 117 -
stabilityai/stable-video-diffusion-img2vid-xt
Image-to-Video • Updated • 161k • 2.79k -
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper • 2311.13384 • Published • 50 -
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Paper • 2311.12454 • Published • 30
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 45 -
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • Published • 44 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 -
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper • 2401.01256 • Published • 19