- Seamless Human Motion Composition with Blended Positional Encodings
  Paper • 2402.15509 • Published • 14
- TripoSR: Fast 3D Object Reconstruction from a Single Image
  Paper • 2403.02151 • Published • 12
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 7
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
  Paper • 2403.09981 • Published • 6
Collections including paper arxiv:2404.13013
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 40
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 20

- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
  Paper • 2402.08714 • Published • 11
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 23
- RLVF: Learning from Verbal Feedback without Overgeneralization
  Paper • 2402.10893 • Published • 10
- Coercing LLMs to do and reveal (almost) anything
  Paper • 2402.14020 • Published • 12

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 9
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 87
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6