LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis Paper • 2412.15214 • Published 17 days ago • 15
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published 18 days ago • 11
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 23 days ago • 136
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published about 1 month ago • 123
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 42
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23, 2024 • 34
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 18
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Paper • 2411.10818 • Published Nov 16, 2024 • 24
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published Nov 16, 2024 • 23
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions Paper • 2411.02394 • Published Nov 4, 2024 • 17
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation Paper • 2410.18565 • Published Oct 24, 2024 • 44
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21, 2024 • 66