zzfive's Collections
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper
•
2403.09338
•
Published
•
7
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper
•
2403.09394
•
Published
•
25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper
•
2402.19479
•
Published
•
32
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper
•
2405.10300
•
Published
•
26
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Paper
•
2406.20076
•
Published
•
8
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout
Paper
•
2404.00412
•
Published
•
2
LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels
Paper
•
2407.18054
•
Published
•
12
Paper
•
2407.21017
•
Published
•
22
SAM 2: Segment Anything in Images and Videos
Paper
•
2408.00714
•
Published
•
109
NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices
Paper
•
2408.10161
•
Published
•
13
Sapiens: Foundation for Human Vision Models
Paper
•
2408.12569
•
Published
•
89
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Paper
•
2409.02095
•
Published
•
35
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper
•
2409.01704
•
Published
•
83
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Paper
•
2409.08513
•
Published
•
11
OmniGen: Unified Image Generation
Paper
•
2409.11340
•
Published
•
108
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper
•
2409.11355
•
Published
•
28
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
Paper
•
2409.17058
•
Published
•
11
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Paper
•
2409.16288
•
Published
•
5
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Paper
•
2409.18124
•
Published
•
32
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper
•
2409.18839
•
Published
•
27
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Paper
•
2410.02073
•
Published
•
41
Towards Natural Image Matting in the Wild via Real-Scenario Prior
Paper
•
2410.06593
•
Published
•
2
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Paper
•
2410.16268
•
Published
•
66
SMITE: Segment Me In TimE
Paper
•
2410.18538
•
Published
•
15
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
Paper
•
2410.20474
•
Published
•
14
DELTA: Dense Efficient Long-range 3D Tracking for any video
Paper
•
2410.24211
•
Published
•
8
Face Anonymization Made Simple
Paper
•
2411.00762
•
Published
•
7
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
Paper
•
2411.12044
•
Published
•
13
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
Paper
•
2411.10161
•
Published
•
8
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Paper
•
2411.11922
•
Published
•
18
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
Paper
•
2411.14347
•
Published
•
13
Knowledge Transfer Across Modalities with Natural Language Supervision
Paper
•
2411.15611
•
Published
•
15
Edge Weight Prediction For Category-Agnostic Pose Estimation
Paper
•
2411.16665
•
Published
•
4
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Paper
•
2411.15241
•
Published
•
5
Scaling Image Tokenizers with Grouped Spherical Quantization
Paper
•
2412.02632
•
Published
•
10
EMOv2: Pushing 5M Vision Model Frontier
Paper
•
2412.06674
•
Published
•
13