Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
umd-vt-nyu/clip-evaclip-und-gen_shutterstock-forawrd-backward-long-caption-70M_sft-florence Updated Nov 18, 2024 • 4