An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM Paper • 2407.02392 • Published Jul 2, 2024
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers Paper • 2305.17997 • Published May 29, 2023