DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention Paper • 2309.14327 • Published Sep 25, 2023 • 21
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10, 2024 • 28
Teaching Transformers Causal Reasoning through Axiomatic Training Paper • 2407.07612 • Published Jul 10, 2024 • 2