- LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
  Paper • 2501.06186 • Published • 58
- Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers
  Paper • 2501.02393 • Published • 8
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
  Paper • 2501.01904 • Published • 31
- ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
  Paper • 2412.14711 • Published • 16
Hasan Arif
hasanar1f
AI & ML interests
Efficient training and inference
Recent Activity
Updated a collection 1 day ago: ML Optimization Papers
Updated a collection 1 day ago: Fundamentals
Upvoted a paper 1 day ago: ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Organizations
Collections: 2
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 22
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 75
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
  Paper • 2501.06842 • Published • 15
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 48
Papers: 1
Spaces: 1
Models: None public yet
Datasets: None public yet