emjay73's Collections
architecture
updated
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Paper • 2312.05605 • Published • 2
VMamba: Visual State Space Model
Paper • 2401.10166 • Published • 38
Rethinking Patch Dependence for Masked Autoencoders
Paper • 2401.14391 • Published • 23
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper • 2401.14404 • Published • 17
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Paper • 2403.12019 • Published • 9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 65
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 17
ViTAR: Vision Transformer with Any Resolution
Paper • 2403.18361 • Published • 52
When Do We Not Need Larger Vision Models?
Paper • 2403.13043 • Published • 25
Paper • 2405.18407 • Published • 46
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 50
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 28
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 25
Paper • 2410.05258 • Published • 169