Fundamental Research
Paper • 2408.11029 • Published • 3
Note
1. The random search algorithm is to blame: its unconstrained search space accelerates the convergence of the search towards the biases of the verifiers.
2. Search loop: start -> sample noise i.i.d. -> add noise -> denoise -> verify -> Best of N -> start
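The loop in note 2 can be sketched as a best-of-N random search. This is only a toy illustration, not the paper's implementation: `toy_denoise` and `toy_verifier` are hypothetical stand-ins for a real diffusion sampler and a real verifier, and the "add noise -> denoise" steps are collapsed into a single call.

```python
import numpy as np

def toy_denoise(noise):
    # Hypothetical stand-in for a diffusion sampler: maps noise to a "sample".
    return np.tanh(noise)

def toy_verifier(sample):
    # Hypothetical verifier: scores a sample by its mean value.
    return float(sample.mean())

def random_search(n_candidates, dim, rng):
    """Draw i.i.d. noise, denoise, verify, and keep the best of N."""
    best_sample, best_score = None, -np.inf
    for _ in range(n_candidates):
        noise = rng.standard_normal(dim)   # sample noise i.i.d.
        sample = toy_denoise(noise)        # denoise
        score = toy_verifier(sample)       # verify
        if score > best_score:             # Best of N
            best_sample, best_score = sample, score
    return best_sample, best_score

rng = np.random.default_rng(0)
best_sample, best_score = random_search(n_candidates=16, dim=8, rng=rng)
```

Because nothing constrains which noise is drawn, a larger N simply selects whatever the verifier happens to prefer, which is the bias concern raised in note 1.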
Token Turing Machines
Paper • 2211.09119 • Published • 1
Note
1. The result of a memory “read” is fed to the processing unit; the output of the processing unit is “written” to the memory.
2. Token summarisation: implemented as a weighted sum over all tokens in memory, $\mathbb{R}^{k \times p} \cdot \mathbb{R}^{p \times d} \to \mathbb{R}^{k \times d}$; the $\mathbb{R}^{k \times p}$ weights are made learnable.
3. A positional embedding is added to distinguish tokens that come from memory from tokens that come from the input.
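The shape arithmetic in note 2 can be checked with a minimal NumPy sketch. The function name `summarise_tokens`, the softmax normalisation of the weights, and the fixed `logits` array (which would be produced by a learnable module in a real model) are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarise_tokens(tokens, logits):
    """Reduce p tokens of width d to k summary tokens.

    tokens: (p, d) array of tokens to summarise.
    logits: (k, p) array of unnormalised importance weights (assumed learnable).
    """
    weights = softmax(logits, axis=-1)  # (k, p), each row sums to 1
    return weights @ tokens             # (k, p) @ (p, d) -> (k, d)

rng = np.random.default_rng(0)
p, d, k = 6, 4, 2
memory = rng.standard_normal((p, d))
logits = np.zeros((k, p))  # uniform weights: each summary is the mean token
summary = summarise_tokens(memory, logits)
```

With all-zero logits every summary token is just the average of the input tokens; training the weights lets each of the k outputs attend to a different part of the input.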
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Paper • 2203.12602 • Published
Note
1. There is an upgraded version: https://arxiv.org/pdf/2303.16727
1.1. Progressive fine-tuning of the pre-trained models can contribute to higher performance.
1.2. The decoder takes the encoder's visible tokens as input and reconstructs only the tokens left visible by the decoder mask.
1.3. Supervision applies only to the decoder output tokens that were invisible to the encoder.
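Notes 1.2 and 1.3 describe two masks that together decide which tokens receive a loss. The sketch below is one way to read them, not the paper's code: both masks are drawn uniformly at random here (an assumption; the paper may use a different masking strategy), and `dual_masks` and `masked_mse` are hypothetical helper names.

```python
import numpy as np

def dual_masks(num_tokens, enc_mask_ratio, dec_keep_ratio, rng):
    """Return boolean arrays (enc_visible, dec_visible, supervised)."""
    n_enc = int(round(num_tokens * (1 - enc_mask_ratio)))
    n_dec = int(round(num_tokens * dec_keep_ratio))
    enc_visible = np.zeros(num_tokens, dtype=bool)
    dec_visible = np.zeros(num_tokens, dtype=bool)
    enc_visible[rng.permutation(num_tokens)[:n_enc]] = True
    dec_visible[rng.permutation(num_tokens)[:n_dec]] = True
    # Loss only on tokens the decoder reconstructs AND the encoder never saw.
    supervised = dec_visible & ~enc_visible
    return enc_visible, dec_visible, supervised

def masked_mse(pred, target, supervised):
    # Mean squared error restricted to the supervised tokens.
    return float(((pred - target) ** 2)[supervised].mean())

rng = np.random.default_rng(0)
enc_visible, dec_visible, supervised = dual_masks(16, 0.75, 0.5, rng)
target = rng.standard_normal((16, 3))
pred = target + 1.0                      # wrong everywhere ...
pred[supervised] = target[supervised]    # ... except on supervised tokens
loss = masked_mse(pred, target, supervised)
```

In this toy setup the loss is zero even though the prediction is wrong on every unsupervised token, which shows that tokens outside the supervised set do not contribute.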
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Paper • 2305.13035 • Published
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 63
Do generative video models learn physical principles from watching videos?
Paper • 2501.09038 • Published • 28
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 17