- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (Paper • 2404.14619 • Published • 126)
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning (Paper • 2303.15647 • Published • 4)
- Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer (Paper • 2205.12148 • Published • 2)
- No More Adam: Learning Rate Scaling at Initialization is All You Need (Paper • 2412.11768 • Published • 41)
Collections
Collections including paper arxiv:2412.11768
- All you need is a good init (Paper • 1511.06422 • Published • 1)
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models (Paper • 2404.14507 • Published • 21)
- Efficient Transformer Encoders for Mask2Former-style models (Paper • 2404.15244 • Published • 1)
- Deep Residual Learning for Image Recognition (Paper • 1512.03385 • Published • 6)

- All you need is a good init (Paper • 1511.06422 • Published • 1)
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models (Paper • 2404.14507 • Published • 21)
- Deep Residual Learning for Image Recognition (Paper • 1512.03385 • Published • 6)
- MoDE: CLIP Data Experts via Clustering (Paper • 2404.16030 • Published • 12)

- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models (Paper • 2404.13013 • Published • 30)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing (Paper • 2404.12253 • Published • 54)
- Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity (Paper • 2403.12267 • Published)
- No More Adam: Learning Rate Scaling at Initialization is All You Need (Paper • 2412.11768 • Published • 41)

- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting (Paper • 2404.06903 • Published • 18)
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (Paper • 2404.15653 • Published • 26)
- MoDE: CLIP Data Experts via Clustering (Paper • 2404.16030 • Published • 12)
- BlenderAlchemy: Editing 3D Graphics with Vision-Language Models (Paper • 2404.17672 • Published • 18)

- Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling (Paper • 2403.14551 • Published • 2)
- Adapting LLaMA Decoder to Vision Transformer (Paper • 2404.06773 • Published • 17)
- Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective (Paper • 2404.07200 • Published • 1)
- An inclusive review on deep learning techniques and their scope in handwriting recognition (Paper • 2404.08011 • Published • 1)

- SELF: Language-Driven Self-Evolution for Large Language Model (Paper • 2310.00533 • Published • 2)
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length (Paper • 2310.00576 • Published • 2)
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity (Paper • 2305.13169 • Published • 3)
- Transformers Can Achieve Length Generalization But Not Robustly (Paper • 2402.09371 • Published • 13)