-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 57
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Paper • 2311.12092 • Published • 22
DREAM: Diffusion Rectification and Estimation-Adaptive Models
Paper • 2312.00210 • Published • 14
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models
Paper • 2312.00079 • Published • 14
Collections
Collections including paper arxiv:2312.06550
-
DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning
Paper • 2303.07864 • Published • 1
Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks
Paper • 2305.13547 • Published • 1
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
Paper • 2304.09402 • Published • 2
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
Paper • 2305.18169 • Published • 1
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 28
CsFEVER and CTKFacts: Acquiring Czech data for fact verification
Paper • 2201.11115 • Published
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 16
FinGPT: Large Generative Models for a Small Language
Paper • 2311.05640 • Published • 28
-
A technical note on bilinear layers for interpretability
Paper • 2305.03452 • Published • 1
Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT
Paper • 2305.13417 • Published • 1
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?
Paper • 2211.12821 • Published • 1
The Linear Representation Hypothesis and the Geometry of Large Language Models
Paper • 2311.03658 • Published • 1
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 10
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 15
Attention Is All You Need
Paper • 1706.03762 • Published • 50
-
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 63
Learning To Teach Large Language Models Logical Reasoning
Paper • 2310.09158 • Published • 1
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 9
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Paper • 2308.09583 • Published • 7
-
AlpaGasus: Training A Better Alpaca with Fewer Data
Paper • 2307.08701 • Published • 22
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Paper • 2303.03915 • Published • 6
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 22
SlimPajama-DC: Understanding Data Combinations for LLM Training
Paper • 2309.10818 • Published • 10
-
Creative Robot Tool Use with Large Language Models
Paper • 2310.13065 • Published • 9
CodeCoT and Beyond: Learning to Program and Test like a Developer
Paper • 2308.08784 • Published • 5
Lemur: Harmonizing Natural Language and Code for Language Agents
Paper • 2310.06830 • Published • 31
CodePlan: Repository-level Coding using LLMs and Planning
Paper • 2309.12499 • Published • 74
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 607
Mixtral of Experts
Paper • 2401.04088 • Published • 158
Mistral 7B
Paper • 2310.06825 • Published • 47
Don't Make Your LLM an Evaluation Benchmark Cheater
Paper • 2311.01964 • Published • 1