Collections
Community collections that include the paper arXiv:2309.03852 (FLM-101B: An Open LLM and How to Train It with $100K Budget):
Collection 1
- Paper: Attention Is All You Need (arXiv:1706.03762) • 50 upvotes
- Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805) • 16 upvotes
- Paper: RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692) • 7 upvotes
- Paper: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (arXiv:1910.01108) • 14 upvotes
Collection 2
- Paper: When can transformers reason with abstract symbols? (arXiv:2310.09753) • 2 upvotes
- Paper: In-Context Pretraining: Language Modeling Beyond Document Boundaries (arXiv:2310.10638) • 29 upvotes
- Paper: Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model (arXiv:2310.09520) • 10 upvotes
- Paper: Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers (arXiv:2309.08532) • 53 upvotes
Collection 3
- Paper: One Wide Feedforward is All You Need (arXiv:2309.01826) • 31 upvotes
- Paper: Gated recurrent neural networks discover attention (arXiv:2309.01775) • 7 upvotes
- Paper: FLM-101B: An Open LLM and How to Train It with $100K Budget (arXiv:2309.03852) • 44 upvotes
- Paper: Large Language Models as Optimizers (arXiv:2309.03409) • 75 upvotes
Collection 4
- Paper: MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (arXiv:2309.04662) • 22 upvotes
- Paper: Neurons in Large Language Models: Dead, N-gram, Positional (arXiv:2309.04827) • 16 upvotes
- Paper: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (arXiv:2309.05516) • 9 upvotes
- Paper: DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs (arXiv:2309.03907) • 10 upvotes
Collection 5
- Paper: FLM-101B: An Open LLM and How to Train It with $100K Budget (arXiv:2309.03852) • 44 upvotes
- Paper: Extending LLMs' Context Window with 100 Samples (arXiv:2401.07004) • 15 upvotes
- Paper: LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration (arXiv:2402.11550) • 16 upvotes
- Paper: The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey (arXiv:2401.07872) • 2 upvotes
Collection 6
- Model: TheBirdLegacy/FreeLoaderLM • Text Generation
- Model: CofeAI/FLM-101B • Text Generation • 17 • 92 (see the loading sketch after this list)
- Paper: FLM-101B: An Open LLM and How to Train It with $100K Budget (arXiv:2309.03852) • 44 upvotes
- Paper: Composable Function-preserving Expansions for Transformer Architectures (arXiv:2308.06103) • 19 upvotes
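Since Collection 6 links the CofeAI/FLM-101B checkpoint alongside its paper, here is a minimal loading sketch. It assumes the repository ships custom modeling code compatible with transformers' AutoModelForCausalLM (hence trust_remote_code=True), that the accelerate library is installed for device_map="auto", and that enough memory is available for a 101B-parameter checkpoint; the prompt string and generation settings are illustrative only.

```python
# Minimal sketch: loading CofeAI/FLM-101B with Hugging Face transformers.
# Assumption: the repo provides custom modeling code, so trust_remote_code=True
# is required; a 101B-parameter model needs substantial GPU/CPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CofeAI/FLM-101B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce the memory footprint
    device_map="auto",          # shard weights across available devices (accelerate)
    trust_remote_code=True,
)

# Illustrative generation call; prompt and max_new_tokens are arbitrary choices.
inputs = tokenizer("FLM-101B was trained on a budget of", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```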