Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2309.16039

TRAMS: Training-free Memory Selection for Long-range Language Modeling

Paper • 2310.15494 • Published Oct 24, 2023 • 1
A Long Way to Go: Investigating Length Correlations in RLHF

Paper • 2310.03716 • Published Oct 5, 2023 • 9
YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 65
Giraffe: Adventures in Expanding Context Lengths in LLMs

Paper • 2308.10882 • Published Aug 21, 2023 • 1

Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024 • 18
Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 79
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 22
Zoology: Measuring and Improving Recall in Efficient Language Models

Paper • 2312.04927 • Published Dec 8, 2023 • 2

Effective Long-Context Scaling of Foundation Models

Paper • 2309.16039 • Published Sep 27, 2023 • 30

Research on LLM

When can transformers reason with abstract symbols?

Paper • 2310.09753 • Published Oct 15, 2023 • 2
In-Context Pretraining: Language Modeling Beyond Document Boundaries

Paper • 2310.10638 • Published Oct 16, 2023 • 29
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

Paper • 2310.09520 • Published Oct 14, 2023 • 10
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 53

attention and long context

Efficient Streaming Language Models with Attention Sinks

Paper • 2309.17453 • Published Sep 29, 2023 • 13
Effective Long-Context Scaling of Foundation Models

Paper • 2309.16039 • Published Sep 27, 2023 • 30
allenai/longformer-base-4096

Updated Apr 5, 2023 • 2.54M • 181
google/bigbird-roberta-base

Updated Jun 2, 2021 • 35.3k • 51

Effective Long-Context Scaling of Foundation Models

Paper • 2309.16039 • Published Sep 27, 2023 • 30

Advanced and Recent Papers

Advanced and recent papers about deep learning. Please send your recommend paper to email: [email protected]

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Paper • 2309.16414 • Published Sep 28, 2023 • 19
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Paper • 2309.13018 • Published Sep 22, 2023 • 9
Robust Speech Recognition via Large-Scale Weak Supervision

Paper • 2212.04356 • Published Dec 6, 2022 • 25
Language models in molecular discovery

Paper • 2309.16235 • Published Sep 28, 2023 • 10

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 88
Effective Long-Context Scaling of Foundation Models

Paper • 2309.16039 • Published Sep 27, 2023 • 30
Efficient Streaming Language Models with Attention Sinks

Paper • 2309.17453 • Published Sep 29, 2023 • 13

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 605
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 158
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47
Don't Make Your LLM an Evaluation Benchmark Cheater

Paper • 2311.01964 • Published Nov 3, 2023 • 1

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 88
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65
Table-GPT: Table-tuned GPT for Diverse Table Tasks

Paper • 2310.09263 • Published Oct 13, 2023 • 39
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs