- GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
  Paper • 2112.07577 • Published
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
  Paper • 2104.06979 • Published
- Text Embeddings by Weakly-Supervised Contrastive Pre-training
  Paper • 2212.03533 • Published • 1
- SimCSE: Simple Contrastive Learning of Sentence Embeddings
  Paper • 2104.08821 • Published

Collections including paper arxiv:2310.12109

- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 19
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 80
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 23
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2

- Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
  Paper • 2310.12109 • Published • 1
- Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
  Paper • 2310.18780 • Published • 3
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
  Paper • 2311.05908 • Published • 12
- Multi-Dimensional Hyena for Spatial Inductive Bias
  Paper • 2309.13600 • Published • 1

- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 39
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17