Collections
Collections including paper arxiv:2311.10770

- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 36
- Co-training and Co-distillation for Quality Improvement and Compression of Language Models
  Paper • 2311.02849 • Published • 4
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 29
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 73
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 25
- Make Pixels Dance: High-Dynamic Video Generation
  Paper • 2311.10982 • Published • 68

- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
  Paper • 2311.08263 • Published • 16
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- microsoft/Orca-2-13b
  Text Generation • Updated • 13.6k • 666
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 17