FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness • arXiv:2205.14135 • Published May 27, 2022
Fast Transformer Decoding: One Write-Head is All You Need • arXiv:1911.02150 • Published Nov 6, 2019
Efficient Training of Language Models to Fill in the Middle • arXiv:2207.14255 • Published Jul 28, 2022