How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study • Paper • 2404.14047 • Published Apr 22, 2024
Reasoning in Large Language Models: A Geometric Perspective • Paper • 2407.02678 • Published Jul 2, 2024
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published Dec 13, 2024