The blog post mentions that a Llama-3.1-Nemotron-40B-Instruct variant exists that prioritizes speed and cost. Given that people are making and using quantized versions of the 51B, a 40B alternative would be nice to have.