yulan-team/YuLan-Mini
Text Generation
A highly capable, lightweight 2.4B-parameter LLM pre-trained on only 1T tokens of data, released with all training details.
Note: The model and optimizer states of the last curriculum phase before learning rate annealing.
Note: The model and optimizer states of the 20th curriculum phase.